tenv v2.0: The Importance of Explicit Behavior for the Version Manager

24 Jun 2024

Explicit behavior is crucial for IaC version managers. It is especially critical for Terraform and OpenTofu, because an unexpected tool upgrade can destroy or corrupt the managed infrastructure. To protect users from unexpected updates, a version manager must behave transparently, without internal magic that cannot be explained without a deep dive into the source code.

tenv is a versatile version manager for OpenTofu, Terraform, Terragrunt, and Atmos, written in Go and developed by the tofuutils team. It takes the complexity out of handling different versions of these tools, so developers and DevOps engineers can focus on what matters most: building and deploying efficiently. tenv is a successor to tofuenv and tfenv.

During tenv development, our team ran into an unpleasant surprise involving Terragrunt and tenv that could have caused serious problems. On a freshly installed Linux system, when one of our users ran Terragrunt, the run ended up using OpenTofu instead of Terraform, with no warning. In a production environment, this could have seriously corrupted the Terraform state; luckily, it was a test environment. Before we look at the root cause, I need to explain how tenv works.

tenv manages every tool by wrapping it in an additional binary that acts as a proxy for the original tool. This means you can't install Terraform or OpenTofu on an ordinary Linux machine alongside tenv (NixOS is the exception). tenv supplies a binary with the same name as each tool (Terraform, OpenTofu, Terragrunt, Atmos) and implements the proxy pattern inside it. We chose this design because it simplifies version management and lets us build features such as automatic version discovery and installation on top of it.

Now that we know tenv is built on a proxy architecture, let's return to the problem. Why did our user's run use OpenTofu rather than Terraform? The answer has two parts:

  1. Terragrunt started using OpenTofu as its default IaC tool. However, this change did not ship in a major release; it arrived as a patch release, so users did not expect any change in behavior. The original problem is described at https://github.com/gruntwork-io/terragrunt/issues/3172.

  2. Under the new default, Terragrunt invoked OpenTofu through tenv's proxy, which checked the required version of OpenTofu and installed it automatically.

Although the TERRAGRUNT_TFPATH setting can control this behavior, users were unaware of the breaking change in Terragrunt and were surprised to find that OpenTofu had run. But why did OpenTofu run at all if users did not have it installed? This brings us to the second issue. At the start of tenv development, we replicated many features from the tfenv tool.

One of these features was automatic tool installation, which tfenv controls with the TFENV_AUTO_INSTALL environment variable and enables by default. tenv has the equivalent TENV_AUTO_INSTALL variable, which was also enabled by default until we discovered the case described above.
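A toggle like TENV_AUTO_INSTALL is only as safe as its default, because the default is what every user gets without reading the docs. The Go sketch below shows one way to interpret such a variable; the function name autoInstallEnabled is hypothetical, and treating an unparsable value as "use the default" is an assumption for illustration, not necessarily tenv's behavior. It relies on the standard library's strconv.ParseBool.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// autoInstallEnabled interprets a boolean toggle such as TENV_AUTO_INSTALL.
// If the variable is unset or cannot be parsed (an assumed policy),
// defaultValue decides, so the default is what undocumented users get.
func autoInstallEnabled(getenv func(string) string, name string, defaultValue bool) bool {
	raw := getenv(name)
	if raw == "" {
		return defaultValue
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return defaultValue
	}
	return enabled
}

func main() {
	// The same environment evaluated under an opt-out default (true)
	// and an opt-in default (false).
	fmt.Println("with default true: ", autoInstallEnabled(os.Getenv, "TENV_AUTO_INSTALL", true))
	fmt.Println("with default false:", autoInstallEnabled(os.Getenv, "TENV_AUTO_INSTALL", false))
}
```

An explicit value set by the user always wins; only the unset case changes when the default flips.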

Users who ran Terraform or OpenTofu through tenv without Terragrunt may have encountered auto-install when switching tool versions, for example with the following commands:

  • tenv tf use 1.5.3
  • tenv tofu use 1.6.1

The use command installed the requested version even if it was not yet present on the local system.

After a brief discussion on GitHub, our team decided to disable auto-install by default and to ship this small change as a new major version of tenv. We made no other significant changes to the program and did not update the language version or frameworks; we only changed the default value, because users should clearly see that one of the most frequently used and most important behaviors had changed.

Interestingly, during the discussion we disagreed about whether users can be expected to read the README or documentation. Like it or not, people rarely read the docs until they run into trouble. As the tofuutils team, we cannot accept the risk that a user mistakenly runs OpenTofu in a real production environment and breaks the state or the cloud environment.

Finally, I'd like to highlight a few points once more:

  • Implement intuitive behavior in your tool.
  • Consider user experience, and keep in mind that many people don't read manuals.
  • Don't hesitate to release a major version if you have made a breaking change.
  • In programming, explicit is preferable to implicit, especially when dealing with state-sensitive tools.