10 — Infrastructure as Code (Terraform / OpenTofu)
Focus: Terraform/OpenTofu. IaC provisions the substrate — clusters, networks, buckets, KMS, IAM — and bootstraps ArgoCD, then gets out of the way. Decision in ADR-0031.
1. OpenTofu, and the IaC↔GitOps boundary
Tool: OpenTofu (the MPL-licensed, community-governed fork of Terraform), chosen to avoid the BSL licensing risk of Terraform 1.6 and later while keeping the entire HCL/module ecosystem. Drop-in for our needs.
The boundary (the key decision):
flowchart TB
classDef i fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef b fill:#fde68a,stroke:#b45309,color:#111827;
classDef g fill:#bbf7d0,stroke:#15803d,color:#111827;
subgraph IAC["OpenTofu (substrate) — outside the cluster"]
net["VPC / subnets / NAT / security groups"]:::i
k8s["Kubernetes cluster + node pools (incl. arm64/spot)"]:::i
obj["Object storage buckets (versioned + object-lock)"]:::i
kms["KMS keys"]:::i
iam["IAM roles + OIDC (workload identity, GHA OIDC)"]:::i
dns["DNS zones"]:::i
end
boot["bootstrap: install ArgoCD + point at GitOps root app"]:::b
subgraph GITOPS["ArgoCD (everything inside the cluster)"]
plat["platform addons (eso, cert-manager, ingress, velero, monitoring)"]:::g
apps["BitVault apps + data operators"]:::g
end
IAC --> boot --> GITOPS
IaC provisions the substrate and hands off; GitOps owns everything inside the cluster. We do not manage in-cluster app resources with OpenTofu, because two reconcilers fighting over the same objects is an anti-pattern. OpenTofu’s last in-cluster act is installing ArgoCD and pointing it at the GitOps root app (06).
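
The handoff can be sketched in HCL roughly as follows. This is an illustration under stated assumptions, not the project's actual bootstrap: it assumes the `helm` provider is already configured against the new cluster, that the community `argo-cd` and `argocd-apps` charts are used, and that the variables, the `root` path, and the `main` revision are hypothetical placeholders. The `applications` map follows the values layout of recent `argocd-apps` chart versions; check it against whichever version is pinned.

```hcl
# Hypothetical inputs: none of these names come from the design itself.
variable "argocd_chart_version" { type = string }
variable "argocd_apps_chart_version" { type = string }
variable "gitops_repo_url" { type = string }

# Install ArgoCD itself. This is the last thing OpenTofu does in-cluster.
resource "helm_release" "argocd" {
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  version          = var.argocd_chart_version
  namespace        = "argocd"
  create_namespace = true
}

# Point ArgoCD at the GitOps root app; everything else is reconciled from Git.
resource "helm_release" "root_app" {
  name       = "root-app"
  repository = "https://argoproj.github.io/argo-helm"
  chart      = "argocd-apps"
  version    = var.argocd_apps_chart_version
  namespace  = "argocd"

  values = [yamlencode({
    applications = {
      root = {
        namespace = "argocd"
        project   = "default"
        source = {
          repoURL        = var.gitops_repo_url
          targetRevision = "main" # assumed branch
          path           = "root" # assumed path of the root app
        }
        destination = {
          server    = "https://kubernetes.default.svc"
          namespace = "argocd"
        }
        syncPolicy = {
          automated = { prune = true, selfHeal = true }
        }
      }
    }
  })]

  depends_on = [helm_release.argocd]
}
```

Nothing below the root Application appears in OpenTofu state, which is what keeps the two reconcilers from overlapping.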
2. What IaC owns
| Layer | Resources |
|---|---|
| Network | VPC, subnets (multi-AZ), NAT, security groups, private endpoints |
| Compute | K8s cluster (EKS/GKE/AKS), node pools (on-demand + spot, amd64 + arm64), autoscaler |
| Storage | object-storage buckets (ADR-0020) with versioning + object-lock for backups (12) |
| Crypto | KMS keys (per env) for envelope encryption + etcd (ADR-0014) |
| Identity | IAM roles, workload identity (IRSA/GKE WI/Azure WI), GitHub OIDC provider (07, 09) |
| DNS/edge | zones, records, CDN, WAF |
| Bootstrap | ArgoCD install + root Application |
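
As one concrete instance of the Storage and Crypto rows, the sketch below provisions a per-env KMS key and a versioned, object-locked backup bucket. It assumes AWS purely for illustration (the design also targets GKE/AKS); the bucket name, retention mode, and retention period are hypothetical values, not decisions recorded in this document.

```hcl
variable "env" { type = string }

# Per-env key used for envelope encryption of backup objects.
resource "aws_kms_key" "backup" {
  description         = "BitVault backup key (${var.env})"
  enable_key_rotation = true
}

# Object lock has to be enabled when the bucket is created.
resource "aws_s3_bucket" "backup" {
  bucket              = "bitvault-backup-${var.env}" # hypothetical name
  object_lock_enabled = true
}

resource "aws_s3_bucket_versioning" "backup" {
  bucket = aws_s3_bucket.backup.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "backup" {
  bucket = aws_s3_bucket.backup.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.backup.arn
    }
  }
}

# Default retention for new object versions; mode and days are assumptions.
resource "aws_s3_bucket_object_lock_configuration" "backup" {
  bucket = aws_s3_bucket.backup.id

  rule {
    default_retention {
      mode = "GOVERNANCE"
      days = 30
    }
  }

  depends_on = [aws_s3_bucket_versioning.backup]
}
```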
3. State, modules, workflow
- Remote state, per environment, with locking and encryption at rest (e.g. S3 + DynamoDB lock, GCS, or a TF/Tofu cloud backend). No secrets in state where avoidable: marking a value sensitive only redacts it from plan/apply output, it is still written to state, so prefer ESO/refs over TF-managed secrets (09).
- Modules (`network`, `cluster`, `bucket`, `kms`, `iam`) composed by thin per-env stacks (`envs/dev|staging|prod`) with `tfvars` for differences. DRY substrate.
- Workflow: PR → `tofu plan` in CI (read-only role via OIDC) → review the plan → gated `tofu apply` (apply role via OIDC; or Atlantis / a tofu-controller). Plan in CI, apply gated; never blind applies. Scheduled `plan` detects drift.
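
A thin per-env stack with remote state might look like the sketch below. It assumes the S3 + DynamoDB option from the first bullet; the bucket, table, region, module inputs, and module outputs (`private_subnet_ids`) are hypothetical names chosen for illustration.

```hcl
# envs/prod/main.tf (hypothetical layout)
terraform {
  required_version = ">= 1.6.0"

  # Remote state: encrypted at rest, locked through a DynamoDB table.
  backend "s3" {
    bucket         = "bitvault-tfstate-prod"
    key            = "substrate/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "bitvault-tfstate-locks"
  }
}

variable "env" { type = string }
variable "region" { type = string }
variable "node_pools" {
  type = map(object({ arch = string, capacity_type = string, max_size = number }))
}

# The stack only wires modules together; the logic lives in the modules.
module "network" {
  source = "../../modules/network"
  env    = var.env
  region = var.region
}

module "cluster" {
  source     = "../../modules/cluster"
  env        = var.env
  subnet_ids = module.network.private_subnet_ids
  node_pools = var.node_pools
}

# envs/prod/prod.tfvars then carries only the per-env differences, e.g.
#   env    = "prod"
#   region = "eu-west-1"
#   node_pools = {
#     general = { arch = "arm64", capacity_type = "spot", max_size = 10 }
#   }
```

Keeping the stack this thin means `dev`, `staging`, and `prod` differ only in their `tfvars` and backend settings.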
flowchart LR
classDef a fill:#fde68a,stroke:#b45309,color:#111827;
classDef o fill:#bbf7d0,stroke:#15803d,color:#111827;
pr["PR to infra repo"]:::a --> plan["tofu plan (CI, OIDC read role)"]:::a
plan --> rev["review plan output in PR"]:::a
rev --> appr{"approved + protected env?"}:::a
appr -- yes --> apply["tofu apply (OIDC apply role)"]:::o
appr -- no --> hold["hold"]:::a
sched["scheduled tofu plan"]:::a -. drift alert .-> rev
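
The two OIDC roles in the flow can be sketched as below, again assuming AWS and GitHub Actions. The role names, the `prod` environment name, and both variables are hypothetical; the permissions policy for the apply role is deliberately left out because it depends on what the stacks manage.

```hcl
variable "github_oidc_provider_arn" { type = string } # existing GitHub OIDC provider
variable "infra_repo" { type = string }               # "org/repo" of the infra repo

# Plan role: assumable from PR runs and from the default branch, where the
# scheduled drift plan runs.
data "aws_iam_policy_document" "plan_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.github_oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values = [
        "repo:${var.infra_repo}:pull_request",
        "repo:${var.infra_repo}:ref:refs/heads/main",
      ]
    }
  }
}

resource "aws_iam_role" "plan" {
  name               = "tofu-plan"
  assume_role_policy = data.aws_iam_policy_document.plan_trust.json
}

resource "aws_iam_role_policy_attachment" "plan_readonly" {
  role       = aws_iam_role.plan.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

# Apply role: assumable only by jobs bound to the protected environment.
data "aws_iam_policy_document" "apply_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.github_oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:${var.infra_repo}:environment:prod"]
    }
  }
}

resource "aws_iam_role" "apply" {
  name               = "tofu-apply"
  assume_role_policy = data.aws_iam_policy_document.apply_trust.json
}
```

The plan role also needs whatever access the state backend requires (reading state and taking the lock). For the scheduled run, `tofu plan -detailed-exitcode` exits with code 2 when changes are pending, which gives the job a simple drift signal.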
4. Multi-cloud posture
Clusters and IAM are provider-specific (separate stacks per provider); shared modules abstract the common shapes. The multi-cloud storage abstraction lives at the application level (ADR-0005), not in OpenTofu, which just provisions buckets/keys per provider. This keeps IaC simple and puts the portability promise where it belongs (the app).
5. Tradeoffs / Alternatives / Scaling
Tradeoffs. A hard IaC/GitOps boundary means two systems (OpenTofu + ArgoCD) and a bootstrap handoff — but it avoids the far worse problem of two controllers fighting and keeps each tool doing what it’s best at (substrate vs in-cluster reconciliation).
Alternatives considered.
- Terraform (BSL): functionally fine, but the license change is a real risk for a self-hostable OSS-friendly project; OpenTofu removes it (ADR-0031).
- Pulumi: real languages, nice for complex logic; HCL/OpenTofu’s declarative model + ecosystem fit the substrate better and lower the bar for contributors.
- Crossplane (GitOps-native infra via CRDs): compelling — manage cloud infra through ArgoCD with K8s-style reconciliation; a strong future option to unify the control loop. Deferred: OpenTofu is lower-risk and better-understood for v1; revisit to shrink the IaC/GitOps seam.
- Manage K8s resources in Terraform: the anti-pattern we explicitly reject (§1).
Scaling concerns.
- State bloat / blast radius → split state per env + per layer (network/cluster/data) so a change to one doesn’t risk all; remote state with locking.
- Module versioning → pin module versions; test modules in nonprod first.
- Many environments/regions → stacks parameterized by region/env; DR region is just another stack instantiation (11).
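
A minimal sketch of these three mitigations, assuming an S3 state backend and a separate modules repository: the cluster layer keeps its own state, reads the network layer's outputs instead of sharing a state file with it, and pins the module to a tag. The repository URL, tag, bucket, keys, and output name are all hypothetical.

```hcl
# envs/prod/cluster/main.tf (hypothetical layout: one state per env per layer)
terraform {
  backend "s3" {
    bucket         = "bitvault-tfstate-prod"
    key            = "cluster/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "bitvault-tfstate-locks"
  }
}

# Read-only view of the network layer's outputs; a cluster change cannot
# modify network resources because they live in a different state.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "bitvault-tfstate-prod"
    key    = "network/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Module pinned to a tag; nonprod stacks move to a new tag first.
module "cluster" {
  source = "git::https://github.com/example/bitvault-infra-modules.git//cluster?ref=v1.4.0"

  env        = "prod"
  region     = "eu-west-1"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

A DR region would reuse the same module call with a different `region` value and its own state key, which is what makes it just another stack instantiation.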
References
- OpenTofu: https://opentofu.org/ · Terraform license change context: https://opentofu.org/manifesto/
- Remote state & locking: https://opentofu.org/docs/language/state/remote/
- Crossplane (alternative): https://www.crossplane.io/