10 — Infrastructure as Code (Terraform / OpenTofu)

Focus: Terraform/OpenTofu. IaC provisions the substrate — clusters, networks, buckets, KMS, IAM — and bootstraps ArgoCD, then gets out of the way. Decision in ADR-0031.


1. OpenTofu, and the IaC↔GitOps boundary

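Tool: OpenTofu, the MPL-2.0-licensed, community-governed fork of Terraform. We use it to avoid the licensing risk of the Business Source License (BSL) that applies to Terraform 1.6 and later, while keeping HCL and compatibility with existing providers and modules. For our needs it is a drop-in replacement.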

The boundary (the key decision):

flowchart TB
    classDef i fill:#c7d2fe,stroke:#3730a3,color:#111827;
    classDef b fill:#fde68a,stroke:#b45309,color:#111827;
    classDef g fill:#bbf7d0,stroke:#15803d,color:#111827;
    subgraph IAC["OpenTofu (substrate) — outside the cluster"]
      net["VPC / subnets / NAT / security groups"]:::i
      k8s["Kubernetes cluster + node pools (incl. arm64/spot)"]:::i
      obj["Object storage buckets (versioned + object-lock)"]:::i
      kms["KMS keys"]:::i
      iam["IAM roles + OIDC (workload identity, GHA OIDC)"]:::i
      dns["DNS zones"]:::i
    end
    boot["bootstrap: install ArgoCD + point at GitOps root app"]:::b
    subgraph GITOPS["ArgoCD (everything inside the cluster)"]
      plat["platform addons (eso, cert-manager, ingress, velero, monitoring)"]:::g
      apps["BitVault apps + data operators"]:::g
    end
    IAC --> boot --> GITOPS

IaC provisions the substrate and hands off; GitOps owns everything inside the cluster. We do not manage in-cluster resources (apps or platform addons) with OpenTofu: two reconcilers writing the same objects keep overwriting each other, which is an anti-pattern. OpenTofu’s last in-cluster act is installing ArgoCD and pointing it at the GitOps root app (06).
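The boundary can be enforced mechanically in CI rather than by convention alone. The following is a minimal sketch. It assumes the plan is exported with `tofu show -json plan.out` and that in-cluster resources come from the `kubernetes`, `helm`, and `kubectl` providers (type prefixes `kubernetes_`, `helm_`, `kubectl_`). The two addresses in `BOOTSTRAP_ALLOWLIST` are hypothetical placeholders for wherever the ArgoCD install and root Application actually live.

```python
import json
import sys

# Assumed type prefixes for resources that live inside the cluster
# (hashicorp/kubernetes, hashicorp/helm, and a kubectl provider).
IN_CLUSTER_PREFIXES = ("kubernetes_", "helm_", "kubectl_")

# Hypothetical addresses of the bootstrap resources: the ArgoCD install
# and the root Application. Adjust to the real module layout.
BOOTSTRAP_ALLOWLIST = {
    "helm_release.argocd",
    "kubernetes_manifest.root_app",
}


def boundary_violations(plan: dict) -> list[str]:
    """Return addresses of managed in-cluster resources outside the bootstrap."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("mode") != "managed":
            continue  # data sources only read; they do not reconcile anything
        if rc["type"].startswith(IN_CLUSTER_PREFIXES) and rc["address"] not in BOOTSTRAP_ALLOWLIST:
            violations.append(rc["address"])
    return sorted(violations)


def main(path: str) -> int:
    """Load a JSON plan, report violations, and return a CI exit code."""
    with open(path) as f:
        plan = json.load(f)
    bad = boundary_violations(plan)
    for address in bad:
        print(f"boundary violation: {address} belongs in GitOps, not OpenTofu")
    return 1 if bad else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

In the PR job this could run as `tofu plan -out=plan.out && tofu show -json plan.out > plan.json && python check_boundary.py plan.json` (the script name is hypothetical). The build then fails whenever any in-cluster resource other than the bootstrap shows up in the plan.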


2. What IaC owns

Network: VPC, subnets (multi-AZ), NAT, security groups, private endpoints
Compute: K8s cluster (EKS/GKE/AKS), node pools (on-demand + spot, amd64 + arm64), autoscaler
Storage: object-storage buckets (ADR-0020) with versioning + object-lock for backups (12)
Crypto: KMS keys (per env) for envelope encryption + etcd (ADR-0014)
Identity: IAM roles, workload identity (IRSA/GKE WI/Azure WI), GitHub OIDC provider (07, 09)
DNS/edge: zones, records, CDN, WAF
Bootstrap: ArgoCD install + root Application
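The ownership list can also be treated as data. For example, it could group a plan's resource types by layer in the PR review comment and surface anything no layer claims. The following sketch assumes AWS (EKS) resource types. The prefix lists are illustrative, not exhaustive, and GKE/AKS stacks would need their own entries.

```python
from typing import Optional

# Ownership list as data. The AWS type prefixes are an illustrative
# assumption; GKE/AKS stacks would need their own entries.
LAYERS: dict[str, tuple[str, ...]] = {
    "Network": ("aws_vpc", "aws_subnet", "aws_nat_gateway", "aws_security_group"),
    "Compute": ("aws_eks_",),
    "Storage": ("aws_s3_bucket",),
    "Crypto": ("aws_kms_",),
    "Identity": ("aws_iam_",),
    "DNS/edge": ("aws_route53_", "aws_cloudfront_", "aws_wafv2_"),
    "Bootstrap": ("helm_release", "kubernetes_manifest"),
}


def layer_of(resource_type: str) -> Optional[str]:
    """Return the owning layer for a resource type, or None if no layer claims it."""
    for layer, prefixes in LAYERS.items():
        if resource_type.startswith(prefixes):
            return layer
    return None


def group_by_layer(resource_types: list[str]) -> dict[str, list[str]]:
    """Group resource types by layer; unclaimed types land under 'UNOWNED'."""
    groups: dict[str, list[str]] = {}
    for rtype in resource_types:
        groups.setdefault(layer_of(rtype) or "UNOWNED", []).append(rtype)
    return groups
```

Note that `aws_vpc` also matches `aws_vpc_endpoint`, which files private endpoints under Network as listed above.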

3. State, modules, workflow

flowchart LR
    classDef a fill:#fde68a,stroke:#b45309,color:#111827;
    classDef o fill:#bbf7d0,stroke:#15803d,color:#111827;
    pr["PR to infra repo"]:::a --> plan["tofu plan (CI, OIDC read role)"]:::a
    plan --> rev["review plan output in PR"]:::a
    rev --> appr{"approved + protected env?"}:::a
    appr -- yes --> apply["tofu apply (OIDC apply role)"]:::o
    appr -- no --> hold["hold"]:::a
    sched["scheduled tofu plan"]:::a -. drift alert .-> rev
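For the scheduled plan in the diagram, `tofu plan -detailed-exitcode` gives a machine-readable drift signal. Exit code 0 means an empty diff, 1 means the plan failed, and 2 means the diff is non-empty. The following is a minimal sketch of the alerting side. It assumes the job saves the plan with `-out` so that `tofu show -json` can produce a short change summary. The status strings, the `drift.tfplan` file name, and the `scheduled_plan` helper are hypothetical.

```python
import subprocess
from collections import Counter


def classify_exit_code(code: int) -> str:
    """Map a `tofu plan -detailed-exitcode` exit code to a drift status."""
    if code == 0:
        return "in-sync"  # empty diff: live infrastructure matches the code
    if code == 2:
        return "drift"  # non-empty diff: raise the drift alert
    return "error"  # 1 (or anything unexpected): the plan itself failed


def summarize_actions(plan: dict) -> Counter:
    """Count planned actions in `tofu show -json` output, skipping no-ops and reads."""
    counts: Counter = Counter()
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue
        # A replacement shows up as ["delete", "create"] or ["create", "delete"].
        key = "replace" if set(actions) == {"create", "delete"} else actions[0]
        counts[key] += 1
    return counts


def scheduled_plan(workdir: str) -> str:
    """Hypothetical scheduled job: run a non-interactive plan and classify it."""
    result = subprocess.run(
        ["tofu", "plan", "-detailed-exitcode", "-input=false", "-out=drift.tfplan"],
        cwd=workdir,
        check=False,  # exit code 2 is a normal outcome here, not a failure
    )
    return classify_exit_code(result.returncode)
```

A `drift` result would feed the alert that routes back to plan review. Running `summarize_actions` on the `tofu show -json drift.tfplan` output shows reviewers the size of the drift at a glance.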

4. Multi-cloud posture

Clusters and IAM are provider-specific, so each provider gets its own stack; shared modules cover the shapes common to all providers. Multi-cloud storage portability is handled at the application level (ADR-0005), not in OpenTofu: OpenTofu only provisions buckets and keys per provider. This keeps the IaC simple and puts the portability promise where it belongs, in the application.


5. Tradeoffs / Alternatives / Scaling

Tradeoffs. A hard IaC/GitOps boundary means running two systems (OpenTofu and ArgoCD) plus a bootstrap handoff. In exchange, it avoids the far worse problem of two controllers fighting over the same objects, and keeps each tool doing what it does best: substrate provisioning versus in-cluster reconciliation.
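A toy model shows why the fight is worse than the extra moving part. Here, two reconcilers each enforce their own desired value on one shared field. This is an illustrative simulation, not controller code. The owner names and replica counts are made up.

```python
def simulate(owners: dict[str, int], rounds: int) -> int:
    """Toy model: each owner, in turn, sets a shared field to its desired value.

    Returns how many writes actually changed the live value.
    """
    live = None  # e.g. a hypothetical Deployment's replica count
    changes = 0
    for _ in range(rounds):
        for desired in owners.values():
            if live != desired:  # reconcile only when live state differs
                live = desired
                changes += 1
    return changes


# Two disagreeing owners versus a single owner of the same field.
fight = simulate({"opentofu": 3, "argocd": 5}, rounds=10)
single = simulate({"argocd": 5}, rounds=10)
```

With two disagreeing owners, the number of writes grows linearly with the number of reconcile rounds and never settles. With one owner, it settles after a single write, which is what the boundary buys.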

Alternatives considered.

Scaling concerns.

References