06 — GitOps & ArgoCD

Task 6: design GitOps workflows. Delivery is pull-based: ArgoCD reconciles each cluster toward the desired state in Git, detects drift, self-heals, and reduces rollback to a git revert. The decision is recorded in ADR-0028.


1. Why pull-based GitOps

| Property | Payoff |
| --- | --- |
| Git = source of truth | Every change is reviewed, audited, attributable, revertable |
| Pull, not push | CI never holds cluster credentials; the agent runs in the cluster and pulls, giving a far smaller attack surface (07) |
| Drift detection + self-heal | A manual kubectl edit is reverted; the cluster cannot silently diverge from Git |
| Declarative rebuild | A lost cluster is re-created from IaC + Git: the DR superpower (11) |

2. The reconcile loop

flowchart LR
    classDef g fill:#fbcfe8,stroke:#be185d,color:#111827;
    classDef a fill:#bbf7d0,stroke:#15803d,color:#111827;
    classDef c fill:#c7d2fe,stroke:#3730a3,color:#111827;
    git[("GitOps repo<br/>desired state")]:::g
    argo["ArgoCD controller (in-cluster)"]:::a
    live["Live cluster state"]:::c
    argo -->|"observe desired"| git
    argo -->|"observe live"| live
    argo -->|"diff → sync (apply)"| live
    live -->|"drift?"| argo
    argo -->|"self-heal / prune"| live

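ArgoCD continuously diffs the desired state (Git) against the live state (cluster). When Git changes, it syncs the cluster to the new desired state; when the cluster changes out of band and self-heal is enabled, it reverts the unauthorized live change. A deploy is triggered by changing the pinned image digest in Git (01).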
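The diff, sync, and prune steps can be illustrated with a small model. This is a minimal sketch, not ArgoCD's implementation: it assumes cluster state can be reduced to a mapping from resource name to image digest, and the resource names and digests are made up for the example.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical model: resource name -> pinned image digest.
State = Dict[str, str]


def diff(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to make live match desired."""
    actions = []
    for name, digest in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != digest:
            actions.append(("update", name))
    for name in live:
        if name not in desired:
            actions.append(("prune", name))
    return sorted(actions)


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """One pass of the loop: apply the diff so that live converges on desired."""
    result = dict(live)
    for action, name in diff(desired, live):
        if action in ("create", "update"):
            result[name] = desired[name]
        elif action == "prune" and prune:
            del result[name]
    return result


desired = {"gateway": "sha256:aaa", "worker": "sha256:bbb"}
# The live state has drifted: gateway was edited by hand, worker is missing,
# and an extra resource exists that Git does not know about.
live = {"gateway": "sha256:manual-edit", "debug-pod": "sha256:ccc"}

print(diff(desired, live))
print(reconcile(desired, live) == desired)
```

With pruning disabled, the extra resource would survive the pass, which is why the prune setting is listed separately from self-heal in the sync policy.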


3. App-of-apps + ApplicationSets

flowchart TB
    classDef r fill:#fde68a,stroke:#b45309,color:#111827;
    classDef p fill:#c7d2fe,stroke:#3730a3,color:#111827;
    classDef s fill:#bbf7d0,stroke:#15803d,color:#111827;
    root["root Application (app-of-apps, bootstrapped by OpenTofu)"]:::r
    root --> proj["AppProjects (guardrails: repos/clusters/namespaces)"]:::p
    root --> plat["platform ApplicationSet<br/>(eso, cert-manager, ingress, velero, monitoring)"]:::p
    root --> appset["app ApplicationSet (generators)"]:::p
    appset --> g1["gateway @ dev"]:::s
    appset --> g2["gateway @ staging"]:::s
    appset --> g3["gateway @ prod"]:::s
    appset --> prgen["PR generator → pr-123 preview"]:::s
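
The fan-out from one ApplicationSet to one Application per environment can be sketched as template rendering. This is a simplified model of a list-style generator, not ArgoCD's own templating engine: the repo URL, project name, and namespace are hypothetical placeholders, and only the `envs/<env>/<service>` path convention comes from this document.

```python
from typing import Any, Dict, List

# Hypothetical placeholder values; substitute the real repo and project.
REPO_URL = "https://example.com/bitvault-gitops.git"

# One element per environment, as in the diagram above.
elements = [{"env": "dev"}, {"env": "staging"}, {"env": "prod"}]

template = {
    "metadata": {"name": "gateway-{{env}}"},
    "spec": {
        "project": "services",  # hypothetical AppProject name
        "source": {
            "repoURL": REPO_URL,
            "targetRevision": "main",
            "path": "envs/{{env}}/gateway",
        },
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": "gateway",  # hypothetical namespace
        },
    },
}


def render(node: Any, params: Dict[str, str]) -> Any:
    """Recursively substitute {{key}} placeholders in every string."""
    if isinstance(node, dict):
        return {k: render(v, params) for k, v in node.items()}
    if isinstance(node, list):
        return [render(v, params) for v in node]
    if isinstance(node, str):
        for key, value in params.items():
            node = node.replace("{{" + key + "}}", value)
    return node


def generate_applications(tmpl: Dict[str, Any],
                          params_list: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Produce one Application spec per generator element."""
    return [render(tmpl, params) for params in params_list]


apps = generate_applications(template, elements)
for app in apps:
    print(app["metadata"]["name"], "->", app["spec"]["source"]["path"])
```

Adding an environment then means adding one element to the generator rather than writing another Application by hand.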

4. GitOps repo layout (directories, not branches)

bitvault-gitops/
├── bootstrap/            # root app-of-apps (what OpenTofu points ArgoCD at)
├── projects/             # AppProjects
├── applicationsets/      # generators (services, previews)
├── platform/             # eso, cert-manager, ingress, velero, monitoring (as Apps)
└── envs/
    ├── dev/<service>/     # values + pinned image digest
    ├── staging/<service>/
    └── prod/<service>/

Environments are directories on a single branch, and changes are promoted between them by PR; long-lived per-environment branches are an anti-pattern (03). ArgoCD resources live separately from the application Kubernetes manifests, in line with industry guidance.
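
Promotion between directories can be sketched as copying the pinned digest from one environment to the next. This is an illustration under stated assumptions: the file name `digest.txt` is hypothetical (the layout only says each service directory holds values and a pinned digest), and the promotion order dev, staging, prod is inferred from the directory names. In practice the change would be committed on a branch and opened as a PR; the sketch only modifies a working tree.

```python
import tempfile
from pathlib import Path

ENVS = ["dev", "staging", "prod"]  # assumed promotion order
DIGEST_FILE = "digest.txt"         # hypothetical file name for the pinned digest


def service_dir(repo: Path, env: str, service: str) -> Path:
    """Return envs/<env>/<service> inside the GitOps repo."""
    if env not in ENVS:
        raise ValueError(f"unknown environment: {env}")
    return repo / "envs" / env / service


def promote(repo: Path, service: str, source_env: str) -> Path:
    """Copy the pinned digest from source_env to the next environment."""
    index = ENVS.index(source_env)
    if index + 1 >= len(ENVS):
        raise ValueError(f"{source_env} is the last environment")
    target_env = ENVS[index + 1]
    src = service_dir(repo, source_env, service) / DIGEST_FILE
    dst = service_dir(repo, target_env, service) / DIGEST_FILE
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(src.read_text())
    return dst


# Build a throwaway repo with a digest pinned in staging, then promote it.
repo = Path(tempfile.mkdtemp())
staging = service_dir(repo, "staging", "gateway")
staging.mkdir(parents=True)
(staging / DIGEST_FILE).write_text("sha256:abc123\n")

changed = promote(repo, "gateway", "staging")
print(changed.relative_to(repo).as_posix())  # envs/prod/gateway/digest.txt
```

Because the diff of such a PR touches only one file under `envs/prod/`, a reviewer can see exactly which digest is about to reach production.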


5. Sync ordering: waves + progressive syncs


6. Sync policy & guardrails

| Setting | Prod | Nonprod |
| --- | --- | --- |
| Auto-sync | manual/gated (or auto with approval) | auto |
| Self-heal | on | on |
| Prune | on (with confirm) | on |
| Sync windows | block deploys during freeze/peak | open |
| Digest updates | PR-based (reviewed) | ArgoCD Image Updater OK (dev) |

Image promotion to prod is a reviewed PR in the GitOps repo, not an automatic bump: auditability is preferred over speed where it counts (08).
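
The settings above can be read as a gate that decides whether a sync may proceed. The following is a minimal sketch under assumptions: the 09:00 to 17:00 deny window is a made-up example of "peak", and windows are modelled as daily time ranges, whereas ArgoCD sync windows are defined with cron schedules and durations. Self-heal is kept as an independent flag to mirror the table; in ArgoCD itself `selfHeal` is configured under the automated sync policy.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, Tuple


@dataclass(frozen=True)
class SyncPolicy:
    auto_sync: bool
    self_heal: bool
    prune: bool
    # Daily (start, end) ranges during which syncs are blocked (simplified).
    deny_windows: Tuple[Tuple[time, time], ...] = ()


POLICIES: Dict[str, SyncPolicy] = {
    # Hypothetical peak window; the real freeze/peak schedule is not specified here.
    "prod": SyncPolicy(auto_sync=False, self_heal=True, prune=True,
                       deny_windows=((time(9, 0), time(17, 0)),)),
    "nonprod": SyncPolicy(auto_sync=True, self_heal=True, prune=True),
}


def in_deny_window(policy: SyncPolicy, now: datetime) -> bool:
    """True if the current time falls inside any deny window."""
    return any(start <= now.time() < end for start, end in policy.deny_windows)


def may_sync(env: str, now: datetime, approved: bool = False) -> bool:
    """A sync proceeds outside deny windows if it is automatic or approved."""
    policy = POLICIES[env]
    if in_deny_window(policy, now):
        return False
    return policy.auto_sync or approved


midday = datetime(2024, 1, 1, 12, 0)
evening = datetime(2024, 1, 1, 20, 0)
print(may_sync("nonprod", midday))               # True
print(may_sync("prod", midday, approved=True))   # False
print(may_sync("prod", evening, approved=True))  # True
```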


7. Multi-cluster: ArgoCD per cluster

Decision: one ArgoCD instance per cluster (prod has its own, nonprod has its own), each bootstrapped by OpenTofu (10). No single ArgoCD instance holds credentials for both prod and nonprod, which gives a smaller blast radius and simpler per-cluster DR (a cluster and its ArgoCD can be rebuilt independently). The alternative is a central “hub” ArgoCD managing many clusters (single pane of glass); it was rejected to keep prod isolated and would be reconsidered only if the number of clusters grows substantially.


8. Tradeoffs / Alternatives / Scaling

Tradeoffs. GitOps adds a repo and a reconciler, and “deploy” becomes “merge a PR + wait for reconcile”, which has slightly more latency than kubectl apply. That cost is paid back many times over in auditability, drift control, and DR.

Alternatives considered.

Scaling concerns.

References