06 — GitOps & ArgoCD
Task 6: design GitOps workflows. Pull-based continuous delivery: ArgoCD reconciles each cluster toward the desired state in Git, detects drift, self-heals, and makes rollback a
git revert. Decision in ADR-0028.
1. Why pull-based GitOps
| Property | Payoff |
|---|---|
| Git = source of truth | every change is reviewed, audited, attributable, revertable |
| Pull, not push | CI never holds cluster credentials; the agent runs in the cluster and pulls — far smaller attack surface (07) |
| Drift detection + self-heal | manual kubectl edit is reverted; the cluster cannot silently diverge from Git |
| Declarative rebuild | a lost cluster is re-created from IaC + Git — the DR superpower (11) |
2. The reconcile loop
flowchart LR
classDef g fill:#fbcfe8,stroke:#be185d,color:#111827;
classDef a fill:#bbf7d0,stroke:#15803d,color:#111827;
classDef c fill:#c7d2fe,stroke:#3730a3,color:#111827;
git[("GitOps repo<br/>desired state")]:::g
argo["ArgoCD controller (in-cluster)"]:::a
live["Live cluster state"]:::c
argo -->|"observe desired"| git
argo -->|"observe live"| live
argo -->|"diff → sync (apply)"| live
live -->|"drift?"| argo
argo -->|"self-heal / prune"| live
ArgoCD continuously diffs the desired state (Git) against the live state (cluster). When Git changes, it syncs the cluster forward; when the cluster drifts, self-heal reverts the unauthorized live change. A deploy is triggered by changing the pinned image digest in Git (01).
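The loop above can be modelled in a few lines. This is a minimal sketch, not ArgoCD's implementation: state is reduced to a hypothetical mapping of resource name to image digest, and the `ledger` and `debug-pod` names and the digest strings are placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> pinned spec (here, an image digest).
State = Dict[str, str]


def diff(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to make live match desired."""
    actions = []
    for name in sorted(desired):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            actions.append(("prune", name))
    return actions


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """Apply one reconcile pass and return the resulting live state."""
    result = dict(live)
    for action, name in diff(desired, live):
        if action in ("create", "update"):
            result[name] = desired[name]
        elif action == "prune" and prune:
            del result[name]
    return result


desired = {"gateway": "sha256:aaa", "ledger": "sha256:bbb"}
# Live state has drifted: gateway was hand-edited and a stray resource exists.
live = {"gateway": "sha256:zzz", "debug-pod": "sha256:ccc"}

print(diff(desired, live))
healed = reconcile(desired, live)
print(healed == desired)
```

With `prune=False` the stray resource would survive the pass, which is why the sync policy in §6 treats prune as a separate setting.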
3. App-of-apps + ApplicationSets
flowchart TB
classDef r fill:#fde68a,stroke:#b45309,color:#111827;
classDef p fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef s fill:#bbf7d0,stroke:#15803d,color:#111827;
root["root Application (app-of-apps, bootstrapped by OpenTofu)"]:::r
root --> proj["AppProjects (guardrails: repos/clusters/namespaces)"]:::p
root --> plat["platform ApplicationSet<br/>(eso, cert-manager, ingress, velero, monitoring)"]:::p
root --> appset["app ApplicationSet (generators)"]:::p
appset --> g1["gateway @ dev"]:::s
appset --> g2["gateway @ staging"]:::s
appset --> g3["gateway @ prod"]:::s
appset --> prgen["PR generator → pr-123 preview"]:::s
- ApplicationSet is the evolution of app-of-apps: one manifest generates the matrix of (service × environment × cluster) Applications — e.g. Git generator for services/envs, Cluster generator for clusters, PR generator for previews (03). 20 services × 3 envs from a handful of generators, not 60 hand-written files.
- AppProjects constrain each Application’s allowed source repos, destination clusters, and namespaces — the GitOps guardrail (02).
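To make the two ideas above concrete, here is a rough Python sketch of what a generator expands to and how a project allow-list filters the result. It is an illustration only, not ApplicationSet or AppProject syntax; the repo URL, the `svc-NN` service names, and the env-to-cluster mapping are all assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, List

# Placeholder repo URL and env-to-cluster mapping; both are assumptions.
REPO = "https://example.invalid/bitvault-gitops.git"
CLUSTER_FOR_ENV = {"dev": "nonprod", "staging": "nonprod", "prod": "prod"}


@dataclass(frozen=True)
class Application:
    name: str
    repo: str
    cluster: str
    namespace: str


@dataclass(frozen=True)
class AppProject:
    source_repos: FrozenSet[str]
    clusters: FrozenSet[str]
    namespaces: FrozenSet[str]

    def allows(self, app: Application) -> bool:
        """True only if repo, cluster, and namespace are all on the allow-lists."""
        return (
            app.repo in self.source_repos
            and app.cluster in self.clusters
            and app.namespace in self.namespaces
        )


def generate(services: List[str], envs: List[str]) -> List[Application]:
    """Expand the service x environment matrix, as a generator would."""
    return [
        Application(f"{svc}-{env}", REPO, CLUSTER_FOR_ENV[env], svc)
        for svc, env in product(services, envs)
    ]


services = [f"svc-{i:02d}" for i in range(1, 21)]  # 20 hypothetical services
apps = generate(services, ["dev", "staging", "prod"])
print(len(apps))  # 60 Applications from one call

nonprod = AppProject(
    source_repos=frozenset({REPO}),
    clusters=frozenset({"nonprod"}),
    namespaces=frozenset(services),
)
rejected = [a.name for a in apps if not nonprod.allows(a)]
print(len(rejected))  # 20: every prod Application falls outside this project
```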
4. GitOps repo layout (directories, not branches)
bitvault-gitops/
├── bootstrap/ # root app-of-apps (what OpenTofu points ArgoCD at)
├── projects/ # AppProjects
├── applicationsets/ # generators (services, previews)
├── platform/ # eso, cert-manager, ingress, velero, monitoring (as Apps)
└── envs/
    ├── dev/<service>/ # values + pinned image digest
    ├── staging/<service>/
    └── prod/<service>/
Environments are directories in one branch, promoted by PR — not long-lived env branches (an anti-pattern, 03). ArgoCD resources live separately from application Kubernetes manifests (industry guidance).
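Because environments are directories, a promotion is a one-file change between sibling paths. The sketch below shows that shape under stated assumptions: the `image-digest` file name and the digest values are hypothetical, and the repo is modelled as a plain dictionary of path to content.

```python
from typing import Dict, Tuple

ENV_ORDER = ["dev", "staging", "prod"]


def digest_path(env: str, service: str) -> str:
    # "image-digest" is a hypothetical file name inside envs/<env>/<service>/.
    return f"envs/{env}/{service}/image-digest"


def promote(tree: Dict[str, str], service: str, source_env: str) -> Tuple[str, str]:
    """Return the (path, new_content) change a promotion PR would contain."""
    index = ENV_ORDER.index(source_env)
    if index + 1 >= len(ENV_ORDER):
        raise ValueError(f"{source_env} is the last environment")
    target_env = ENV_ORDER[index + 1]
    return digest_path(target_env, service), tree[digest_path(source_env, service)]


tree = {
    "envs/dev/gateway/image-digest": "sha256:bbb",
    "envs/staging/gateway/image-digest": "sha256:aaa",
    "envs/prod/gateway/image-digest": "sha256:aaa",
}
path, content = promote(tree, "gateway", "dev")
print(path, content)
```

The returned pair is the whole diff of the promotion PR, which keeps review small and makes rollback a revert of that single commit.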
5. Sync ordering: waves + progressive syncs
- Sync waves order resources within an Application: negative = namespaces/CRDs/RBAC/operators, 0 = data + DB-migration PreSync hooks, positive = app workloads (04 §3). ArgoCD pauses about 2 s between waves by default, which gives controllers time to react.
- Progressive syncs order Applications within an ApplicationSet (e.g. roll dev before staging before prod across the set).
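The two levels of ordering can be sketched as two small sorts. This is a simplified model: it orders by wave and then by name only, the resource names and wave numbers are illustrative, and the rollout step order is the dev, staging, prod example from the bullet above.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def plan_waves(resources: Dict[str, int]) -> List[Tuple[int, List[str]]]:
    """Group resources of one Application by sync wave, lowest wave first."""
    ordered = sorted(resources.items(), key=lambda kv: (kv[1], kv[0]))
    return [
        (wave, [name for name, _ in group])
        for wave, group in groupby(ordered, key=lambda kv: kv[1])
    ]


def plan_rollout(apps: List[Tuple[str, str]], env_order: List[str]) -> List[List[str]]:
    """Group the Applications of one ApplicationSet into ordered rollout steps."""
    return [sorted(name for name, env in apps if env == step) for step in env_order]


# Hypothetical resources of one Application, keyed by name, valued by wave.
resources = {
    "namespace": -2,
    "crds": -1,
    "rbac": -1,
    "db-migration": 0,
    "gateway-deployment": 1,
}
print(plan_waves(resources))

apps = [("gateway-prod", "prod"), ("gateway-dev", "dev"), ("gateway-staging", "staging")]
print(plan_rollout(apps, ["dev", "staging", "prod"]))
```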
6. Sync policy & guardrails
| Setting | Prod | Nonprod |
|---|---|---|
| Auto-sync | manual/gated (or auto with approval) | auto |
| Self-heal | on | on |
| Prune | on (with confirm) | on |
| Sync windows | block deploys during freeze/peak | open |
| Digest updates | PR-based (reviewed) | ArgoCD Image Updater OK (dev) |
Image promotion to prod is a reviewed PR in the GitOps repo, not an auto-bump — auditability over speed where it counts (08).
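The table can be read as a gate in front of every sync. The following sketch encodes that reading under stated assumptions: the 09:00 to 17:00 peak window and the date are invented for illustration, windows do not wrap past midnight, and `approved` stands in for whatever approval step gates prod.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, List


@dataclass(frozen=True)
class SyncPolicy:
    auto_sync: bool
    self_heal: bool  # carried to mirror the table; not used by the gate below
    prune: bool      # carried to mirror the table; not used by the gate below


@dataclass(frozen=True)
class DenyWindow:
    start: time
    end: time

    def blocks(self, now: datetime) -> bool:
        # Same-day window only; start is inclusive, end is exclusive.
        return self.start <= now.time() < self.end


POLICIES: Dict[str, SyncPolicy] = {
    "prod": SyncPolicy(auto_sync=False, self_heal=True, prune=True),
    "nonprod": SyncPolicy(auto_sync=True, self_heal=True, prune=True),
}
# Hypothetical peak window for prod; nonprod has no deny windows.
DENY_WINDOWS: Dict[str, List[DenyWindow]] = {
    "prod": [DenyWindow(time(9, 0), time(17, 0))],
}


def may_sync(env: str, now: datetime, approved: bool = False) -> bool:
    """Allow a sync if no deny window is active and it is automatic or approved."""
    if any(w.blocks(now) for w in DENY_WINDOWS.get(env, [])):
        return False
    return POLICIES[env].auto_sync or approved


peak = datetime(2024, 1, 15, 10, 0)
evening = datetime(2024, 1, 15, 20, 0)
print(may_sync("prod", peak, approved=True))     # blocked by the window
print(may_sync("prod", evening, approved=True))  # approved, outside the window
print(may_sync("nonprod", peak))                 # auto-sync, no windows
```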
7. Multi-cluster: ArgoCD per cluster
Decision: an ArgoCD instance per cluster (prod has its own; nonprod its own), each bootstrapped by OpenTofu (10). No single ArgoCD holds credentials to both prod and nonprod → smaller blast radius and simpler per-cluster DR (rebuild a cluster + its ArgoCD independently). A central “hub” ArgoCD managing many clusters is the alternative (single pane of glass) — rejected for prod isolation, reconsidered only if cluster count explodes.
8. Tradeoffs / Alternatives / Scaling
Tradeoffs. GitOps adds a repo and a reconciler, and “deploy” becomes “merge a PR +
wait for reconcile” (slightly more latency than kubectl apply) — paid back many times
in auditability, drift control, and DR.
Alternatives considered.
- Push CD (CI runs kubectl/helm upgrade): simple but CI holds cluster creds (big attack surface), no drift detection, rollback is ad hoc. Rejected (ADR-0028).
- Flux instead of ArgoCD: excellent, equivalent GitOps; ArgoCD chosen for its UI, ApplicationSets, AppProjects, and Argo Rollouts synergy (04).
- Hub ArgoCD for all clusters: single pane but cross-env cred blast radius; rejected for prod.
Scaling concerns.
- Many apps/clusters → ApplicationSet generators render the matrix; ArgoCD shards controllers if needed.
- Repo as bottleneck → mono-GitOps-repo is fine to large scale; split per-team with AppProjects if contention appears.
- Reconcile load → tune resync intervals; use webhooks (Git → ArgoCD) for instant sync instead of polling.
References
- ArgoCD ApplicationSets: https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/
- App-of-apps / cluster bootstrapping: https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/
- Sync waves: https://argo-cd.readthedocs.io/en/stable/user-guide/sync-waves/