GitOps with ArgoCD

BitVault uses a pull-based GitOps model (ADR-0028). The cluster’s desired state lives in Git. ArgoCD continuously reconciles the live cluster against it. GitHub Actions never holds a kubeconfig — CI’s authority ends at the registry and the GitOps repository.

The Deployment Loop

flowchart LR
    push["Code push\nto main / tag"]
    ci["GitHub Actions\nbuild + sign\n(OIDC auth)"]
    registry["OCI Registry\n(signed image + SBOM\n+ provenance)"]
    pr["GitOps PR\n(digest bump)"]
    gitops["GitOps Repo\n(desired state)"]
    argocd["ArgoCD\n(pull + render)"]
    cluster["Kubernetes\nCluster"]

    push --> ci --> registry
    ci --> pr --> gitops
    argocd -->|"watches"| gitops
    argocd -->|"applies"| cluster

CI’s scope: build → sign → push → open PR. Cluster state changes are exclusively ArgoCD’s responsibility.
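
The workflow below is a minimal sketch of that scope, not BitVault's actual pipeline. It assumes GHCR as the OCI registry, keyless cosign signing through the workflow's OIDC token, a hypothetical GITOPS_TOKEN secret with write access to the GitOps repo, and a hypothetical envs/dev/values.dev.yaml that holds image.digest. The workflow name, image name, and bot identity are also illustrative.

```yaml
# .github/workflows/build.yaml (hypothetical path; all names are illustrative)
name: build-sign-bump
on:
  push:
    branches: [main]

permissions:
  contents: read
  packages: write   # push to GHCR (assumed registry)
  id-token: write   # OIDC token used for keyless cosign signing

env:
  IMAGE: ghcr.io/bitvault/bitvaultd   # assumed image name

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }}
          sbom: true              # attach an SBOM attestation
          provenance: mode=max    # attach build provenance
      - uses: sigstore/cosign-installer@v3
      - name: Sign the pushed digest
        run: cosign sign --yes "${IMAGE}@${{ steps.build.outputs.digest }}"
      - name: Check out the GitOps repo
        uses: actions/checkout@v4
        with:
          repository: bitvault/gitops
          token: ${{ secrets.GITOPS_TOKEN }}   # hypothetical secret
          path: gitops
      - name: Open the digest-bump PR
        working-directory: gitops
        env:
          GH_TOKEN: ${{ secrets.GITOPS_TOKEN }}
          DIGEST: ${{ steps.build.outputs.digest }}
        run: |
          BRANCH="bump-dev-${GITHUB_SHA}"
          git config user.name "bitvault-ci"
          git config user.email "ci@bitvault.invalid"
          git checkout -b "${BRANCH}"
          yq -i '.image.digest = strenv(DIGEST)' envs/dev/values.dev.yaml
          git commit -am "dev: bump image digest to ${DIGEST}"
          git push -u origin "${BRANCH}"
          gh pr create --fill --base main --head "${BRANCH}"
```

Note that no step references a kubeconfig or a cluster endpoint: the job ends once the PR exists.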

ArgoCD Application Structure

BitVault manages cluster resources through two ArgoCD object types:

Application — one per environment per service group (or one per environment for the monolith phase):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bitvault-prod
  namespace: argocd
spec:
  project: bitvault
  source:
    repoURL: https://github.com/bitvault/gitops
    targetRevision: main
    path: envs/prod
    helm:
      valueFiles:
        - values.prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: bitvault-prod
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

ApplicationSet — used for matrix generation across environments and services, and for ephemeral preview environments:
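
A minimal sketch of such an ApplicationSet, assuming a matrix generator that crosses a list of environments with a list of service groups. The ApplicationSet name, the service group names (core, web), and the envs/{{env}}/{{service}} directory layout are hypothetical; the remaining fields mirror the Application above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: bitvault-matrix          # hypothetical name
  namespace: argocd
spec:
  generators:
    # The matrix generator emits one parameter set per (env, service) pair.
    - matrix:
        generators:
          - list:
              elements:
                - env: dev
                - env: staging
                - env: prod
          - list:
              elements:
                - service: core   # hypothetical service group
                - service: web    # hypothetical service group
  template:
    metadata:
      name: 'bitvault-{{env}}-{{service}}'
    spec:
      project: bitvault
      source:
        repoURL: https://github.com/bitvault/gitops
        targetRevision: main
        path: 'envs/{{env}}/{{service}}'   # assumed layout
        helm:
          valueFiles:
            - 'values.{{env}}.yaml'
      destination:
        # Assumes a single cluster; separate clusters per environment would
        # need a per-element server parameter instead.
        server: https://kubernetes.default.svc
        namespace: 'bitvault-{{env}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

With three environments and two service groups, this template would generate six Applications.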

Sync waves control ordering within a single sync operation (see Sync Waves below).

Sync Waves

Resources within an ArgoCD Application sync in wave order. Resources in the same wave sync in parallel; the next wave does not start until all resources in the current wave are Healthy.

| Wave | Resource Types | Rationale |
|------|----------------|-----------|
| -3 | Namespaces, CRDs | Must exist before anything else can be created |
| -2 | RBAC (ClusterRole, RoleBinding), ServiceAccounts | Required by operators and workloads |
| -1 | Operators (external-secrets, cert-manager, KEDA controllers), ExternalSecret resources | Must be running before app config secrets can be materialized |
| 0 | Data services (PostgreSQL, Redis, NATS), DB migration PreSync hook Jobs | Migration Job runs as a PreSync hook and must succeed before wave 0 resources are created |
| 1 | bitvaultd, bitvault-worker, bitvault-web Deployments + Services | App workloads start after data services are ready and migrations complete |
| 2 | Ingress / HTTPRoute, HPA, KEDA ScaledObjects, PodDisruptionBudgets | Traffic routing and scaling activated after workloads are healthy |
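
A resource is placed in a wave with the argocd.argoproj.io/sync-wave annotation; resources without it default to wave 0. The fragments below are a sketch of how two rows of the table above could be annotated. Only the metadata is shown, and the resource names are taken from this page rather than from the actual manifests.

```yaml
# Wave -3: the namespace must exist before anything is created in it.
apiVersion: v1
kind: Namespace
metadata:
  name: bitvault-prod
  annotations:
    argocd.argoproj.io/sync-wave: "-3"   # annotation values must be quoted strings
---
# Wave 1: application workload, applied only after waves -3 through 0 are Healthy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bitvaultd
  namespace: bitvault-prod
  annotations:
    argocd.argoproj.io/sync-wave: "1"
# spec omitted for brevity
```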

The DB migration Job carries the ArgoCD PreSync hook annotation. PreSync hooks run to completion before the Sync phase begins, so the Job must pass before any wave-0 resources are touched:

annotations:
  argocd.argoproj.io/hook: PreSync
  argocd.argoproj.io/hook-delete-policy: HookSucceeded

A failed migration Job blocks the entire sync. This is intentional — a failed migration must not be followed by a code deployment.

Promotion Flow

flowchart TD
    ci["CI builds signed\ndigest D on main merge"]
    gpr_dev["GitOps PR:\nbump digest in envs/dev/\n(auto-merge bot)"]
    dev["ArgoCD auto-syncs\ndev cluster"]
    gpr_staging["GitOps PR:\nbump digest in envs/staging/\n(requires PR approval)"]
    staging["ArgoCD syncs\nstaging cluster\n(abbreviated canary)"]
    gpr_prod["GitOps PR:\nbump digest in envs/prod/\n(requires PR approval\n+ manual gate)"]
    prod["ArgoCD syncs\nprod cluster\n(full canary + analysis)"]

    ci --> gpr_dev --> dev --> gpr_staging
    gpr_staging --> staging --> gpr_prod --> prod
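
Each promotion PR in this flow changes a single value in the target environment's values file. A sketch of the staging bump, assuming the chart reads the image reference from image.repository and image.digest; the Rollback section refers to image.digest, while the repository key, the registry host, and the file name (by analogy with values.prod.yaml) are hypothetical:

```yaml
# envs/staging/values.staging.yaml (assumed file name)
image:
  repository: registry.example.com/bitvault/bitvaultd   # hypothetical registry and image
  # The promotion PR replaces this value with digest D, the same digest
  # that CI built and that is already running in dev.
  digest: "sha256:<digest D>"   # placeholder, not a real digest
```

Because the digest is immutable, every environment runs the exact artifact that CI signed; nothing is rebuilt between dev and prod.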

Drift Detection

ArgoCD continuously compares the desired state (the Helm chart rendered from the GitOps repo commit) against the live state of the cluster. Any difference marks the Application as OutOfSync, and with selfHeal: true (as in the Application above) ArgoCD re-applies the desired state.

Direct kubectl edits to production resources are therefore both reverted and alerted on. This is a feature, not a bug: the GitOps repo is the single source of truth.

Rollback

Rollback is a Git operation, not a kubectl operation:

  1. Identify the last-known-good commit in the GitOps repo (the one with the previous image.digest value).
  2. Open a PR reverting the digest-bump commit, or revert it and push directly to the branch ArgoCD tracks (main in the Application above).
  3. ArgoCD detects the change and re-syncs the cluster to the reverted digest.
  4. The previous image, still in the registry under its immutable digest, is redeployed as new pods.

For canary-in-progress failures, Argo Rollouts aborts and rolls back automatically without any manual Git revert. See Rollout Strategy for details.

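:::note
There is no helm rollback in this workflow. ArgoCD renders charts with helm template and applies the resulting manifests itself, so the cluster holds no Helm release history to roll back to. The source of truth is the GitOps repo commit history, and any out-of-band change to the cluster would be overwritten by ArgoCD’s next reconciliation cycle anyway.
:::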