ADR-0029 — Progressive delivery with Argo Rollouts (canary + analysis)
- Status: Deferred
- Date: 2026-06-11
- Related: platform/04 deployment, ADR-0013, ADR-0028
V1 Freeze (2026-06-12): Deferred. No canary fleet in V1. Re-opens at P4.
Context
A bad deploy of a data-custody platform can corrupt or lose user data. Plain rolling updates can reach 100% of traffic before metrics reveal a regression. We want deploys that expose a new version gradually and roll back automatically on objective signals.
Decision
Use Argo Rollouts for stateless, traffic-serving services (gateway, bitvaultd, web):
- Canary with automated analysis: shift traffic in steps (e.g. 10→30→100%), and at each step query metrics via AnalysisTemplates (Prometheus/OTel: success rate, p99 latency vs NFR-3 SLOs, error-budget burn, saturation). A failing metric auto-aborts and restores the stable version with no human in the loop.
- Blue-green where instant cutover/rollback is preferable (e.g. web).
- Workers (no inbound traffic) use rolling updates; this is safe because consumers are idempotent (ADR-0006).
- Databases are never canaried; schema changes use expand/contract migrations as ArgoCD PreSync hooks (platform/04 §4).
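The canary control flow described in the list above can be modelled in a few lines. This is only an illustrative sketch of the gating logic, not how Argo Rollouts is implemented or configured: the names `Metrics`, `passes`, and `run_canary` are hypothetical, and the threshold values are placeholders standing in for whatever the NFR-3 SLOs specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Metrics:
    success_rate: float    # fraction of successful requests, 0..1
    p99_latency_ms: float  # observed p99 latency in milliseconds


# Hypothetical thresholds; real values would come from the NFR-3 SLOs.
MIN_SUCCESS_RATE = 0.999
MAX_P99_LATENCY_MS = 250.0


def passes(m: Metrics) -> bool:
    """A step passes only if every gated metric is within its threshold."""
    return (m.success_rate >= MIN_SUCCESS_RATE
            and m.p99_latency_ms <= MAX_P99_LATENCY_MS)


def run_canary(steps: Sequence[int],
               sample: Callable[[int], Metrics]) -> Tuple[str, List[int]]:
    """Walk the traffic steps; on the first failing analysis, drop the
    canary weight to 0 (stable version restored) without human input."""
    history: List[int] = []
    for weight in steps:
        history.append(weight)       # shift this share of traffic to the canary
        if not passes(sample(weight)):
            history.append(0)        # automatic abort
            return "aborted", history
    return "promoted", history
```

For example, with steps `[10, 30, 100]` and a `sample` function that reports a degraded success rate once the canary reaches 30%, `run_canary` returns `("aborted", [10, 30, 0])`: the regression never reaches full traffic.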
Consequences
Positive
- Regressions are caught at the first canary step (e.g. 10% of traffic), not at 100%; rollback is automatic and fast.
- Deploys are gated on objective SLOs, not vibes.
- Integrates natively with ArgoCD (ADR-0028).
Negative / costs
- Requires reliable metrics + traffic shaping (ingress/mesh) and runs two versions during canary (transient extra cost). Worth it in prod; dev/staging use fast rolling.
- A noisy metric can cause false aborts; mitigate by tuning thresholds and allowing inconclusive windows.
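One way to picture the threshold tuning and inconclusive windows: treat each measurement as pass, fail, or inconclusive, and only abort after more failures than a tolerated limit. The sketch below is a simplified model under assumed numbers, not Argo Rollouts' exact semantics; `classify`, `verdict`, the threshold defaults, and both limits are hypothetical.

```python
from typing import Iterable


def classify(success_rate: float,
             pass_at: float = 0.999,
             fail_below: float = 0.99) -> str:
    """Values between fail_below and pass_at are neither clearly good nor
    clearly bad, so they count as inconclusive rather than as failures."""
    if success_rate >= pass_at:
        return "pass"
    if success_rate < fail_below:
        return "fail"
    return "inconclusive"


def verdict(samples: Iterable[float],
            failure_limit: int = 1,
            inconclusive_limit: int = 2) -> str:
    """Tolerate up to failure_limit failed samples before aborting, and up
    to inconclusive_limit inconclusive samples before pausing for review."""
    failures = 0
    inconclusive = 0
    for s in samples:
        outcome = classify(s)
        if outcome == "fail":
            failures += 1
        elif outcome == "inconclusive":
            inconclusive += 1
        if failures > failure_limit:
            return "abort"
        if inconclusive > inconclusive_limit:
            return "pause"
    return "continue"
```

With these assumed defaults, a single noisy sample such as `[0.9995, 0.98, 0.9995]` yields `"continue"`, while two failing samples in one window yield `"abort"`. Widening the inconclusive band trades fewer false aborts for slower detection.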
Alternatives considered
- Plain rolling everywhere: simplest; a bad version hits everyone before metrics react. Kept for workers/nonprod only.
- Recreate: downtime; only for singletons.
- Flagger: functionally comparable; Argo Rollouts chosen for its ArgoCD integration.
Scaling
Short analysis windows + HPA bound the two-version cost; expand/contract + online index builds keep migrations zero-downtime at large table sizes (storage/08).
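A rough way to reason about the two-version cost is to count extra replica-minutes per deploy. The sketch below assumes, as a worst case, that the stable fleet stays at full size while the canary runs a number of replicas proportional to its traffic weight; actual behaviour depends on the traffic-routing setup and HPA settings. `extra_replica_minutes` and the example numbers are hypothetical.

```python
from typing import Iterable, Tuple


def extra_replica_minutes(stable_replicas: int,
                          steps: Iterable[Tuple[int, int]]) -> int:
    """steps is a sequence of (canary weight in percent, minutes held).
    Canary replicas per step are rounded up with integer arithmetic."""
    total = 0
    for weight_pct, minutes in steps:
        canary_replicas = (stable_replicas * weight_pct + 99) // 100  # ceiling
        total += canary_replicas * minutes
    return total
```

For instance, 20 stable replicas with 5-minute analysis windows at 10% and 30% cost `extra_replica_minutes(20, [(10, 5), (30, 5)])`, which is 2×5 + 6×5 = 40 extra replica-minutes. Halving the window length halves that figure, which is the sense in which short windows bound the cost.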