BitVault Platform Engineering — Design
Audience: platform / DevOps / SRE engineers. Scope: how BitVault is built, shipped, run, and recovered — containers, Kubernetes, IaC, GitOps, CI/CD, secrets, releases, backup, and disaster recovery — at production grade, for the SaaS deployment, with self-host parity where it matters.
Builds on ADR-0012 tiered packaging, ADR-0013 observability, ADR-0014 KMS, the storage and sync subsystems, and the high-level deployment topology.
No implementation code — architecture and documentation only. Each decision carries Tradeoffs / Alternatives / Scaling; the contested ones are ADRs 0028–0034.
0. Reading order & task map
| # | Doc | Task(s) |
|---|---|---|
| 01 | Containerization (Docker) | Docker |
| 02 | Kubernetes namespaces | 1. namespaces |
| 03 | Environment strategy | 3. environments |
| 04 | Deployment strategy | 2. deployment |
| 05 | Helm & config | Helm |
| 06 | GitOps & ArgoCD | 6. GitOps |
| 07 | CI/CD & image pipelines | 5. image builds, 7. CI/CD |
| 08 | Release workflows | 8. releases |
| 09 | Secrets management | 4. secrets |
| 10 | Infrastructure (Terraform/OpenTofu) | IaC |
| 11 | Disaster recovery | 9. DR |
| 12 | Backup strategies | 10. backups |
1. Platform principles (the rules)
- Git is the source of truth. Desired state — infra, manifests, config — lives in Git. The cluster is reconciled toward Git, never mutated out of band.
- CI pushes artifacts; CD pulls state. CI (GitHub Actions) builds, tests, scans, signs, and pushes immutable images — then stops. Deployment is pull-based GitOps (ArgoCD) reconciling from Git. CI never holds a kubeconfig (ADR-0028).
- IaC provisions the substrate; GitOps runs everything inside. OpenTofu creates clusters, networks, buckets, KMS, IAM — and bootstraps ArgoCD. ArgoCD then owns all in-cluster resources. A clean, documented boundary (ADR-0031).
- Immutable, promoted-by-digest artifacts. An image built once is promoted by digest through environments — never rebuilt per environment (ADR-0032/0034).
- No secrets in Git, ever. References in Git; secret material in a KMS/vault, synced by External Secrets Operator (ADR-0030).
- Progressive delivery with automated analysis. Canary with metric-based auto-promote/auto-rollback (Argo Rollouts), not big-bang deploys (ADR-0029).
- Clusters are cattle. Everything to rebuild a cluster is in IaC + Git, so DR is “re-provision + re-sync + restore data,” not heroics (ADR-0033). This is the single biggest operational payoff of the whole design.
- Supply-chain security by default. Keyless signing (cosign/OIDC), SBOMs, SLSA provenance, image scanning, admission-time verification (ADR-0032).
- Self-host parity. The same Helm charts power SaaS and self-host; self-host uses a subset (Compose or a lightweight cluster), per the tiered-packaging ADR-0012.
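The progressive-delivery principle (canary with metric-based auto-promote/auto-rollback) comes down to a small decision rule. The sketch below is a simplified, hypothetical model of that rule. It loosely follows the way an Argo Rollouts analysis metric fails once its failed measurements exceed `failureLimit`. The function name, the 0.99 success-rate threshold, and the default `count` are illustrative assumptions, not BitVault's actual analysis template. In Argo Rollouts itself this logic is declared in an AnalysisTemplate; the Python only models the decision.

```python
from enum import Enum
from typing import Sequence


class Phase(Enum):
    RUNNING = "Running"
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


def evaluate_canary(success_rates: Sequence[float], threshold: float = 0.99,
                    failure_limit: int = 1, count: int = 5) -> Phase:
    """Hypothetical, simplified canary analysis.

    Each entry in success_rates is one measurement of the canary's request
    success rate (for example, a Prometheus query run at a fixed interval).
    """
    # A measurement "fails" when it falls below the success threshold.
    failures = sum(1 for rate in success_rates if rate < threshold)
    # More failures than the limit allows: abort and roll back to stable.
    if failures > failure_limit:
        return Phase.FAILED
    # Enough measurements taken without breaching the limit: promote.
    if len(success_rates) >= count:
        return Phase.SUCCESSFUL
    # Otherwise hold the canary at its current weight and measure again.
    return Phase.RUNNING
```

The failure check runs before the count check. A canary that breaches the limit on its final measurement is therefore rolled back rather than promoted.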
2. The delivery topology (one picture)
```mermaid
flowchart TB
classDef dev fill:#dbeafe,stroke:#1e40af,color:#111827;
classDef ci fill:#fde68a,stroke:#b45309,color:#111827;
classDef git fill:#fbcfe8,stroke:#be185d,color:#111827;
classDef cd fill:#bbf7d0,stroke:#15803d,color:#111827;
classDef infra fill:#c7d2fe,stroke:#3730a3,color:#111827;
dev["Developer → PR → merge"]:::dev
subgraph APP["App monorepo (ADR-0002)"]
code["code · Dockerfiles · Helm chart source"]:::dev
end
subgraph CI["GitHub Actions (CI = push)"]
build["build · test · scan · SBOM · sign (cosign/OIDC)"]:::ci
push["push image (by digest) → registry"]:::ci
bump["open PR: bump image digest in GitOps repo"]:::ci
end
reg[("Container registry<br/>signed images + SBOM")]:::ci
subgraph GITOPS["GitOps repo (desired state)"]
apps["ArgoCD Applications / ApplicationSets"]:::git
vals["per-env Helm values + image digests"]:::git
end
subgraph CD["ArgoCD (CD = pull)"]
argo["reconcile · sync waves · drift/self-heal"]:::cd
ro["Argo Rollouts: canary + analysis"]:::cd
end
subgraph CL["Kubernetes clusters"]
np["nonprod cluster (dev + staging + previews)"]:::cd
prod["prod cluster"]:::cd
end
tofu["OpenTofu (IaC): clusters · VPC · buckets · KMS · IAM · bootstrap ArgoCD"]:::infra
cloud[("Cloud substrate")]:::infra
dev --> code --> build --> push --> reg
build --> bump --> vals
apps & vals --> argo --> ro
ro --> np & prod
argo -. reads .-> reg
tofu --> cloud --> np & prod
tofu -. installs .-> argo
```
The split is the whole design: the left half (CI) produces a signed artifact and a Git change; the right half (CD/GitOps) converges the cluster to Git. They meet only in Git and the registry — never via a deploy credential handed to CI.
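The last two CI steps (push by digest, then open a PR that bumps the digest in the GitOps repo) can be sketched as a pure function over per-env Helm values. This is a minimal illustration with several assumptions. The `image.repository`/`image.digest` values keys, the function names, and the example registry path `ghcr.io/bitvault/api` are hypothetical. The reference check is also deliberately simplified: it accepts only `repo@sha256:<64 hex>` and rejects any reference that carries a tag.

```python
import re
from copy import deepcopy
from typing import Any, Dict, Tuple

# Simplified digest check: sha256 followed by exactly 64 lowercase hex chars.
_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def parse_pinned_ref(ref: str) -> Tuple[str, str]:
    """Split 'repo@sha256:<hex>' into (repo, digest); reject mutable tags."""
    repo, sep, digest = ref.partition("@")
    if not sep or not _DIGEST.fullmatch(digest):
        raise ValueError(f"not pinned by sha256 digest: {ref!r}")
    # A ':' in the last path segment means a tag (e.g. 'api:1.4.0@sha256:...').
    if ":" in repo.rsplit("/", 1)[-1]:
        raise ValueError(f"tag present alongside digest: {ref!r}")
    return repo, digest


def bump_digest(values: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Return a copy of per-env values with the image pinned to ref's digest.

    The 'image.repository' / 'image.digest' layout is a hypothetical schema.
    """
    repo, digest = parse_pinned_ref(ref)
    updated = deepcopy(values)  # never mutate the caller's parsed values file
    image = updated.setdefault("image", {})
    image["repository"] = repo
    image["digest"] = digest
    return updated
```

CI would serialize the returned values back to the GitOps repo and open the PR. It never touches the cluster.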
3. Repository strategy (three concerns, separated)
| Repo / area | Holds | Who writes | Who reads |
|---|---|---|---|
| bitvault (app monorepo, ADR-0002) | source, Dockerfiles, Helm chart source, CI workflows | engineers (PRs) | CI |
| bitvault-gitops (config repo) | ArgoCD Application/ApplicationSet, per-env values, pinned image digests, ExternalSecret refs | CI (digest bumps), platform (config PRs) | ArgoCD |
| bitvault-infra (IaC) | OpenTofu modules + per-env stacks, remote-state config | platform (PRs + plan/apply) | OpenTofu |
Keep ArgoCD resources (Applications/ApplicationSets) and the Kubernetes app manifests they deploy in separate repos or directories; common GitOps guidance is not to mix the two. The GitOps repo is the contract between CI and CD, the app repo is where humans work, and the IaC repo defines the substrate. Promotion = a PR in the GitOps repo (ADR-0034).
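The GitOps repo is the contract, and CI is only supposed to write digest bumps to it. A required check on CI-authored PRs can therefore verify that nothing else changed. The sketch below assumes the values files have already been parsed into nested dicts (for example, by a YAML loader). `changed_paths` and `ci_change_allowed` are hypothetical helpers. Rejecting an empty diff as a no-op PR is a design assumption.

```python
from collections.abc import Mapping
from typing import Any, Iterator, Tuple

_MISSING = object()  # sentinel distinguishing "absent" from a None value


def changed_paths(old: Mapping, new: Mapping,
                  prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield dotted key paths whose values differ between two nested mappings."""
    for key in sorted(set(old) | set(new), key=str):
        path = prefix + (str(key),)
        a: Any = old.get(key, _MISSING)
        b: Any = new.get(key, _MISSING)
        if isinstance(a, Mapping) and isinstance(b, Mapping):
            yield from changed_paths(a, b, path)  # recurse into nested sections
        elif a is _MISSING or b is _MISSING or a != b:
            yield ".".join(path)  # added, removed, or modified leaf


def ci_change_allowed(old: Mapping, new: Mapping,
                      allowed_leaf: str = "digest") -> bool:
    """True if the diff is non-empty and every changed key is an image digest."""
    paths = list(changed_paths(old, new))
    return bool(paths) and all(p.rsplit(".", 1)[-1] == allowed_leaf for p in paths)
```

A CI PR that also edits `replicas` would fail this check. Such config changes belong in a platform PR.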
4. Environment & cluster topology
```mermaid
flowchart LR
classDef c fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef n fill:#bbf7d0,stroke:#15803d,color:#111827;
subgraph NP["Nonprod cluster (cost-shared)"]
dev["ns: dev"]:::n
stg["ns: staging"]:::n
prev["ns: pr-123 (ephemeral preview)"]:::n
sysn["ns: platform (argocd, eso, ingress, monitoring)"]:::n
end
subgraph PR["Prod cluster (isolated blast radius)"]
app["ns: bitvault"]:::n
data["ns: bitvault-data (operators)"]:::n
sysp["ns: platform"]:::n
end
NP:::c
PR:::c
```
- Prod is a dedicated cluster (blast-radius isolation, stricter RBAC/compliance, no noisy neighbors). dev + staging + per-PR previews share a nonprod cluster (cost). Namespaces separate concerns within each (02).
- Tenant isolation is application-level (Postgres RLS, ADR-0007), not namespace-per-tenant: with millions of tenants, a namespace per tenant is impractical. Namespaces isolate environments and platform concerns, not customers.
- Each cluster is bootstrapped by OpenTofu (cluster + ArgoCD), then ArgoCD’s app-of-apps takes over (06, 10).
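The topology above reduces to a small mapping from environment to (cluster, namespace). Here is a hypothetical sketch. It assumes cluster names `nonprod` and `prod`, and preview namespaces named `pr-<number>` as in the diagram. The validation reflects the Kubernetes rule that a namespace name is a DNS-1123 label: lowercase alphanumerics or '-', at most 63 characters, starting and ending with an alphanumeric.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Kubernetes namespace names must be valid DNS-1123 labels.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


@dataclass(frozen=True)
class Target:
    cluster: str    # hypothetical cluster names: "nonprod" or "prod"
    namespace: str


def _checked(ns: str) -> str:
    if len(ns) > 63 or not _DNS1123_LABEL.fullmatch(ns):
        raise ValueError(f"invalid namespace name: {ns!r}")
    return ns


def target_for(env: str, pr: Optional[int] = None) -> Target:
    """Map an environment (plus PR number for previews) to its deploy target."""
    if env == "prod":
        # Prod gets a dedicated cluster for blast-radius isolation.
        return Target("prod", "bitvault")
    if env in ("dev", "staging"):
        # dev and staging share the cost-shared nonprod cluster.
        return Target("nonprod", _checked(env))
    if env == "preview":
        if pr is None or pr <= 0:
            raise ValueError("preview environments need a positive PR number")
        # Ephemeral per-PR namespace; its lifecycle is managed elsewhere.
        return Target("nonprod", _checked(f"pr-{pr}"))
    raise ValueError(f"unknown environment: {env!r}")
```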
5. The platform stack (what runs in every cluster)
| Concern | Component |
|---|---|
| GitOps CD | ArgoCD + Argo Rollouts (06, 04) |
| Ingress / TLS | ingress controller + cert-manager (ACME) |
| Secrets | External Secrets Operator ← cloud KMS/Vault (09) |
| Stateful data | operators: CloudNativePG (Postgres), Redis, NATS, OpenSearch (12) |
| Observability | OpenTelemetry Collector + Prometheus/Tempo/Loki (or vendor) (ADR-0013) |
| Backup/DR | Velero + CloudNativePG PITR (11, 12) |
| Policy/security | PodSecurity admission, NetworkPolicies, image-signature verification (02, 07) |
| Autoscaling | HPA (+ optional KEDA for queue depth), Cluster Autoscaler/Karpenter |
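The HPA row relies on the standard Kubernetes scaling rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when that ratio is within a tolerance of 1.0 (0.1 by default). The worked sketch below deliberately omits what the real controller also does, such as stabilization windows, scaling policies, and handling of unready pods or missing metrics. The min/max defaults are illustrative.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale proportionally to metric/target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the HPA skips scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the autoscaler's minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 90% CPU against a 60% target give ceil(4 × 1.5) = 6 replicas. At 63% against 60% the ratio is 1.05, which is inside the tolerance, so the count stays at 4.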
Related ADRs
| ADR | Decision |
|---|---|
| 0028 | Pull-based GitOps with ArgoCD; separate config repo |
| 0029 | Progressive delivery with Argo Rollouts (canary + analysis) |
| 0030 | External Secrets Operator + cloud KMS |
| 0031 | IaC with OpenTofu; IaC/GitOps boundary |
| 0032 | GitHub Actions + OIDC keyless + supply-chain security |
| 0033 | Backup & DR (Velero + PITR; RTO/RPO targets) |
| 0034 | Environment & promotion model (promote by digest via PR) |
Inherited: 0001, 0012, 0013, 0014.
References (research grounding)
- ArgoCD ApplicationSets / app-of-apps: https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/ · https://argo-cd.readthedocs.io/en/stable/operator-manual/cluster-bootstrapping/
- ArgoCD sync waves: https://argo-cd.readthedocs.io/en/stable/user-guide/sync-waves/
- Argo Rollouts (canary/analysis): https://argo-rollouts.readthedocs.io/
- External Secrets Operator: https://external-secrets.io/
- Sigstore cosign keyless + GitHub OIDC: https://docs.sigstore.dev/ · https://docs.github.com/actions/deployment/security-hardening-your-deployments
- SLSA: https://slsa.dev/
- Velero: https://velero.io/
- CloudNativePG: https://cloudnative-pg.io/
- OpenTofu: https://opentofu.org/