ADR-0020 — Policy-driven placement & federation
- Status: Deferred
- Date: 2026-06-11
- Related: storage/09 federation & placement, storage/10 tiering, ADR-0005
V1 Freeze (2026-06-12): Deferred. V1 uses a single static storage provider; no placement engine, location map, or online migration. Re-opens with multi-region/residency or a second provider (P5+/NG9).
Context
BitVault federates many providers/regions/buckets/tiers (G4). Something must decide where each unit of data lives and the system must always know where it is, while honoring data residency (GDPR), cost (storage + egress vary wildly), durability class, and latency — without coupling a unit’s address to its location (which would make migration and provider exit impossible).
Decision
- A stateless Placement Service maps placement inputs (residency, durability, cost, latency, capacity, tier) → a decision {provider, region, bucket, tier, redundancy}. Residency is a hard, default-deny constraint; cost/latency are soft optimizations within the allowed set.
- Location is recorded in metadata (Pack/Chunk Index, storage/08), per-pack by default (per-chunk only for standalone), and is not derived from the content hash, so data can move without changing its address.
- Online migration (rebalance / cost / provider exit / region move) copies → hash-verifies → CAS-flips the location → deletes source after grace, with dual-sourced reads during cutover. Content addressing makes every copy provably correct.
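The split between the hard residency constraint and the soft cost/latency optimization can be illustrated with a minimal sketch. It is not the BitVault implementation: `Target`, `Policy`, `place`, `NoCompliantTarget`, the `cost_per_gb` and `latency_ms` fields, and the weighted-sum scoring are all hypothetical names and choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One candidate destination; the first five fields mirror the decision tuple."""
    provider: str
    region: str
    bucket: str
    tier: str
    redundancy: str
    cost_per_gb: float   # hypothetical blended storage + egress cost
    latency_ms: float    # hypothetical latency estimate


@dataclass(frozen=True)
class Policy:
    # Residency allow-list. An empty set allows nothing (default-deny).
    allowed_regions: frozenset[str]
    cost_weight: float = 1.0
    latency_weight: float = 0.0


class NoCompliantTarget(Exception):
    """Raised when no candidate satisfies the residency constraint."""


def place(policy: Policy, candidates: list[Target]) -> Target:
    """Pure function of its inputs, so the result is stateless and cacheable."""
    # Hard constraint first: anything outside the allow-list is never considered.
    allowed = [t for t in candidates if t.region in policy.allowed_regions]
    if not allowed:
        raise NoCompliantTarget("no candidate satisfies the residency policy")

    # Soft optimization: lowest weighted score within the allowed set only.
    def score(t: Target) -> float:
        return policy.cost_weight * t.cost_per_gb + policy.latency_weight * t.latency_ms

    return min(allowed, key=score)


eu = Target("provider-a", "eu-west", "bucket-1", "hot", "single", 0.03, 20.0)
us = Target("provider-b", "us-east", "bucket-2", "hot", "single", 0.01, 10.0)
eu_only = Policy(allowed_regions=frozenset({"eu-west"}))
print(place(eu_only, [eu, us]).region)  # eu-west, although us-east is cheaper
```

The cheaper target loses because filtering happens before scoring, so a soft preference can never override residency. Failing closed with an exception, rather than falling back to a non-compliant target, is one way to realize the default-deny behavior.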
Consequences
Positive
- Residency/sovereignty, cost arbitrage, optional cross-provider durability, latency placement, and no vendor lock-in (drain a provider entirely).
- Migration is safe (verify-by-hash) and non-disruptive (dual-read, CAS flip).
- Single-provider deployments use a trivial policy and pay none of the complexity.
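The migration sequence from the Decision (copy, hash-verify, CAS-flip the location, delete the source after grace) can be sketched as below. This is a sketch under stated assumptions, not the actual machinery: the in-memory `LocationMap`, the `stores` dictionary, the `pending_deletes` queue, and the use of a SHA-256 hex digest as the pack address are all hypothetical stand-ins.

```python
import hashlib


class LocationMap:
    """In-memory stand-in for the location recorded in the Pack/Chunk Index."""

    def __init__(self):
        self._loc = {}

    def get(self, pack_id):
        return self._loc[pack_id]

    def set(self, pack_id, location):
        self._loc[pack_id] = location

    def compare_and_swap(self, pack_id, expected, new):
        # Flip only if nobody changed the location since we read it.
        if self._loc.get(pack_id) != expected:
            return False
        self._loc[pack_id] = new
        return True


def migrate_pack(pack_id, src, dst, stores, locations, pending_deletes):
    """Copy, verify by hash, CAS-flip, then queue the source for deferred deletion."""
    stores[dst][pack_id] = stores[src][pack_id]              # 1. copy
    copied = stores[dst][pack_id]                            # re-read from destination
    if hashlib.sha256(copied).hexdigest() != pack_id:        # 2. hash-verify
        del stores[dst][pack_id]
        return False
    if not locations.compare_and_swap(pack_id, src, dst):    # 3. CAS flip
        # Lost a race. Leave the copy for a later sweep instead of deleting it,
        # since another migrator may already have flipped to this destination.
        return False
    pending_deletes.append((pack_id, src))                   # 4. delete after grace
    return True


def read_pack(pack_id, stores, locations, fallback=None):
    """Dual-sourced read: try the recorded location, then the other side."""
    for loc in (locations.get(pack_id), fallback):
        if loc is not None and pack_id in stores[loc]:
            return stores[loc][pack_id]
    raise KeyError(pack_id)


data = b"example pack bytes"
pid = hashlib.sha256(data).hexdigest()   # assumed content address
stores = {"old": {pid: data}, "new": {}}
locations = LocationMap()
locations.set(pid, "old")
pending = []
migrated = migrate_pack(pid, "old", "new", stores, locations, pending)
```

Because the source copy stays in place until the grace period expires, a reader holding a stale location still succeeds, which is what makes the cutover non-disruptive in this sketch.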
Negative / costs
- A location map and placement decision on every write; migration machinery to build.
- Residency correctness is a compliance risk → enforced as a hard constraint + audited by a periodic conformance job.
Alternatives considered
- Hash-deterministic placement (provider = f(hash)): no location map, but residency/tiering/migration/exit become impossible (can’t move data without changing its address). Rejected.
- Static per-tenant provider (no engine): simplest; loses per-object tier/cost optimization. Kept as the degenerate policy for small/self-host.
- Provider-native multi-region only (e.g. S3 MRAP): ties us to one provider; defeats multi-cloud. Used opportunistically, not as the model.
Scaling
Per-pack location keeps the map ~100× smaller than per-chunk; placement is stateless and cacheable; migrations are co-located, throttled, watermarked, and resumable (a provider exit is a long verify-as-you-go campaign).