09 — Storage Federation & Placement
Topic: storage federation. Federation = many providers, regions, buckets, and tiers presented as one logical store, with a Placement Service deciding where each unit of data lives and a location map that always knows where it is. Decision in ADR-0020.
1. What federation buys (and the one thing it requires)
Federation lets one BitVault deployment span MinIO + S3 + R2 + GCS + Azure across regions. It buys:
- Data residency / sovereignty — pin a tenant’s bytes to a region/jurisdiction (GDPR).
- Cost arbitrage — place cold data where storage is cheapest; serve where egress is cheapest.
- Durability — optional cross-provider redundancy (04).
- No vendor lock-in — drain a provider entirely (provider exit) because data is portable by content hash.
- Latency — place hot data near users.
The one thing it requires: a reliable location map. There is no global provider namespace; the system must always know which provider/bucket/key holds a given chunk/pack. That map is the Chunk/Pack Index (08) — federation is fundamentally a metadata capability, with the abstraction (01) as the execution layer.
2. Placement granularity: per-pack (mostly), per-chunk (when standalone)
Where does location live?
| Granularity | Location stored on | Pros | Cons |
|---|---|---|---|
| Per-chunk | every chunk row | maximal flexibility | location columns × 10^9 chunks = index bloat |
| Per-pack (default) | pack row; chunks inherit | one location row per ~hundreds of chunks | move = move whole pack |
| Per-tenant/bucket | tenant config | tiny metadata | coarse; can’t optimize hot/cold per object |
Decision: location is recorded per-pack for packed chunks (the common case), and per-chunk only for standalone (hot/large/unpacked) chunks. This keeps the location map ~100× smaller than per-chunk-everywhere while retaining object-level flexibility for the data that needs it. Tiering/migration then operate at pack granularity (move a pack = relocate all its chunks with one index update).
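Here is a minimal sketch of how a reader might resolve a chunk's location under this scheme. It assumes a hypothetical schema in which a packed chunk row carries `pack_id`, `offset`, and `length`, and only a standalone chunk row carries its own `Location`. The names (`ChunkRow`, `PackRow`, `resolve`, `relocate_pack`) are illustrative, not the actual Chunk/Pack Index schema (08).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    provider: str  # e.g. "minio", "s3", "r2"
    region: str
    bucket: str
    key: str


@dataclass
class PackRow:
    pack_id: str
    location: Location  # the single location row shared by every chunk in the pack


@dataclass
class ChunkRow:
    chunk_hash: str
    length: int
    pack_id: Optional[str] = None  # set for packed chunks
    offset: int = 0  # byte offset inside the pack
    location: Optional[Location] = None  # set only for standalone chunks


def resolve(chunk: ChunkRow, packs: dict[str, PackRow]) -> tuple[Location, int, int]:
    """Return (object location, byte offset, byte length) for one chunk."""
    if chunk.location is not None:
        # Standalone chunk: its own row carries the location; read the whole object.
        return chunk.location, 0, chunk.length
    if chunk.pack_id is None:
        raise ValueError(f"chunk {chunk.chunk_hash} has neither a pack nor a location")
    # Packed chunk: inherit the pack's location and read a byte range of it.
    return packs[chunk.pack_id].location, chunk.offset, chunk.length


def relocate_pack(packs: dict[str, PackRow], pack_id: str, new_location: Location) -> None:
    """One row update relocates every chunk packed inside pack_id."""
    packs[pack_id].location = new_location
```

Relocating a pack touches one `PackRow`, and every chunk inside it follows with no chunk-row write. That property is what lets tiering and migration work at pack granularity.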
3. The Placement Service (policy → decision)
A stateless policy engine consulted when a new pack (or standalone chunk) needs a home, and by the migration worker when rebalancing.
flowchart TB
classDef in fill:#dbeafe,stroke:#1e40af,color:#111827;
classDef eng fill:#fde68a,stroke:#b45309,color:#111827;
classDef out fill:#bbf7d0,stroke:#15803d,color:#111827;
subgraph INPUTS["Placement inputs"]
res["Tenant residency / sovereignty"]:::in
dur["Durability class (1× / cross-provider)"]:::in
cost["Provider cost: $/GB store + $/GB egress"]:::in
lat["User region / latency"]:::in
cap["Capacity & provider health"]:::in
tier["Target tier (hot/warm/cold)"]:::in
end
eng["Placement Service<br/>(policy evaluation, default-deny on residency)"]:::eng
INPUTS --> eng
eng --> dec["Decision: {provider, region, bucket, tier, redundancy}"]:::out
dec --> idx["Recorded in Pack/Chunk Index"]:::out
- Residency is a hard constraint (default-deny): a tenant pinned to `eu` never has bytes placed outside `eu`, regardless of cost. Compliance trumps optimization.
- Cost & latency are soft optimizations within the residency-allowed set.
- Placement policy is config/data owned by Admin/Platform; the engine is stateless and cacheable.
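The following is a minimal sketch of the evaluation order described above. It assumes a hypothetical `Target` record per provider/region and a single linear cost model: storage, plus expected egress, plus an optional latency penalty. It returns only the chosen target, not the full `{provider, region, bucket, tier, redundancy}` decision. Every field and function name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    provider: str
    region: str
    jurisdiction: str  # e.g. "eu", "us"
    store_usd_gb: float  # USD per GB-month stored
    egress_usd_gb: float  # USD per GB egressed
    latency_ms: float  # to the tenant's user region
    healthy: bool = True  # capacity & provider health folded into one flag


@dataclass(frozen=True)
class PlacementRequest:
    allowed_jurisdictions: frozenset[str]  # tenant residency; empty means nothing allowed
    size_gb: float
    reads_per_month: float  # expected full reads
    usd_per_ms: float = 0.0  # how much latency matters, as a USD penalty per ms


class ResidencyViolation(Exception):
    """No healthy candidate satisfies the tenant's residency constraint."""


def place(req: PlacementRequest, targets: list[Target]) -> Target:
    # 1. Hard constraint (default-deny): keep only healthy targets in an allowed jurisdiction.
    allowed = [
        t for t in targets
        if t.healthy and t.jurisdiction in req.allowed_jurisdictions
    ]
    if not allowed:
        # Refuse rather than fall back to a cheaper out-of-jurisdiction target.
        raise ResidencyViolation("no healthy target satisfies residency")

    # 2. Soft optimization, only within the residency-allowed set.
    def monthly_cost(t: Target) -> float:
        storage = req.size_gb * t.store_usd_gb
        egress = req.size_gb * req.reads_per_month * t.egress_usd_gb
        return storage + egress + req.usd_per_ms * t.latency_ms

    return min(allowed, key=monthly_cost)
```

The residency filter runs before any scoring, so no cost or latency weight can pull a pack outside the allowed set. An empty allowed set produces a refusal, not a fallback.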
4. Read routing & migration (content hash makes both safe)
Read routing: download resolve (06) reads the location from the index → picks the provider adapter → presigns. If cross-provider redundancy exists, route to the cheapest-egress / lowest-latency copy.
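Below is a minimal sketch of the copy choice when cross-provider redundancy leaves more than one recorded location. It assumes each copy carries a hypothetical egress price and latency estimate. `Copy` and `route_read` are illustrative names, not the download-resolve interface (06), and presigning itself is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    provider: str
    region: str
    egress_usd_gb: float  # price to serve this copy to the internet
    latency_ms: float  # estimate from the requesting user's region


def route_read(copies: list[Copy], user_region: str, prefer_latency: bool = False) -> Copy:
    """Choose which recorded copy of a pack to presign for a download."""
    if not copies:
        raise LookupError("index has no location for this pack")
    if prefer_latency:
        # Same-region copy first (False sorts before True), then lowest latency, then egress.
        return min(copies, key=lambda c: (c.region != user_region, c.latency_ms, c.egress_usd_gb))
    # Default: cheapest egress, ties broken by latency.
    return min(copies, key=lambda c: (c.egress_usd_gb, c.latency_ms))
```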
Online migration (rebalance, cost-optimize, provider exit, region move):
sequenceDiagram
autonumber
participant M as Migration Worker
participant SRC as Source provider
participant DST as Dest provider
participant IX as Pack Index
M->>SRC: read pack bytes
M->>DST: write pack (same content)
M->>DST: Head + BLAKE3 verify == content hash
M->>IX: CAS location SRC→DST (dual-listed during cutover)
Note over M,IX: reads now served from DST — SRC kept until grace
M->>SRC: delete old pack (after grace, GC)
Migration is safe because content is addressed by hash: the destination copy is provably the same bytes (verify by hash), and the index flip is a CAS. Reads can be dual-sourced during cutover (try DST, fall back to SRC) so migration never causes a read miss. Provider exit = migrate every pack off a provider, then remove it.
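The following is a minimal in-memory sketch of the migration steps and the dual-sourced read. It rests on several assumptions:

- Packs are keyed by their content hash.
- Each `Store` is a hypothetical stand-in for a provider adapter (01).
- `hashlib.blake2b` stands in for BLAKE3, which is not in the Python standard library.

The index holds an ordered list of locations so that the cutover state (DST first, SRC as fallback) is explicit. The real index layout (08) may differ.

```python
import hashlib
import threading


def content_hash(data: bytes) -> str:
    # Stand-in: BLAKE3 is not in the standard library, so blake2b-256 is used here.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class Store:
    """Hypothetical in-memory stand-in for one provider bucket."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self.objects[key]  # KeyError plays the role of a read miss

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


class PackIndex:
    """pack hash -> ordered list of store names; reads try them in order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.locations: dict[str, list[str]] = {}

    def cas(self, pack: str, expected: list[str], new: list[str]) -> bool:
        # Compare-and-swap: update only if nobody changed the location meanwhile.
        with self._lock:
            if self.locations.get(pack) != expected:
                return False
            self.locations[pack] = list(new)
            return True


def migrate(pack: str, index: PackIndex, stores: dict[str, Store], src: str, dst: str) -> None:
    stores[dst].put(pack, stores[src].get(pack))  # read SRC, write DST
    if content_hash(stores[dst].get(pack)) != pack:  # verify the copy by content hash
        stores[dst].delete(pack)
        raise ValueError("destination copy failed hash verification")
    # Cutover: DST serves reads, SRC stays listed as a fallback until the grace period ends.
    if not index.cas(pack, [src], [dst, src]):
        # The orphaned DST copy is left for GC to reclaim.
        raise RuntimeError("location changed concurrently; retry the migration")


def finish(pack: str, index: PackIndex, stores: dict[str, Store], src: str, dst: str) -> None:
    # After grace: drop SRC from the index first, then delete its bytes (the GC step).
    if index.cas(pack, [dst, src], [dst]):
        stores[src].delete(pack)


def read(pack: str, index: PackIndex, stores: dict[str, Store]) -> bytes:
    # Dual-sourced read: try each listed location in order, so a DST miss falls back to SRC.
    for name in index.locations[pack]:
        try:
            return stores[name].get(pack)
        except KeyError:
            continue
    raise KeyError(pack)
```

The CAS makes concurrent movers safe. A second worker that read the old location list fails its compare and retries instead of overwriting a newer placement.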
5. Egress & data-gravity (the cost that dominates at scale)
- Never move bulk bytes cross-provider/cross-region on the hot path. Reads are served from a co-located/cheapest copy; maintenance (scrub, pack, migrate) runs co-located with the data (in-region/in-cluster compute) to minimize egress.
- Egress asymmetry: ingress is usually free, while egress is expensive and varies widely by provider (R2 notably charges no egress fees). Placement weights egress price for read-heavy data and storage price for cold data.
- CDN offloads repeat egress entirely (06 §6).
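A worked form of the egress-versus-storage weighting uses a deliberately simple monthly cost model, introduced here for illustration; the symbols are not part of the placement config. $S$ is the object size in GB and $r$ is the expected number of full reads per month. $s_t$ and $e_t$ are a target's storage price (USD/GB-month) and egress price (USD/GB). The comparison is between a target $A$ with cheaper storage and a target $B$ with cheaper egress:

$$
\begin{aligned}
C_t &= S\,s_t + S\,r\,e_t \\
C_B < C_A &\iff r > r^{*} = \frac{s_B - s_A}{e_A - e_B}, \qquad s_B > s_A,\ e_B < e_A
\end{aligned}
$$

Since $S$ cancels, the choice depends only on read rate:

- Below $r^{*}$, the cheap-storage target wins (cold data).
- Above $r^{*}$, the cheap-egress target wins (read-heavy data).

Take illustrative prices (not quoted ones) of $s_A = 0.010$, $s_B = 0.015$, $e_A = 0.09$ and $e_B = 0$. The break-even is then $r^{*} = 0.005 / 0.09 \approx 0.056$ reads per month, which is roughly one full read every 18 months.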
6. Tradeoffs / Alternatives / Scaling
Tradeoffs. Federation adds a placement decision and a location map to every write, and migration machinery. The payoff (residency, cost, durability, no lock-in) is exactly the multi-cloud value proposition (G4); a single-provider deployment simply uses a trivial one-option policy and pays none of the complexity.
Alternatives considered.
- Static per-tenant provider assignment (no engine): simplest; loses per-object cost/tier optimization and easy rebalancing. Kept as the degenerate policy for small/self-host deployments.
- Hash-based deterministic placement (provider = f(hash)): needs no location map but makes residency, tiering, migration, and provider exit nearly impossible (you can’t move data without changing its address). Rejected — the location-map approach is strictly more flexible and migration-safe.
- Provider-managed multi-region (e.g. S3 MRAP) only: ties us to one provider’s multi-region story; defeats multi-cloud. Used opportunistically, not as the model.
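The following small sketch shows why the rejected hash-based alternative fights migration. It assumes the simplest form, `provider = providers[hash mod n]`; the function names and the four-provider list are hypothetical. Exiting one provider changes `n`, so most chunks' computed location changes, even though only that provider's share needed to move.

```python
import hashlib


def hash_placement(chunk_hash: bytes, providers: list[str]) -> str:
    """Deterministic placement: the provider is a pure function of the hash."""
    return providers[int.from_bytes(chunk_hash[:8], "big") % len(providers)]


def moved_fraction(n_chunks: int, before: list[str], after: list[str]) -> float:
    """Fraction of synthetic chunks whose computed provider differs between two provider lists."""
    moved = 0
    for i in range(n_chunks):
        # Synthetic content hashes; SHA-256 just gives a uniform spread here.
        h = hashlib.sha256(i.to_bytes(8, "big")).digest()
        if hash_placement(h, before) != hash_placement(h, after):
            moved += 1
    return moved / n_chunks


if __name__ == "__main__":
    four = ["minio", "s3", "r2", "gcs"]
    print(f"{moved_fraction(100_000, four, four[:3]):.2f} of chunks change provider")
```

With four providers, dropping one moves about three quarters of all chunks, not just the quarter that lived on the exiting provider. Consistent hashing would cut the churn to roughly that quarter. Even then, the location would still be a function of the hash alone, with no way to pin a tenant to a jurisdiction or to place one object on a colder tier.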
Scaling concerns.
- Location-map size → per-pack granularity (§2) keeps it ~100× smaller than per-chunk.
- Migration throughput must be co-located and throttled; at PB scale a provider exit is a long-running, resumable, verify-as-you-go campaign (watermarked like GC).
- Placement decisions are cheap/stateless → no scaling concern; policy is cached.
- Residency correctness is a compliance risk → enforced as a hard constraint with audit, and verified by a periodic job that asserts no pack violates its tenant’s residency (a test, not a hope).
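Below is a minimal sketch of the periodic residency check, written as a resumable, watermarked batch scan so the same shape can serve a long-running campaign. It assumes a hypothetical `PackRecord` with an integer `pack_id`, a region-to-jurisdiction map, and a per-tenant allow-list. An unknown region or tenant counts as a violation (default-deny).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackRecord:
    pack_id: int
    tenant: str
    region: str


def audit_residency(
    packs: list[PackRecord],
    region_jurisdiction: dict[str, str],
    tenant_allowed: dict[str, frozenset[str]],
    watermark: int = 0,
    batch: int = 1000,
) -> tuple[list[PackRecord], int]:
    """Check up to `batch` packs with pack_id > watermark; return (violations, new watermark)."""
    violations: list[PackRecord] = []
    new_watermark = watermark
    # Stand-in for an index range scan: WHERE pack_id > watermark ORDER BY pack_id LIMIT batch.
    pending = sorted((p for p in packs if p.pack_id > watermark), key=lambda p: p.pack_id)
    for p in pending[:batch]:
        allowed = tenant_allowed.get(p.tenant, frozenset())  # unknown tenant: nothing allowed
        if region_jurisdiction.get(p.region) not in allowed:  # unknown region: violation
            violations.append(p)
        new_watermark = p.pack_id
    return violations, new_watermark
```

Persisting the returned watermark after each batch lets the job stop and resume without rescanning. A production version would presumably replace the in-memory sort with an index range scan on `pack_id`.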
References
- AWS S3 cross-region / Multi-Region Access Points: https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiRegionAccessPoints.html
- Cloudflare R2 zero egress: https://developers.cloudflare.com/r2/
- GDPR data residency considerations: https://gdpr.eu/