01 — Object Storage Abstraction

Topic: object storage abstraction. Refines ADR-0005. This section defines the seam between BitVault and the five providers (MinIO, S3, R2, GCS, Azure Blob). The design problem: expose enough power to be efficient, hide enough difference to stay portable, and never silently emulate a capability a provider lacks.


1. The interface (narrow, capability-flagged)

The abstraction is a small surface; everything else is built on top in BitVault code (chunking, dedup, packing are ours, not the provider’s).

Core (every adapter MUST implement): Put · Get(range) · Head · Delete · List(prefix, cursor) · Copy(intra-provider) · Presign(method, key, constraints, ttl) · multipart group (InitMultipart · PresignPart · CompleteMultipart · AbortMultipart · ListParts).

Capability-flagged (advertised, queried by callers, never emulated): ConditionalPut (write-once / If-None-Match) · ObjectLock (WORM) · ServerSideCopyCrossBucket · BatchDelete · StorageClasses[] (tier set) · ChecksumOnPut[] (CRC32C/SHA-256/CRC64) · PresignContentLengthRange · StrongListAfterWrite · RemoteTieringILM.

Principle: a missing capability is surfaced, never faked. If ConditionalPut is absent, the caller chooses a documented fallback (e.g. content addressing already makes writes idempotent; see §4). It does not get a slow read-modify-write pretending to be atomic.
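The principle can be sketched in a few lines of Python. This is an illustration only, not the BitVault interface: `Capability`, `UnsupportedCapability`, `InMemoryAdapter` and `write_once` are hypothetical names, and the in-memory dict stands in for a real provider. It shows the adapter refusing a conditional put it cannot honour, and the caller branching on the advertised capability set.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Capability(Enum):
    # A subset of the capability flags listed above (names are illustrative).
    CONDITIONAL_PUT = auto()
    OBJECT_LOCK = auto()
    BATCH_DELETE = auto()


class UnsupportedCapability(Exception):
    """Raised instead of emulating a capability the provider lacks."""


@dataclass
class InMemoryAdapter:
    """Toy stand-in for a provider adapter (hypothetical, for illustration)."""
    capabilities: frozenset
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes, if_none_match: bool = False) -> bool:
        """Store data; return False if a conditional put found the key present."""
        if if_none_match:
            if Capability.CONDITIONAL_PUT not in self.capabilities:
                # Surface the gap; do not fake atomicity with read-modify-write.
                raise UnsupportedCapability("ConditionalPut")
            if key in self.objects:
                return False
        self.objects[key] = data
        return True


def write_once(adapter: InMemoryAdapter, key: str, data: bytes) -> str:
    """Caller-side branching: use ConditionalPut if advertised, else fall back."""
    if Capability.CONDITIONAL_PUT in adapter.capabilities:
        created = adapter.put(key, data, if_none_match=True)
        return "created" if created else "already-present"
    # Documented fallback: keys are content-addressed, so an unconditional
    # put of the same key stores identical bytes and is idempotent.
    adapter.put(key, data)
    return "put-unconditional"
```

The branching lives in the caller on purpose: the adapter never decides on the caller's behalf how to approximate a missing feature.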


2. Capability matrix (the differences that actually bite)

Capability | MinIO | S3 | R2 | GCS | Azure Blob | How we cope
--- | --- | --- | --- | --- | --- | ---
Presigned URL (GET/PUT) | ✓ | ✓ | ✓ | ✓ (V4) | ✓ (SAS) | required of all; core to ADR-0011
Multipart upload | ✓ (S3 API) | ✓ | ✓ (S3 API) | ✓ (XML MPU / resumable / compose) | ✓ (Put Block + Block List) | adapter maps to native; see 05
Part-size limits | S3-like | 5 MiB–5 GiB, ≤10k parts | S3-like | differs | block ≤4000 MiB, ≤50k blocks | adapter exposes MaxPart, MaxParts
Conditional write-once (If-None-Match) | partial | ✓ (2024) | ✓ | ✓ (x-goog-if-generation-match:0) | ✓ (If-None-Match:*) | optimization only; correctness rests on CAS idempotency (§4)
Object Lock / WORM | ✓ | ✓ | partial | ✓ (retention/bucket lock) | ✓ (immutability policy) | used for compliance retention (07)
Storage classes / tiers | 1 (+ remote ILM) | Std/IA/Glacier… | 1 (no archive) | Std/Nearline/Coldline/Archive | Hot/Cool/Cold/Archive | tiering policy adapts to available classes (10)
Checksum on PUT | CRC/SHA | CRC32C/SHA-256 | partial | CRC32C/MD5 | CRC64/MD5 | belt-and-suspenders to our own hash (04)
Batch delete | ✓ | ✓ (1000/call) | ✓ | per-object / batch API | per-object / batch | GC batches where supported (11)
Strong read-after-write | ✓ | ✓ (since 2020) | ✓ | ✓ | ✓ | we still verify with Head (04)

The matrix is the artifact that keeps the abstraction honest. Every adapter ships a populated capability set + passes one conformance suite (ADR-0005) so “supported” means “tested”, not “claimed”.
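As an illustration of how a caller might use the MaxPart and MaxParts values an adapter exposes (the "Part-size limits" row), here is a minimal part-size planner. It is a sketch under assumptions: `PartLimits` and `plan_part_size` are hypothetical names, the 8 MiB preferred size is arbitrary, and the Azure minimum of 1 byte is a placeholder because the matrix gives no minimum for blocks. The S3 and Azure upper limits are the ones in the matrix.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class PartLimits:
    min_part: int   # smallest allowed non-final part, in bytes
    max_part: int   # largest allowed part, in bytes
    max_parts: int  # maximum number of parts per upload


# Upper limits from the matrix; the Azure min_part is an assumed placeholder.
S3_LIMITS = PartLimits(min_part=5 * MIB, max_part=5 * 1024 * MIB, max_parts=10_000)
AZURE_LIMITS = PartLimits(min_part=1, max_part=4000 * MIB, max_parts=50_000)


def plan_part_size(object_size: int, limits: PartLimits, preferred: int = 8 * MIB) -> int:
    """Return a part size that satisfies the provider's limits for this object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Smallest part size that keeps the part count within max_parts
    # (integer ceiling division, no floating point).
    needed = -(-object_size // limits.max_parts)
    part = max(limits.min_part, needed, min(preferred, limits.max_part))
    if part > limits.max_part:
        # Surface the limit rather than silently splitting into several objects.
        raise ValueError("object too large for this provider's multipart limits")
    return part
```

For a 100 MiB object against the S3 limits this returns the preferred 8 MiB; for very large objects the part size grows so the part count stays within MaxParts, and the function raises once even the largest allowed part would not suffice.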


3. Addressing scheme (bucket & key layout)

Keys are content-addressed and tenant-prefixed, designed for isolation, even request distribution, and cheap lifecycle scoping:

<tenant_id>/<class>/<hash[0:2]>/<hash[2:4]>/<hash>
                 │          └─ 2-level fan-out prefix: spreads load, bounds list size
                 └─ class ∈ { chunk, pack, manifest } (manifests may live in DB instead)
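The layout above could be produced by a small helper like the following. This is a sketch, not the BitVault implementation: the function name `object_key` is hypothetical, and SHA-256 with lowercase hex encoding is an assumption, since the hash function and its encoding are not specified here.

```python
import hashlib

# Object classes named in the layout above.
ALLOWED_CLASSES = {"chunk", "pack", "manifest"}


def object_key(tenant_id: str, obj_class: str, content: bytes) -> str:
    """Build <tenant_id>/<class>/<hash[0:2]>/<hash[2:4]>/<hash> for the content."""
    if obj_class not in ALLOWED_CLASSES:
        raise ValueError(f"unknown object class: {obj_class!r}")
    if not tenant_id or "/" in tenant_id:
        # A slash in the tenant id would break tenant-prefix isolation.
        raise ValueError("tenant_id must be non-empty and contain no '/'")
    # Assumption: SHA-256, hex-encoded.
    digest = hashlib.sha256(content).hexdigest()
    return f"{tenant_id}/{obj_class}/{digest[0:2]}/{digest[2:4]}/{digest}"
```

Because the two fan-out components are taken from the hash itself, identical content under the same tenant and class always maps to the same key.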

Tradeoffs. Prefix fan-out adds key length and a tiny indirection vs flat keys, but is mandatory at scale. Alternative (bucket-per-tenant) gives stronger blast-radius isolation and per-bucket policies but collapses past a few thousand tenants — reserved as an enterprise dedicated-deployment option, not the default. Scaling: with content-hash keys, write load is uniformly distributed for free; no manual partitioning needed.


4. Consistency & idempotency (how CAS rescues us from provider quirks)

Content addressing turns most consistency problems into non-problems:
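A minimal sketch of the idempotency argument, assuming SHA-256 hex keys and a toy in-memory store (`ToyStore` and `put_chunk` are hypothetical names, not BitVault code): because the key is derived from the bytes, repeating a write after a timeout or a lost response cannot change what is stored, and a Head check is enough to confirm the object landed.

```python
import hashlib
from typing import Optional


class ToyStore:
    """In-memory stand-in for a provider adapter (illustration only)."""

    def __init__(self) -> None:
        self._objects: dict = {}
        self.put_calls = 0

    def put(self, key: str, data: bytes) -> None:
        self.put_calls += 1
        self._objects[key] = bytes(data)

    def head(self, key: str) -> Optional[int]:
        """Return the stored size in bytes, or None if the key is absent."""
        data = self._objects.get(key)
        return None if data is None else len(data)


def put_chunk(store: ToyStore, tenant_id: str, data: bytes) -> str:
    """Write a chunk under its content hash; safe to call any number of times."""
    digest = hashlib.sha256(data).hexdigest()  # assumed hash function
    key = f"{tenant_id}/chunk/{digest[0:2]}/{digest[2:4]}/{digest}"
    if store.head(key) != len(data):
        # Absent (or wrong size): write it. Repeating this put is harmless
        # because the same key always carries the same bytes.
        store.put(key, data)
        if store.head(key) != len(data):
            raise IOError(f"verification failed for {key}")
    return key
```

Calling `put_chunk` twice with the same bytes returns the same key and, in this toy store, performs only one Put.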


5. Tradeoffs / Alternatives / Scaling (the abstraction itself)

Tradeoffs. A capability-flagged interface pushes some branching to callers (e.g. “tier to Archive” only where StorageClasses includes it). That is the honest cost of multi-cloud; the alternative (hidden emulation) produces silent correctness/perf cliffs.
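The caller-side branching might look like the sketch below. `pick_storage_class` is a hypothetical helper, the class names are the abbreviations used in the matrix for GCS and Azure Blob, and "Std" for the single R2 class is an assumed label. When no preferred class is available the function returns None, so the caller decides what to do rather than having a tier silently substituted.

```python
from typing import Optional, Sequence

# Tier sets as abbreviated in the capability matrix ("Std" for R2 is assumed).
GCS_CLASSES = ["Std", "Nearline", "Coldline", "Archive"]
AZURE_CLASSES = ["Hot", "Cool", "Cold", "Archive"]
R2_CLASSES = ["Std"]


def pick_storage_class(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first preferred class the provider advertises, else None."""
    for name in preferred:
        if name in available:
            return name
    # Nothing suitable: surface that to the caller instead of emulating a tier.
    return None
```

With a preference of `["Archive", "Coldline"]`, GCS and Azure Blob resolve to "Archive", while R2 yields None and the object simply stays in its only class.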

Alternatives considered.

Scaling concerns.


References