Storage
Purpose
Storage isolates the multi-cloud abstraction and the byte lifecycle from the rest of the system. No other module knows which object store is configured — they call Storage’s gRPC API and receive presigned URLs. Bytes themselves never traverse BitVault compute (ADR-0011).
Data owned
| Table | Purpose |
|---|---|
| blobs | Content-addressed blob records: content_hash (BLAKE3), provider, bucket, key, refcount, state |
| multipart_uploads | In-progress resumable uploads: provider upload ID, staging key, declared size, expiry |
| provider config | Object store connection credentials / endpoints (per tenant placement) |
Internal API
Storage.* gRPC methods:
| Method | Description |
|---|---|
| Storage.PresignPut | Issue a scoped presigned PUT URL for a staging key |
| Storage.PresignGet | Issue a scoped presigned GET URL for a committed blob |
| Storage.HeadObject | Verify size / etag / hash of a staging object |
| Storage.CommitBlob | Promote staging → committed; increment refcount |
| Storage.DecrementRef | Decrement refcount; schedule GC if zero |
| Storage.InitMultipart | Begin a multipart upload; return upload ID + part URLs |
| Storage.CompleteMultipart | Assemble parts into a committed blob |
| Storage.AbortMultipart | Cancel in-progress multipart; delete staging parts |
| Storage.RunGC | Reclaim orphaned staging blobs past TTL and committed blobs with refcount = 0 |
Provider abstraction
ADR-0005 defines a single Provider interface with adapters for each backend.
Swapping the provider is a configuration change; no domain logic changes.
| Provider | Use case |
|---|---|
| MinIO | Self-hosted deployments, local development (docker compose lite/standard) |
| Amazon S3 | Cloud deployments (AWS) |
| Cloudflare R2 | Cloud deployments requiring zero egress fees |
| Google Cloud Storage | Cloud deployments (GCP) |
| Azure Blob Storage | Cloud deployments (Azure) |
A conformance test suite validates every adapter against the full provider interface contract. All adapters must pass before a provider is considered supported.
:::tip
Adding a new storage provider requires only implementing the Provider interface and passing the conformance suite — no changes to the storage domain, commit protocol, or any other module.
:::
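As a rough illustration of the adapter-plus-conformance idea, here is a minimal Python sketch. The method names (`presign_put`, `head_object`, and so on), the in-memory adapter, and the three checks are all assumptions made for this example; the actual interface is whatever ADR-0005 defines.

```python
from typing import Callable, Dict, List, Protocol


class Provider(Protocol):
    """Hypothetical provider contract; the real one is defined in ADR-0005."""

    def presign_put(self, key: str, ttl_seconds: int) -> str: ...
    def presign_get(self, key: str, ttl_seconds: int) -> str: ...
    def head_object(self, key: str) -> int: ...
    def delete_object(self, key: str) -> None: ...


class InMemoryProvider:
    """Toy adapter that exists only to exercise the contract in this sketch."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def presign_put(self, key: str, ttl_seconds: int) -> str:
        return f"memory://put/{key}?ttl={ttl_seconds}"

    def presign_get(self, key: str, ttl_seconds: int) -> str:
        return f"memory://get/{key}?ttl={ttl_seconds}"

    def head_object(self, key: str) -> int:
        return len(self.objects[key])  # raises KeyError when the key is absent

    def delete_object(self, key: str) -> None:
        self.objects.pop(key, None)


def check_conformance(provider: Provider,
                      upload: Callable[[str, bytes], None]) -> List[str]:
    """Return the failed checks; an empty list means the adapter passes."""
    failures: List[str] = []
    key = "conformance/probe"
    if key not in provider.presign_put(key, 900):
        failures.append("presign_put must scope the URL to the exact key")
    upload(key, b"hello")  # stands in for the client's direct PUT
    if provider.head_object(key) != 5:
        failures.append("head_object must report the stored size")
    provider.delete_object(key)
    try:
        provider.head_object(key)
        failures.append("head_object must fail after delete")
    except KeyError:
        pass
    return failures


adapter = InMemoryProvider()
failures = check_conformance(adapter, lambda k, d: adapter.objects.__setitem__(k, d))
```

The domain code depends only on the `Provider` protocol, which is what would let a new backend be added without touching anything else.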
Presigned URL issuance
Presigned URLs are scoped:
- Exact object key (no wildcard).
- Short TTL (PUT: ~15 min; GET: configurable, typically 1 h).
- Response headers set on the presign request (e.g. Content-Disposition for downloads).
The client transfers bytes directly to/from the object store. BitVault’s compute path handles only metadata.
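The scoping rules above can be sketched with a plain HMAC signature. This is not how any particular object store signs URLs (each provider has its own scheme); the secret, the placeholder host, and the query parameter names are assumptions chosen for the example.

```python
import hashlib
import hmac
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical; never hard-code real credentials
BASE_URL = "https://objects.example.invalid"  # placeholder host


def _sign(method: str, key: str, expires: int, headers: Dict[str, str]) -> str:
    # The signature covers the method, the exact key, the expiry, and any
    # response headers, so changing any of them invalidates the URL.
    lines = [method, key, str(expires)]
    lines += [f"{k}={v}" for k, v in sorted(headers.items())]
    return hmac.new(SECRET, "\n".join(lines).encode(), hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_seconds: int, now: int,
            response_headers: Optional[Dict[str, str]] = None) -> str:
    """Issue a URL valid for one method, one key, and ttl_seconds from now."""
    expires = now + ttl_seconds
    headers = dict(response_headers or {})
    query = {"expires": str(expires), **headers,
             "sig": _sign(method, key, expires, headers)}
    return f"{BASE_URL}/{key}?{urlencode(query)}"


def verify(method: str, url: str, now: int) -> bool:
    """What the object store side would check before serving the request."""
    parsed = urlparse(url)
    query = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    sig = query.pop("sig", "")
    expires = int(query.pop("expires", "0"))
    expected = _sign(method, parsed.path.lstrip("/"), expires, query)
    return now <= expires and hmac.compare_digest(sig, expected)


url = presign("GET", "tenant-a/abc123", ttl_seconds=3600, now=1000,
              response_headers={"response-content-disposition": "attachment"})
```

A URL issued this way stops verifying once the TTL passes, when used with a different HTTP method, or when the key in the path is altered.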
Content addressing
ADR-0016 specifies BLAKE3 as the hash function for blob identity.
- BLOB.content_hash (BLAKE3 hex) is the primary key.
- Per-tenant deduplication (ADR-0018): uploading identical bytes within the same tenant increments refcount instead of storing the bytes twice. Cross-tenant dedup is deliberately excluded to prevent side-channel leakage.
- Object key format: /{tenant_id}/{content_hash}, tenant-prefixed for storage-side isolation.
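A small sketch of how the key format and per-tenant dedup fit together. Two things here are assumptions: the in-memory `BlobIndex` is hypothetical, and because BLAKE3 is not in the Python standard library the example substitutes BLAKE2b purely as a stand-in hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


def content_hash(data: bytes) -> str:
    # Stand-in only: ADR-0016 specifies BLAKE3; BLAKE2b is used here because
    # it ships with the standard library.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def object_key(tenant_id: str, digest: str) -> str:
    # Tenant-prefixed key, matching the /{tenant_id}/{content_hash} format.
    return f"/{tenant_id}/{digest}"


@dataclass
class BlobRecord:
    key: str
    refcount: int


class BlobIndex:
    """Hypothetical index keyed by (tenant, hash) so dedup never crosses tenants."""

    def __init__(self) -> None:
        self.records: Dict[Tuple[str, str], BlobRecord] = {}

    def commit(self, tenant_id: str, data: bytes) -> BlobRecord:
        digest = content_hash(data)
        record = self.records.get((tenant_id, digest))
        if record is None:
            # First time this tenant stores these bytes: create the record.
            record = BlobRecord(object_key(tenant_id, digest), 0)
            self.records[(tenant_id, digest)] = record
        record.refcount += 1  # a dedup hit only bumps the count
        return record


index = BlobIndex()
first = index.commit("tenant-a", b"same bytes")
second = index.commit("tenant-a", b"same bytes")   # dedup hit within the tenant
other = index.commit("tenant-b", b"same bytes")    # stored separately
```

Because the tenant ID is part of the lookup, one tenant cannot learn whether another tenant already holds the same content by observing dedup behaviour.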
Ref-counting and GC
stateDiagram-v2
[*] --> staging: PresignPut issued
staging --> committed: CommitBlob (HeadObject verify ✔)
staging --> [*]: GC (TTL expired, refcount = 0)
committed --> committed: refcount++ (new version references blob)
committed --> committed: refcount-- (version deleted)
committed --> [*]: GC (refcount = 0)
- Upload: CommitBlob sets state = committed, refcount = 1 (or increments refcount on a dedup hit).
- Version delete: DecrementRef decrements refcount.
- GC fires when refcount = 0 (ADR-0019):
  - Staging blobs past TTL (client uploaded but never committed).
  - Committed blobs with zero references (all versions deleted).
- GC is a background worker; it never blocks the commit path.
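The lifecycle can be modelled in a few lines. This is a simplified in-memory sketch, not the storage module's implementation: the `BlobStore` class, its method names, and the explicit `now` parameter are assumptions, and a dedup hit is modelled as a second `commit_blob` on the same key.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Blob:
    state: str         # "staging" or "committed"
    refcount: int
    created_at: float  # seconds; used for the staging TTL


class BlobStore:
    def __init__(self, staging_ttl: float) -> None:
        self.staging_ttl = staging_ttl
        self.blobs: Dict[str, Blob] = {}

    def presign_put(self, key: str, now: float) -> None:
        # Issuing a PUT URL creates the staging record.
        self.blobs.setdefault(key, Blob("staging", 0, now))

    def commit_blob(self, key: str) -> None:
        blob = self.blobs[key]
        blob.state = "committed"
        blob.refcount += 1  # 1 on first commit, higher on a dedup hit

    def decrement_ref(self, key: str) -> None:
        blob = self.blobs[key]
        if blob.refcount == 0:
            raise ValueError("refcount is already zero")
        blob.refcount -= 1  # reclamation is left to run_gc

    def run_gc(self, now: float) -> List[str]:
        """Reclaim expired staging blobs and unreferenced committed blobs."""
        reclaimed: List[str] = []
        for key, blob in list(self.blobs.items()):
            expired = (blob.state == "staging"
                       and now - blob.created_at > self.staging_ttl)
            unreferenced = blob.state == "committed" and blob.refcount == 0
            if expired or unreferenced:
                del self.blobs[key]
                reclaimed.append(key)
        return reclaimed


store = BlobStore(staging_ttl=900)
store.presign_put("a", now=0)
store.presign_put("b", now=0)   # uploaded but never committed
store.commit_blob("a")
store.commit_blob("a")          # second version references the same blob
store.decrement_ref("a")
first_pass = store.run_gc(now=1000)
store.decrement_ref("a")
second_pass = store.run_gc(now=1000)
```

Keeping deletion inside `run_gc` rather than in `decrement_ref` mirrors the point that reclamation happens in a background worker and never on the commit path.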
Multipart / resumable uploads
ADR-0021 covers resumable uploads for large files:
- Client calls InitMultipart → server calls the provider to begin a multipart upload; returns upload ID + presigned URLs for each part.
- Client uploads parts directly to the object store (parallel, resumable on failure).
- Client calls CompleteMultipart → server calls the provider to assemble parts → CommitBlob.
- On abort or TTL expiry, AbortMultipart cleans up partial parts from the provider.
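The flow above can be walked through against a toy provider. Everything named here (`FakeProvider`, `init_multipart`, the placeholder URLs) is hypothetical and only mimics the shape of a multipart API; the final CommitBlob step is left out to keep the sketch short.

```python
import itertools
from typing import Dict, List


class FakeProvider:
    """Toy stand-in for an object store's multipart API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.uploads: Dict[str, Dict[int, bytes]] = {}
        self.objects: Dict[str, bytes] = {}

    def begin(self) -> str:
        upload_id = f"upload-{next(self._ids)}"
        self.uploads[upload_id] = {}
        return upload_id

    def put_part(self, upload_id: str, part_number: int, data: bytes) -> None:
        # Re-sending a part overwrites it, which is what makes retries safe.
        self.uploads[upload_id][part_number] = data

    def complete(self, upload_id: str, key: str, part_count: int) -> None:
        parts = self.uploads[upload_id]
        missing = [n for n in range(1, part_count + 1) if n not in parts]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        self.objects[key] = b"".join(parts[n] for n in range(1, part_count + 1))
        del self.uploads[upload_id]

    def abort(self, upload_id: str) -> None:
        self.uploads.pop(upload_id, None)  # drop any partial parts


def init_multipart(provider: FakeProvider, part_count: int) -> dict:
    """Begin an upload and hand back one placeholder URL per part."""
    upload_id = provider.begin()
    urls: List[str] = [
        f"https://objects.example.invalid/{upload_id}/part/{n}"
        for n in range(1, part_count + 1)
    ]
    return {"upload_id": upload_id, "part_urls": urls}


provider = FakeProvider()
session = init_multipart(provider, part_count=3)
uid = session["upload_id"]
provider.put_part(uid, 1, b"aaa")
provider.put_part(uid, 3, b"ccc")      # part 2 failed; client resumes later
try:
    provider.complete(uid, "staging/x", 3)
    completed_early = True
except ValueError:
    completed_early = False
provider.put_part(uid, 2, b"bbb")      # resume: only the missing part is sent
provider.complete(uid, "staging/x", 3)

abandoned = init_multipart(provider, part_count=2)["upload_id"]
provider.put_part(abandoned, 1, b"zzz")
provider.abort(abandoned)
```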
Content-based chunking
ADR-0017 specifies content-based chunking for large files. Chunk boundaries are determined by the content (Rabin fingerprinting or similar), not by fixed byte offsets. This enables delta-sync efficiency: an edit near the start of a large file produces a small diff of changed chunks rather than re-uploading the whole file.
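To make the boundary idea concrete, here is a minimal content-defined chunker. It uses a simple gear-style rolling hash rather than Rabin fingerprinting, with a seeded random table and no minimum or maximum chunk size; all of these are simplifications assumed for the example, and ADR-0017 may specify something different.

```python
import random
from typing import List

_rng = random.Random(0)  # fixed seed so the table is reproducible
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, bits: int = 8) -> List[bytes]:
    """Cut after any byte where the top `bits` bits of the rolling hash are zero.

    The average chunk size is roughly 2**bits bytes.
    """
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out of the 32-bit state, so the hash
        # depends only on the most recent 32 bytes, not on the absolute offset.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


original = bytes(random.Random(1).getrandbits(8) for _ in range(20000))
edited = b"XYZ" + original  # a small insertion at the very start
before, after = chunk(original), chunk(edited)
reused = len(set(before) & set(after))
```

Because each boundary depends only on nearby bytes, the insertion disturbs the chunks around the edit and the boundaries further along fall in the same places relative to the content. With fixed-offset chunking the same three-byte insertion would shift every subsequent chunk.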
Storage placement federation
ADR-0020: storage placement is configurable per tenant and per file class. An enterprise tenant can have its blobs routed to a specific region or provider; the storage module resolves the placement policy and issues presigned URLs against the appropriate provider instance.
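Placement resolution might look like the following lookup. The policy table, tenant names, provider and region strings, and the most-specific-rule-wins precedence are all assumptions for illustration; ADR-0020 defines the actual policy model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Placement:
    provider: str
    region: str


# Hypothetical platform-wide default.
DEFAULT = Placement("s3", "us-east-1")

# Hypothetical policy table: (tenant_id, file_class) -> placement.
# A file_class of None is the tenant-wide rule.
POLICIES: Dict[Tuple[str, Optional[str]], Placement] = {
    ("acme", None): Placement("gcs", "europe-west1"),
    ("acme", "archive"): Placement("r2", "auto"),
}


def resolve_placement(tenant_id: str, file_class: str) -> Placement:
    """Most specific rule wins: tenant + class, then tenant, then default."""
    for key in ((tenant_id, file_class), (tenant_id, None)):
        if key in POLICIES:
            return POLICIES[key]
    return DEFAULT


archive = resolve_placement("acme", "archive")
photo = resolve_placement("acme", "photo")
unknown = resolve_placement("other-tenant", "photo")
```

The resolved `Placement` would then select which provider instance issues the presigned URL.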