Storage

Purpose

Storage isolates the multi-cloud abstraction and the byte lifecycle from the rest of the system. No other module knows which object store is configured — they call Storage’s gRPC API and receive presigned URLs. Bytes themselves never traverse BitVault compute (ADR-0011).

Data owned

| Table | Purpose |
| --- | --- |
| blobs | Content-addressed blob records: content_hash (BLAKE3), provider, bucket, key, refcount, state |
| multipart_uploads | In-progress resumable uploads: provider upload ID, staging key, declared size, expiry |
| provider config | Object store connection credentials / endpoints (per tenant placement) |

Internal API

Storage.* gRPC methods:

| Method | Description |
| --- | --- |
| Storage.PresignPut | Issue a scoped presigned PUT URL for a staging key |
| Storage.PresignGet | Issue a scoped presigned GET URL for a committed blob |
| Storage.HeadObject | Verify size / etag / hash of a staging object |
| Storage.CommitBlob | Promote staging → committed; increment refcount |
| Storage.DecrementRef | Decrement refcount; schedule GC if zero |
| Storage.InitMultipart | Begin a multipart upload; return upload ID + part URLs |
| Storage.CompleteMultipart | Assemble parts into a committed blob |
| Storage.AbortMultipart | Cancel in-progress multipart; delete staging parts |
| Storage.RunGC | Reclaim orphaned staging blobs past TTL and committed blobs with refcount = 0 |

Provider abstraction

ADR-0005 defines a single Provider interface with adapters for each backend. Swapping the provider is a configuration change; no domain logic changes.

| Provider | Use case |
| --- | --- |
| MinIO | Self-hosted deployments, local development (docker compose lite/standard) |
| Amazon S3 | Cloud deployments (AWS) |
| Cloudflare R2 | Cloud deployments requiring zero egress fees |
| Google Cloud Storage | Cloud deployments (GCP) |
| Azure Blob Storage | Cloud deployments (Azure) |

A conformance test suite validates every adapter against the full provider interface contract. All adapters must pass before a provider is considered supported.

:::tip
Adding a new storage provider requires only implementing the Provider interface and passing the conformance suite. No changes are needed to the storage domain, commit protocol, or any other module.
:::
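
The adapter-plus-conformance idea can be sketched as follows. This is a minimal illustration, not the interface from ADR-0005: the method names (`put`, `head`, `delete`), the `InMemoryProvider` adapter, and the `run_conformance` helper are all assumptions made for the example.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical adapter contract; the real one is defined by ADR-0005."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...
    def head(self, bucket: str, key: str) -> int: ...
    def delete(self, bucket: str, key: str) -> None: ...


class InMemoryProvider:
    """Stand-in adapter used only to exercise the contract in this sketch."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def head(self, bucket: str, key: str) -> int:
        # Returns the object size; raises KeyError if the object is missing.
        return len(self._objects[(bucket, key)])

    def delete(self, bucket: str, key: str) -> None:
        # Deleting a missing object is treated as a no-op here.
        self._objects.pop((bucket, key), None)


def run_conformance(provider: Provider) -> list:
    """Run the same checks against any adapter and return the failures."""
    failures = []
    provider.put("b", "k", b"hello")
    if provider.head("b", "k") != 5:
        failures.append("head must report the stored size")
    provider.delete("b", "k")
    try:
        provider.head("b", "k")
        failures.append("head must fail after delete")
    except KeyError:
        pass
    return failures
```

Because the checks take any object that satisfies `Provider`, the same suite can be pointed at each adapter in turn; an adapter counts as supported only when the returned list is empty.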

Presigned URL issuance

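Presigned URLs are scoped: PresignPut issues a PUT URL for a staging key, and PresignGet issues a GET URL for a committed blob.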

The client transfers bytes directly to/from the object store. BitVault’s compute path handles only metadata.
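
To illustrate what "scoped" can mean in practice, here is a minimal sketch of a URL that is bound to one HTTP method, one object key, and an expiry time. It uses a generic HMAC scheme invented for this example; real object stores use their own signing schemes (for instance AWS Signature Version 4 for S3). The secret, host name, and the `presign` / `verify` functions are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-signing-key"  # hypothetical; never hard-code real credentials


def _signature(method: str, key: str, expires: int) -> str:
    payload = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_s: int, now: Optional[float] = None) -> str:
    """Return a URL valid for one method, one key, and a limited time window."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_s)
    query = urlencode({"expires": expires, "sig": _signature(method, key, expires)})
    return f"https://objects.example.invalid/{key}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """The check the object store side would run before accepting a transfer."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    key = parts.path.lstrip("/")
    expires = int(query["expires"][0])
    current = time.time() if now is None else now
    expected = _signature(method, key, expires)
    return current < expires and hmac.compare_digest(expected, query["sig"][0])
```

A URL issued for `PUT` on `staging/abc` fails verification when replayed as a `GET`, against a different key, or after it expires, which is the property that lets the client talk to the object store directly.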

Content addressing

ADR-0016 specifies BLAKE3 as the hash function for blob identity.
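
As a rough sketch of content addressing, the snippet below derives a blob identity from the bytes alone, so identical content always maps to the same record. It assumes the third-party `blake3` Python package; if that package is not installed it falls back to BLAKE2b purely so the example runs, and that fallback does not produce BLAKE3 digests. The `blobs/<prefix>/<digest>` key layout is an assumption, not something the ADR is known to prescribe.

```python
import hashlib

try:
    from blake3 import blake3  # third-party package: pip install blake3
except ImportError:  # keep the sketch runnable without the dependency
    blake3 = None


def content_hash(data: bytes) -> str:
    """Hex digest used as the blob identity."""
    if blake3 is not None:
        return blake3(data).hexdigest()
    # STAND-IN ONLY: BLAKE2b is not BLAKE3 and yields different digests.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def blob_key(data: bytes) -> str:
    """Hypothetical key layout: fan out by the first two hex characters."""
    digest = content_hash(data)
    return f"blobs/{digest[:2]}/{digest}"
```

Two uploads of the same bytes resolve to the same key, which is what allows a single stored object to be shared through the refcount rather than stored twice.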

Ref-counting and GC

stateDiagram-v2
    [*] --> staging: PresignPut issued
    staging --> committed: CommitBlob (HeadObject verify ✔)
    staging --> [*]: GC (TTL expired, refcount = 0)
    committed --> committed: refcount++ (new version references blob)
    committed --> committed: refcount-- (version deleted)
    committed --> [*]: GC (refcount = 0)
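
The transitions in the diagram can be sketched as a small in-memory state machine. The `Blob` record, the `"deleted"` terminal state name, and the function names are assumptions for illustration; the real module persists this state in the blobs table.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    state: str = "staging"   # "staging" | "committed" | "deleted"
    refcount: int = 0
    created_at: float = 0.0


def commit(blob: Blob, verified: bool) -> None:
    """CommitBlob: promote staging to committed and add one reference."""
    if not verified:
        raise ValueError("HeadObject verification failed")
    if blob.state == "deleted":
        raise ValueError("blob already reclaimed")
    blob.state = "committed"
    blob.refcount += 1


def decrement_ref(blob: Blob) -> None:
    """DecrementRef: release one reference held by a deleted version."""
    if blob.state != "committed" or blob.refcount == 0:
        raise ValueError("no reference to release")
    blob.refcount -= 1


def run_gc(blob: Blob, now: float, staging_ttl_s: float) -> bool:
    """RunGC: reclaim expired staging blobs and unreferenced committed blobs."""
    expired_staging = blob.state == "staging" and now - blob.created_at > staging_ttl_s
    unreferenced = blob.state == "committed" and blob.refcount == 0
    if expired_staging or unreferenced:
        blob.state = "deleted"
        return True
    return False
```

Keeping reclamation in `run_gc` rather than in `decrement_ref` mirrors the API table, where DecrementRef only schedules GC and RunGC does the deletion.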

Multipart / resumable uploads

ADR-0021 covers resumable uploads for large files:

  1. Client calls InitMultipart → server calls provider to begin multipart upload; returns upload ID + presigned URLs for each part.
  2. Client uploads parts directly to object store (parallel, resumable on failure).
  3. Client calls CompleteMultipart → server calls provider to assemble parts → CommitBlob.
  4. On abort or TTL expiry, AbortMultipart cleans up partial parts from the provider.
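
The four steps above can be sketched against an in-memory stand-in for a provider's multipart API. Everything here is hypothetical: the `FakeMultipartStore` class, its method names, and the `part://` pseudo-URLs only model the shape of the flow, not any real provider SDK.

```python
import uuid


class FakeMultipartStore:
    """In-memory stand-in for a provider's multipart API (hypothetical names)."""

    def __init__(self) -> None:
        self.uploads = {}   # upload ID -> {part number: bytes}
        self.objects = {}   # object key -> assembled bytes

    def init_multipart(self, part_count: int):
        """Step 1: begin the upload and hand back one URL per part."""
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {}
        # Stand-ins for presigned part URLs; a real provider would sign these.
        urls = [f"part://{upload_id}/{n}" for n in range(1, part_count + 1)]
        return upload_id, urls

    def upload_part(self, url: str, data: bytes) -> None:
        """Step 2: parts may arrive in any order and may be retried."""
        upload_id, number = url[len("part://"):].split("/")
        self.uploads[upload_id][int(number)] = data

    def missing_parts(self, upload_id: str, part_count: int) -> list:
        """Lets a resuming client find out which parts still need uploading."""
        received = self.uploads[upload_id]
        return [n for n in range(1, part_count + 1) if n not in received]

    def complete_multipart(self, upload_id: str, key: str, part_count: int) -> None:
        """Step 3: assemble the parts in order into a single object."""
        if self.missing_parts(upload_id, part_count):
            raise ValueError("cannot complete: parts are missing")
        parts = self.uploads.pop(upload_id)
        self.objects[key] = b"".join(parts[n] for n in range(1, part_count + 1))

    def abort_multipart(self, upload_id: str) -> None:
        """Step 4: drop any partial parts."""
        self.uploads.pop(upload_id, None)
```

A client that is interrupted after uploading parts 1 and 3 can call `missing_parts`, upload only part 2, and then complete, which is the resumability the flow is meant to provide.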

Content-based chunking

ADR-0017 specifies content-based chunking for large files. Chunk boundaries are determined by the content (Rabin fingerprinting or similar), not by fixed byte offsets, so an insertion does not shift every boundary after it. This enables delta-sync efficiency: an edit near the start of a large file changes only the chunks around the edit, and only those chunks are re-uploaded rather than the whole file.
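
A minimal sketch of content-defined chunking follows. It uses a gear-style rolling hash as a simple stand-in for Rabin fingerprinting; the gear table, the 10-bit boundary mask (roughly 1 KiB average chunks), and the function name are assumptions chosen for the example. It also omits the minimum and maximum chunk sizes a production chunker would normally enforce.

```python
import random

_rng = random.Random(0)                    # fixed seed so boundaries are reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1
BOUNDARY_MASK = ((1 << 10) - 1) << 54      # top 10 bits: about 1 KiB average chunk


def chunk(data: bytes) -> list:
    """Cut wherever the rolling hash of the recent bytes matches the boundary pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each byte's contribution is shifted out after 64 steps, so h depends
        # only on the last 64 bytes and never on the absolute offset.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if (h & BOUNDARY_MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the final chunk
    return chunks
```

Because a boundary decision depends only on the preceding 64 bytes, inserting a few bytes near the start of a file disturbs the chunks around the insertion and leaves later chunks byte-for-byte identical, so they deduplicate against what is already stored.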

Storage placement federation

ADR-0020: storage placement is configurable per tenant and per file class. An enterprise tenant can have its blobs routed to a specific region or provider; the storage module resolves the placement policy and issues presigned URLs against the appropriate provider instance.
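
Placement resolution can be sketched as a lookup from most specific to least specific rule. The policy table, tenant names, file classes, region strings, and fallback order below are all placeholders invented for illustration; ADR-0020 defines the actual policy model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    provider: str
    region: str


DEFAULT = Placement("s3", "us-east-1")  # hypothetical deployment-wide default

# Hypothetical policy table keyed by (tenant, file class); None acts as a wildcard.
POLICIES = {
    ("acme", "media"): Placement("r2", "auto"),
    ("acme", None): Placement("gcs", "europe-west1"),
}


def resolve(tenant: str, file_class: Optional[str]) -> Placement:
    """Most specific rule wins: (tenant, class), then (tenant, any), then default."""
    for key in ((tenant, file_class), (tenant, None)):
        if key in POLICIES:
            return POLICIES[key]
    return DEFAULT
```

With a resolver of this shape, the presigning step only needs the returned `Placement` to choose which provider instance signs the URL, and callers never see which backend was selected.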