05 — Uploads: Multipart vs Chunked vs Resumable

Topics: multipart uploads, chunked uploads, resumable uploads. Decision in ADR-0021. The brief lists these three as if peers; they are not. Disambiguating them is the first job of this doc.


1. Three words, three different layers (stop conflating them)

| Term | What it actually is | Layer | Set by |
|---|---|---|---|
| Chunked | splitting content into content-defined dedup units (~1 MiB) | logical / data model | BitVault CDC (02) |
| Multipart | a provider transfer mechanism for one large object, in parts ≥5 MiB | physical / transfer | provider (S3 multipart) |
| Resumable | a property: survive interruption without re-sending received bytes | behavior | achieved via tracking (parts, offset, or committed chunks) |

A single upload can be all three at once: a 10 GiB file is chunked into ~10k dedup chunks (logical), transferred to the provider via multipart parts of 64 MiB (physical), and the whole thing is resumable because we persist progress. They are orthogonal. The rest of this doc designs each layer and how they compose.


2. Upload modes (BitVault supports two; pick by client capability)

flowchart TB
    classDef d fill:#fde68a,stroke:#b45309,color:#111827;
    start{"Client type?"}:::d
    start -->|"smart (CLI, sync, mobile)"| smart["Mode A: client-side chunked + dedup<br/>(delta sync; upload only new chunks)"]:::d
    start -->|"simple (browser, 3rd-party, large single file)"| big["Mode B: whole-object transfer<br/>(provider multipart OR tus) + async server-side chunking"]:::d
    smart --> commit["Commit manifest (verify → write → outbox)"]:::d
    big --> commit

Mode A — client-side chunked upload (the dedup/delta-sync path)

The high-value path: CDC + dedup happen on the client, so unchanged data never leaves the device.

  1. Client CDC-chunks, hashes (BLAKE3), NegotiateChunks → server returns missing subset (tenant-scoped, 03).
  2. Client uploads only missing chunks, direct-to-storage via presigned PUTs.
    • Small chunks → one presigned PUT each (batched issuance).
    • A large chunk (rare with 4 MiB cap) or a batch can use provider multipart.
  3. Commit(manifest) → server verifies all chunks present + checksums (SI-1), writes manifest, refs++, emits outbox event.

Resumability is intrinsic: committed chunks are durable; an interrupted upload resumes by re-running NegotiateChunks — already-uploaded chunks are now “present” and skipped. No special resume protocol needed; content addressing gives resumability for free.
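The Mode A loop and its "resume for free" property can be sketched as follows. This is a minimal in-memory illustration, not BitVault's implementation: `FakeServer`, `upload`, and `fail_after` are hypothetical names, presigned PUTs are replaced by a direct method call, and SHA-256 stands in for BLAKE3 so the example stays stdlib-only.

```python
import hashlib


def digest_of(data: bytes) -> str:
    # The doc specifies BLAKE3; SHA-256 is swapped in to avoid a third-party dependency.
    return hashlib.sha256(data).hexdigest()


class FakeServer:
    """Hypothetical in-memory stand-in for the tenant chunk index and staging store."""

    def __init__(self):
        self.chunks = {}     # digest -> bytes
        self.manifests = {}  # version -> list of digests

    def negotiate_chunks(self, digests):
        # Return each digest the server does not hold yet, once, in order.
        return list(dict.fromkeys(d for d in digests if d not in self.chunks))

    def put_chunk(self, digest, data):
        # Stands in for a presigned PUT direct to storage.
        if digest_of(data) != digest:
            raise ValueError("checksum mismatch")
        self.chunks[digest] = data

    def commit(self, version, digests):
        # Refuse to write the manifest unless every chunk is present.
        missing = [d for d in digests if d not in self.chunks]
        if missing:
            return 409, missing
        self.manifests[version] = list(digests)
        return 200, []


def upload(server, version, chunks, fail_after=None):
    """Upload only missing chunks; optionally simulate a drop after N PUTs."""
    digests = [digest_of(c) for c in chunks]
    by_digest = dict(zip(digests, chunks))
    sent = 0
    for d in server.negotiate_chunks(digests):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated interruption")
        server.put_chunk(d, by_digest[d])
        sent += 1
    status, _ = server.commit(version, digests)
    return status, sent
```

Calling `upload` again after an interruption is the whole resume protocol in this sketch: the second `negotiate_chunks` call returns only the chunks that never arrived.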

Mode B — whole-object transfer (the simple/large-file path)

For clients that can’t or shouldn’t chunk (browsers, third-party tools, opaque large files). The object is transferred whole, then chunked/deduped server-side, asynchronously.

Two transfer mechanisms, chosen by environment: B1 (provider multipart) and B2 (tus). §5 shows how each one resumes.

After the object lands in staging, an ingest worker CDC-chunks it, dedups against the tenant index, writes the manifest, and schedules packing. The user’s file is available immediately (from the whole staged object); dedup/packing settle asynchronously.


3. Provider multipart: the limits that shape part sizing

S3 (and S3-compatible MinIO/R2) multipart hard limits — load-bearing:

| Limit | Value |
|---|---|
| Min part size | 5 MiB (last part may be smaller) |
| Max part size | 5 GiB |
| Max parts per upload | 10,000 |
| Max object size | 5 TiB |

Part-sizing math (adaptive): part_size = clamp(ceil(file_size / 10000), 5 MiB, 5 GiB), rounded up to a nice boundary. A 5 TiB file needs parts of at least ~525 MiB to fit in 10k parts (5 TiB / 10,000 ≈ 524.3 MiB); a 100 MiB file uses 8–16 MiB parts for good retry granularity, since the formula only sets a lower bound (5 MiB in that case). The adapter exposes provider-specific caps; Placement picks a provider that can hold the object (09). Azure uses block blobs (≤50k blocks, ≤~4000 MiB each); the adapter maps part ↔ block.
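A minimal sketch of the part-sizing rule, using the S3 limits from the table above. The `preferred` floor of 8 MiB and the 1 MiB rounding `boundary` are assumptions chosen to match the "8–16 MiB for a 100 MiB file" guidance; neither value is specified by this doc.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB     # S3 minimum part size (last part may be smaller)
MAX_PART = 5 * GIB     # S3 maximum part size
MAX_PARTS = 10_000     # S3 maximum parts per upload
MAX_OBJECT = 5 * TIB   # S3 maximum object size


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def part_size(file_size: int, preferred: int = 8 * MIB, boundary: int = MIB) -> int:
    """Smallest boundary-aligned part size that fits file_size into MAX_PARTS parts.

    `preferred` and `boundary` are illustrative assumptions, not documented values.
    """
    if not 0 < file_size <= MAX_OBJECT:
        raise ValueError("file_size must be in (0, 5 TiB]")
    needed = ceil_div(file_size, MAX_PARTS)          # lower bound from the 10k-part cap
    size = max(needed, preferred, MIN_PART)          # never below the provider minimum
    size = ceil_div(size, boundary) * boundary       # round up to a "nice" boundary
    return min(size, MAX_PART)


def part_count(file_size: int, size: int) -> int:
    return ceil_div(file_size, size)
```

Under these assumptions a 5 TiB object gets 525 MiB parts (9,987 of them) and a 100 MiB object gets 8 MiB parts (13 of them).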

Chunk ≠ part. Our 1 MiB dedup chunks are far below the 5 MiB multipart minimum. That is fine and expected: multipart parts are a transfer batching of the staged whole object (Mode B) or of a large blob; dedup chunks are logical units extracted from content. In Mode A we usually PUT chunks directly (no multipart) and pack them later.


4. The commit protocol (defeats dual-write — SI-1)

Every mode ends in the same commit, the heart of correctness (refines 06 §4 high-level upload):

sequenceDiagram
    autonumber
    participant C as Client
    participant S as Storage Coordinator
    participant O as Object Store
    participant DB as Postgres (Chunk/Manifest Index + Outbox)
    Note over C,O: bytes already in STAGING (Mode A chunks or Mode B object)
    C->>S: Commit(version, manifest|object_ref)
    S->>O: Head each chunk/part — size + checksum
    O-->>S: verified ✔
    alt all present & valid
        Note over S,DB: BEGIN TX<br/>upsert chunks (ref++ via edge rows)<br/>insert manifest<br/>insert outbox(NodeChanged/BlobCommitted)<br/>COMMIT
        S-->>C: 200 committed
    else missing/corrupt
        S-->>C: 409 — re-upload listed chunks/parts
    end

Invariant: a manifest is durable only after all its bytes are verified present. A crash before commit leaves only staging bytes (refcount 0) → reclaimed by GC (11). No dangling references, ever.
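The verify-then-write ordering can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `Index` is a hypothetical stand-in for the Postgres index and outbox, `staged` stands in for the sizes returned by HEAD requests against the object store, and size equality stands in for the full size + checksum check. Refcounts are derived from (version, digest) edge rows, following the "ref++ via edge rows" note in the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Hypothetical in-memory stand-in for the chunk/manifest index plus outbox."""
    edges: set = field(default_factory=set)        # (version, digest) edge rows
    manifests: dict = field(default_factory=dict)  # version -> tuple of digests
    outbox: list = field(default_factory=list)     # pending events

    def refcount(self, digest: str) -> int:
        return sum(1 for _, d in self.edges if d == digest)


def commit(index: Index, staged: dict, version: str, manifest: list):
    """Commit `manifest`, a list of (digest, expected_size) pairs.

    `staged` maps digest -> size as reported by the object store.
    Returns (200, []) on success or (409, [digests to re-upload]).
    """
    # Step 1: verify every chunk before touching the index.
    bad = [d for d, size in manifest if staged.get(d) != size]
    if bad:
        return 409, bad

    # Step 2: all index writes happen together; in the real system this is
    # a single database transaction (edge rows, manifest, outbox event).
    index.edges |= {(version, d) for d, _ in manifest}
    index.manifests[version] = tuple(d for d, _ in manifest)
    index.outbox.append(("BlobCommitted", version))
    return 200, []
```

Because the verification step returns before any write, a failed commit leaves the index untouched, which is the property the invariant above depends on.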


5. Resumability — how each path recovers

| Path | Progress tracked by | Resume action | Cleanup of abandoned |
|---|---|---|---|
| Mode A (chunked) | committed chunks in tenant index | re-NegotiateChunks; skip present | staging chunks ref=0 → GC after TTL |
| Mode B1 (provider multipart) | provider-side uploaded parts | ListParts → upload missing parts → complete | provider lifecycle aborts stale MPU (e.g. 7d) + our reconciler |
| Mode B2 (tus) | server Upload-Offset | HEAD → PATCH from offset | staging object TTL → GC |
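For the two Mode B paths, the resume step reduces to computing which bytes still need to be sent. The helpers below are a hypothetical sketch: `uploaded` stands in for the part numbers returned by a ListParts call, and `upload_offset` for the `Upload-Offset` value returned by a tus HEAD request. A real client would also compare the listed part sizes or ETags before trusting them.

```python
def missing_parts(file_size: int, part_size: int, uploaded: set) -> list:
    """Return (part_number, start, end_exclusive) for parts not yet uploaded.

    Part numbers start at 1, as in S3 multipart.
    """
    total = -(-file_size // part_size)  # ceiling division
    todo = []
    for n in range(1, total + 1):
        if n in uploaded:
            continue
        start = (n - 1) * part_size
        todo.append((n, start, min(start + part_size, file_size)))
    return todo


def tus_remaining(file_size: int, upload_offset: int) -> tuple:
    """Return the (start, end_exclusive) byte range a tus PATCH still has to send."""
    if not 0 <= upload_offset <= file_size:
        raise ValueError("offset outside the object")
    return upload_offset, file_size
```

For a 20-byte object in 8-byte parts with part 1 already uploaded, `missing_parts(20, 8, {1})` yields parts 2 and 3, the last one shorter than the rest.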

Abandoned-upload reclamation is mandatory at scale: incomplete multipart uploads accrue storage cost silently on S3 until aborted. We set provider lifecycle rules to auto-abort stale MPUs and run a reconciler that aborts/deletes staging older than the TTL — belt and suspenders, because lifecycle rules differ per provider.
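The reconciler's selection step might look like the sketch below. `stale_uploads` and the `(upload_id, initiated_at)` tuples are hypothetical; the 7-day default mirrors the "e.g. 7d" lifecycle example above and is not a fixed policy.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(uploads, now: datetime, ttl: timedelta = timedelta(days=7)) -> list:
    """Return the ids of in-progress uploads initiated before now - ttl.

    `uploads` is an iterable of (upload_id, initiated_at) pairs, for example
    built from a provider's listing of incomplete multipart uploads.
    """
    cutoff = now - ttl
    return [upload_id for upload_id, initiated_at in uploads if initiated_at < cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
in_progress = [
    ("mpu-old", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("mpu-new", datetime(2024, 1, 9, tzinfo=timezone.utc)),
]
to_abort = stale_uploads(in_progress, now)  # only "mpu-old" is past the TTL
```

Each returned id would then be aborted at the provider and its staging bytes deleted; running this alongside provider lifecycle rules is what gives the belt-and-suspenders behavior.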


6. Backpressure, ordering & integrity on the upload path


7. Tradeoffs / Alternatives / Scaling

Tradeoffs.

Alternatives considered.

Scaling concerns.

References