ADR-0021 — Dual resumable upload mechanism (tus + provider multipart)
- Status: Proposed
- Date: 2026-06-11
- Related: storage/05 uploads, ADR-0011, ADR-0017
V1 Freeze (2026-06-12): Proposed. V1 ships single-shot presigned PUT with a size cap; resumable multipart is the agreed next increment (P1/P2) but not committed to the freeze.
Context
Enterprise uploads are large and networks are flaky, so uploads must be resumable without re-sending bytes the server has already received. “Resumable” is a property achievable in several ways, however, and the right mechanism differs between the smart (chunking) client and a browser/third-party client, and between direct-to-storage and proxied transfers (storage/05). We must not proxy bulk bytes through compute on the default path (ADR-0011).
Decision
Resumability is provided by three composed mechanisms, chosen by client/mode:
- Content-addressed resume (Mode A, smart client): committed chunks are durable; resume = re-NegotiateChunks and skip present ones. Content addressing gives resumability for free, with no special protocol.
- Provider multipart (Mode B1, direct): large whole-object uploads go straight to the provider via presigned part URLs; resume via the provider’s ListParts. No bytes through our compute.
- tus (Mode B2, proxied fallback): browsers/CORS/firewall cases upload to the gateway via the tus protocol (HEAD → Upload-Offset, PATCH from offset); the gateway streams to the provider with a bounded buffer.
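The three resume paths differ mainly in how the client learns what the server already holds. The following is a minimal sketch of that planning step, not the storage/05 implementation. The function names, the shape of the NegotiateChunks response (a set of present chunk hashes), fixed-size chunking, and SHA-256 as the chunk address are all assumptions made for illustration. Only the 1-based part numbering of ListParts and the Upload-Offset semantics of tus come from the respective protocols.

```python
import hashlib
from typing import Iterable


def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """Mode A: address each fixed-size chunk by its SHA-256 digest (assumed scheme)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_send(local: list[str], present: Iterable[str]) -> list[int]:
    """Mode A: indices of chunks the server did not report as present."""
    present_set = set(present)
    return [i for i, digest in enumerate(local) if digest not in present_set]


def parts_to_send(total_size: int, part_size: int, uploaded: Iterable[int]) -> list[int]:
    """Mode B1: 1-based part numbers missing from a ListParts result."""
    part_count = max(1, -(-total_size // part_size))  # ceiling division, at least one part
    done = set(uploaded)
    return [n for n in range(1, part_count + 1) if n not in done]


def tus_resume_range(upload_offset: int, upload_length: int) -> tuple[int, int]:
    """Mode B2: byte range [start, end) still to PATCH after HEAD returns Upload-Offset."""
    if not 0 <= upload_offset <= upload_length:
        raise ValueError("offset outside upload length")
    return upload_offset, upload_length
```

In all three cases the resume decision is a set difference or an offset comparison; none of them requires the client to keep its own durable progress log.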
Abandoned uploads are reclaimed by a staging TTL, provider lifecycle rules that abort stale multipart uploads, and a reconciler (storage/05 §5).
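The reconciler's selection step can be sketched as a pure function over upload records. This is an illustration under stated assumptions: the StagedUpload record, its fields, and the 24-hour TTL are hypothetical, and storage/05 §5 defines the real policy. Acting on the result (for example, aborting the provider-side multipart upload) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class StagedUpload:
    """Hypothetical staging record; field names are illustrative only."""
    upload_id: str
    last_activity: datetime  # last time a chunk, part, or PATCH was accepted
    committed: bool


def stale_uploads(
    uploads: Iterable[StagedUpload],
    now: datetime,
    ttl: timedelta = timedelta(hours=24),  # assumed TTL, not from storage/05
) -> list[str]:
    """Return IDs of uncommitted uploads idle for longer than the staging TTL."""
    return [
        u.upload_id
        for u in uploads
        if not u.committed and now - u.last_activity > ttl
    ]


# Usage: only the idle, uncommitted upload is selected for reclamation.
now = datetime(2026, 6, 12, tzinfo=timezone.utc)
records = [
    StagedUpload("idle", now - timedelta(hours=30), committed=False),
    StagedUpload("active", now - timedelta(hours=1), committed=False),
    StagedUpload("done", now - timedelta(hours=48), committed=True),
]
to_reclaim = stale_uploads(records, now)
```

Keeping selection separate from the abort call makes the policy testable without touching the provider.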
Consequences
Positive
- Each client gets the resumability mechanism that fits it; the high-value smart-client path needs no extra protocol (CAS does it).
- Default paths keep bytes off our compute (ADR-0011); only the tus fallback streams through, and it is rate-limited and memory-bounded.
- Standard, interoperable protocols (S3 multipart, tus) — clients/libraries exist.
Negative / costs
- Three mechanisms to implement and test (mitigated: they share the commit protocol and staging/GC).
- tus path consumes gateway bandwidth/CPU → bounded, throttled, fallback-only.
- Incomplete multipart uploads silently accrue storage cost until aborted → mandatory lifecycle/reconciler cleanup.
Alternatives considered
- Proxy all uploads through the API with our own resume protocol: simplest client, but it incurs the R5/R12 compute-egress cost; rejected except as the bounded tus fallback.
- Provider multipart only: loses delta-sync / cross-version dedup (Mode A) and is awkward in browsers — insufficient alone.
- tus only: universal but proxies all bytes through compute — rejected as default.
Scaling
Presigned issuance is stateless (scales with control-plane replicas); the data plane scales on the provider (ADR-0011). The only compute-bound path (tus) is the rare fallback and is explicitly capacity-bounded.
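One way to read "capacity-bounded" for the tus path is a hard cap on concurrent proxied uploads plus a fixed-size relay buffer per upload. The sketch below illustrates that reading only; the class and function names, the 64 KiB buffer, and the reject-rather-than-queue behavior are assumptions, not the gateway's actual design.

```python
import io
import threading
from typing import BinaryIO


class TusAdmission:
    """Caps concurrent proxied uploads; callers over the cap are rejected, not queued."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_enter(self) -> bool:
        # Non-blocking: returns False immediately when no slot is free.
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()


def relay(src: BinaryIO, dst: BinaryIO, buffer_size: int = 64 * 1024) -> int:
    """Copy src to dst, reading at most buffer_size bytes per step; return bytes copied."""
    total = 0
    while True:
        block = src.read(buffer_size)
        if not block:
            return total
        dst.write(block)
        total += len(block)


# Usage with in-memory streams standing in for the client body and the provider upload.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
copied = relay(source, sink)
```

With a cap of N uploads and a buffer of B bytes, the relay buffers alone account for roughly N × B bytes of gateway memory, which is what makes the fallback's footprint predictable.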
References
- tus resumable upload protocol: https://tus.io/protocols/resumable-upload
- S3 multipart limits & abort lifecycle: https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html