06 — Downloads & Reconstruction
Topic: download flows. Downloads are where the chunked/deduped/packed storage model meets the user’s expectation of “give me my file, fast.” The central tension: storage is optimized for dedup (many small chunks, possibly packed), but download wants few, large, sequential reads. This doc resolves it.
1. Resolution pipeline (version → bytes)
```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant GW as Gateway
    participant SH as Sharing/Authz
    participant S as Storage Coordinator
    participant DB as Manifest + Chunk/Pack Index
    participant O as Object Store
    C->>GW: GET /v1/files/{id}/content (Range optional)
    GW->>SH: CheckAccess(principal, node, read)
    SH-->>GW: allow
    GW->>S: ResolveDownload(version, range?)
    S->>DB: content_hash → manifest → chunk refs → locations + tiers
    DB-->>S: [{chunk, pack_id|object, offset, len, tier}]
    alt any chunk on cold tier
        S-->>C: 202 Accepted (rehydrating), notify when ready (§5)
    else all online
        S-->>C: plan = presigned GET(s) + ranges (or single URL)
        C->>O: GET bytes (direct, presigned, Range)
        O-->>C: bytes
        C->>C: BLAKE3-verify each chunk/range ([04])
    end
```
Authz is resolved before any URL is issued; presigned URLs are narrowly scoped and short-TTL (ADR-0011). Bytes flow directly between client and provider, not through our compute.
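The branch at the end of the pipeline can be sketched as below. This is a minimal illustration, not the coordinator's actual code: `ChunkLocation`, `resolve_download`, the `"hot"`/`"cold"` tier labels, and the response dictionaries are hypothetical names, and the sketch assumes the authz check has already passed and leaves presigning out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLocation:
    chunk_hash: str   # content hash of the chunk
    object_key: str   # pack id or standalone object key
    offset: int       # byte offset inside the object
    length: int       # chunk length in bytes
    tier: str         # "hot" or "cold" (hypothetical tier labels)


def resolve_download(locations: list[ChunkLocation]) -> dict:
    """Return a 202-style rehydration response or a 200-style read plan."""
    cold = [loc.chunk_hash for loc in locations if loc.tier == "cold"]
    if cold:
        # Any cold chunk means the whole download waits for rehydration.
        return {"status": 202, "rehydrating": cold}
    plan = [
        {
            "object_key": loc.object_key,
            # HTTP byte ranges are inclusive, hence the -1 on the end offset.
            "range": f"bytes={loc.offset}-{loc.offset + loc.length - 1}",
        }
        for loc in locations
    ]
    return {"status": 200, "plan": plan}
```

In a real deployment each plan entry would carry a presigned URL for `object_key` rather than the raw key.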
2. Three download shapes (chosen by how the file was stored + who’s asking)
| File / client | How served | Reads |
|---|---|---|
| Small / whole-stored (≤ chunk threshold, 05) | single presigned GET | 1 |
| Large, chunked → smart client (CLI/sync/mobile) | client fetches only missing chunks, reconstructs locally per manifest | N (deduped) |
| Large, chunked → browser / simple | range reads over packs, reassembled (see §4) | N or streamed |
The small-file fast path is why we don’t chunk small files (05 §7): the majority of downloads become a single direct GET with zero reconstruction.
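For the smart-client row, "fetch only missing chunks, reconstruct locally per manifest" might look like the sketch below. It assumes a manifest is simply an ordered list of chunk hashes and that the local chunk store behaves like a dictionary; `missing_chunks` and `reconstruct` are hypothetical helper names.

```python
def missing_chunks(manifest: list[str], local_hashes: set[str]) -> list[str]:
    """Chunk hashes the client must fetch, deduplicated, in first-use order."""
    seen: set[str] = set()
    to_fetch: list[str] = []
    for chunk_hash in manifest:
        if chunk_hash not in local_hashes and chunk_hash not in seen:
            seen.add(chunk_hash)
            to_fetch.append(chunk_hash)
    return to_fetch


def reconstruct(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Concatenate chunks in manifest order; a repeated hash reuses one stored chunk."""
    return b"".join(store[chunk_hash] for chunk_hash in manifest)
```

A manifest that references the same chunk twice costs one fetch, which is where the "N (deduped)" read count comes from.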
3. Reading packed chunks (range reads)
When a chunk lives inside a ~1 GiB pack (02, 11), we don’t download the pack; we issue a presigned GET on the pack object and the client sends a Range: bytes=offset-(offset+len-1) header for exactly that chunk. The Pack Index supplies (pack_id, offset, len).
- Coalescing: if several needed chunks are contiguous within the same pack (common, because the packer groups chunks of the same object), they are fetched in one ranged GET spanning them — turning N requests into 1. This is a major download-efficiency lever and a reason packing helps reads, not just storage.
- Verification: BLAKE3 verified streaming lets the client verify each chunk’s range independently even within a coalesced read (04).
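The range arithmetic and the coalescing rule can be sketched as follows. `ChunkRef`, `range_header`, `coalesce`, and `split_coalesced` are hypothetical names, and the sketch assumes the input refs are already deduplicated. Hashing each slice is left out because BLAKE3 is not in the Python standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    chunk_hash: str
    pack_id: str
    offset: int
    length: int


def range_header(offset: int, length: int) -> str:
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def coalesce(refs: list[ChunkRef]) -> list[tuple[str, int, int, list[ChunkRef]]]:
    """Merge chunks that sit back to back in the same pack.

    Returns (pack_id, start_offset, total_length, member_chunks) tuples.
    """
    groups: list[tuple[str, int, int, list[ChunkRef]]] = []
    for ref in sorted(refs, key=lambda r: (r.pack_id, r.offset)):
        if groups:
            pack_id, start, length, members = groups[-1]
            if pack_id == ref.pack_id and start + length == ref.offset:
                groups[-1] = (pack_id, start, length + ref.length, members + [ref])
                continue
        groups.append((ref.pack_id, ref.offset, ref.length, [ref]))
    return groups


def split_coalesced(body: bytes, start: int, members: list[ChunkRef]) -> dict[str, bytes]:
    """Cut one coalesced response back into per-chunk byte strings for verification."""
    return {
        m.chunk_hash: body[m.offset - start : m.offset - start + m.length]
        for m in members
    }
```

Each slice returned by `split_coalesced` would then be hashed and compared with its `chunk_hash` before being written into the output file at the position the manifest dictates.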
4. The reconstruction-location decision (browser problem)
Reassembling many chunks is easy for a smart client (it has the bytes locally and wants chunks anyway). A browser downloading a large chunked file is the hard case. Options:
| Option | How | Bytes through our compute? | Verdict |
|---|---|---|---|
| Store small files whole | no reconstruction for the common case | no | ✅ default; eliminates most of the problem |
| Service-worker reassembly | browser fetches chunks via presigned URLs, a service worker concatenates into the download stream | no | ✅ preferred for large chunked files in modern browsers |
| Streaming reconstructor | a thin stateless endpoint streams pack ranges → client, concatenated server-side | yes (bounded, streamed) | ⚠️ fallback only; rate-limited, CDN-fronted |
| Pre-materialized whole object | keep a coalesced whole copy for hot/large files | no (extra storage) | ⚠️ for frequently browser-downloaded large files |
Recommendation / tradeoff: default to store-small-whole + service-worker reassembly; use the streaming reconstructor only as a compatibility fallback (it reintroduces compute egress, so it is bounded and metered). This keeps the dedup storage model without paying reconstruction cost on the common path. (Magic Pocket likewise reconstructs files from blocks; the key is to keep that off the hot, compute-bound path.)
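What "bounded, streamed" could mean for the fallback reconstructor is sketched below as a generator that never holds more than one piece in memory. `stream_file`, the `fetch_range` callable, and the 1 MiB `max_piece` default are assumptions for illustration; a real endpoint would wire `fetch_range` to ranged GETs against the object store.

```python
from typing import Callable, Iterable, Iterator

# (pack_id, offset, length) -> bytes; stands in for a ranged GET on a pack object.
FetchRange = Callable[[str, int, int], bytes]


def stream_file(
    plan: Iterable[tuple[str, int, int]],
    fetch_range: FetchRange,
    max_piece: int = 1 << 20,
) -> Iterator[bytes]:
    """Yield the file's bytes in manifest order, at most max_piece bytes at a time."""
    for pack_id, offset, length in plan:
        done = 0
        while done < length:
            n = min(max_piece, length - done)
            yield fetch_range(pack_id, offset + done, n)
            done += n
```

Because the generator keeps no state between requests beyond the plan itself, the endpoint stays stateless and its memory use is capped by `max_piece` per in-flight download.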
5. Cold-tier reads (rehydration)
A chunk on an archival tier (Glacier/Archive/Coldline) is not immediately readable — recall takes minutes to hours (10).
```mermaid
flowchart TB
    classDef c fill:#fde68a,stroke:#b45309,color:#111827;
    classDef w fill:#fed7aa,stroke:#c2410c,color:#111827;
    req["Download hits a cold chunk"]:::c --> r["Initiate provider restore<br/>(mark chunk rehydrating)"]:::w
    r --> p["Poll / await restore callback"]:::w
    p --> ready["Chunk temporarily on a hot tier"]:::c
    ready --> serve["Issue presigned GET"]:::c
```
- The API returns 202 + a job, notifying the user (event/webhook, 04 contexts) when ready, rather than blocking.
- Restore copies are temporary; lifecycle returns them to cold after a window.
- Predictive rehydration (warm a version’s chunks when a user opens its folder) is a future optimization, designed-for via the manifest (we know all chunks up front).
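The 202-plus-job behaviour can be sketched as a small tracker that knows which cold chunks a download is still waiting on. `RehydrationJob` and its response fields are hypothetical; provider restore calls and the actual notification channel are out of scope here.

```python
class RehydrationJob:
    """Tracks the cold chunks one download is waiting on (hypothetical shape)."""

    def __init__(self, job_id: str, cold_chunks: list[str]) -> None:
        self.job_id = job_id
        self.pending = set(cold_chunks)  # chunks whose restore has not finished

    def response(self) -> dict:
        # 202 while any chunk is still rehydrating; 200 once a plan can be issued.
        if self.pending:
            return {"status": 202, "job_id": self.job_id, "pending": sorted(self.pending)}
        return {"status": 200, "job_id": self.job_id}

    def on_chunk_restored(self, chunk_hash: str) -> bool:
        """Record a restore callback; return True exactly once, when the job completes."""
        was_pending = chunk_hash in self.pending
        self.pending.discard(chunk_hash)
        return was_pending and not self.pending
```

Returning `True` only on the transition to "nothing pending" means a duplicate provider callback does not trigger a second user notification.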
6. Caching & CDN (content addressing’s payoff)
Immutable, content-addressed objects are the ideal cache citizens:
- Perfect cache key: the hash is the key; no invalidation logic — content never changes under a key, so TTLs can be effectively infinite.
- CDN in front of object storage for public links and hot reads, with signed CDN URLs; the CDN caches pack/chunk objects by key.
- Edge dedup: because the same chunk underlies many files/versions, a cached hot chunk serves many logical downloads.
- Authz vs cache tension: private content must not be cached unauthenticated → presigned/signed-URL TTLs + per-tenant cache keying; public-link content is freely cacheable. The two are kept distinct at the URL/permission layer.
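One way to keep public and private content distinct at the cache layer is to derive the cache key differently for each, as in this sketch. The `public/` and `tenant/` prefixes and the `cache_key` function are illustrative assumptions, not the documented key scheme.

```python
from typing import Optional


def cache_key(content_hash: str, public: bool, tenant_id: Optional[str] = None) -> str:
    """Build a CDN cache key from an immutable content hash."""
    if public:
        # Public-link content is shared across all readers of the same hash.
        return f"public/{content_hash}"
    if not tenant_id:
        raise ValueError("private content requires a tenant_id for cache keying")
    # Private content is partitioned per tenant so one tenant's cached bytes
    # are never served under another tenant's key.
    return f"tenant/{tenant_id}/{content_hash}"
```

Under this assumed scheme, per-tenant keying gives up cross-tenant edge dedup for private content; the hash still makes each key immutable, so no invalidation logic is needed in either case.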
7. Tradeoffs / Alternatives / Scaling
Tradeoffs. The manifest indirection adds one metadata lookup per download (cached, 08). Reconstruction adds client work for large chunked files — bounded by storing small files whole and coalescing pack ranges.
Alternatives.
- Always reconstruct server-side: simplest client, but it incurs the R5 compute-egress cost at scale; rejected except as a fallback.
- Never chunk (store every file whole): trivial downloads, but forfeits dedup and delta-sync — rejected; the small-file-whole threshold captures most of the download simplicity without losing dedup on the data that matters (large/edited/shared).
- Client-side decryption for E2E: would move keys to clients and break CDN caching of plaintext — out of scope (ADR-0014).
Scaling concerns.
- Request amplification: a naive chunked download = N provider GETs. Mitigated by pack-range coalescing (§3), CDN edge caching (§6), and storing small files whole.
- Manifest read hotspots for popular files → cache manifests in Redis; manifests are immutable, so cached entries never go stale and can use an effectively infinite TTL.
- Cold-recall stampedes (many users open an archived dataset) → coalesce restore requests per chunk (one restore serves all waiters), rate-limit, and prefer tiering policies that keep likely-read data warm (10).
- Egress cost is the dominant download cost at scale → CDN offload, in-region serving, and provider selection by egress price (Placement, 09).
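The stampede mitigation ("one restore serves all waiters") can be sketched as a per-chunk coalescer. `RestoreCoalescer` and the injected `start_restore` callable are hypothetical; the sketch is single-threaded and in-memory, so a real coordinator would need locking or a shared store, plus the rate limiting mentioned above.

```python
from typing import Callable


class RestoreCoalescer:
    """Ensures at most one in-flight restore per chunk, however many users ask."""

    def __init__(self, start_restore: Callable[[str], None]) -> None:
        self._start_restore = start_restore
        self._waiters: dict[str, list[str]] = {}  # chunk_hash -> waiting job ids

    def request(self, chunk_hash: str, job_id: str) -> bool:
        """Register a waiter; return True only if this call started a new restore."""
        if chunk_hash in self._waiters:
            self._waiters[chunk_hash].append(job_id)
            return False
        self._waiters[chunk_hash] = [job_id]
        self._start_restore(chunk_hash)
        return True

    def complete(self, chunk_hash: str) -> list[str]:
        """Called when the restore finishes; returns every job to notify."""
        return self._waiters.pop(chunk_hash, [])
```

With this shape, a thousand users opening the same archived dataset produce one provider restore per chunk rather than a thousand.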
References
- HTTP Range requests: https://developer.mozilla.org/docs/Web/HTTP/Range_requests
- S3 Glacier restore (rehydration latency): https://docs.aws.amazon.com/AmazonS3/latest/userguide/restoring-objects.html
- Dropbox block reconstruction: https://dropbox.tech/infrastructure/inside-the-magic-pocket