06 — Downloads & Reconstruction

Topic: download flows. Downloads are where the chunked/deduped/packed storage model meets the user’s expectation of “give me my file, fast.” The central tension: storage is optimized for dedup (many small chunks, possibly packed), but download wants few, large, sequential reads. This doc resolves it.


1. Resolution pipeline (version → bytes)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant GW as Gateway
    participant SH as Sharing/Authz
    participant S as Storage Coordinator
    participant DB as Manifest + Chunk/Pack Index
    participant O as Object Store
    C->>GW: GET /v1/files/{id}/content (Range optional)
    GW->>SH: CheckAccess(principal, node, read)
    SH-->>GW: allow
    GW->>S: ResolveDownload(version, range?)
    S->>DB: content_hash → manifest → chunk refs → locations + tiers
    DB-->>S: [{chunk, pack_id|object, offset, len, tier}]
    alt any chunk on cold tier
        S-->>C: 202 Accepted (rehydrating), notify when ready (§5)
    else all online
        S-->>C: plan = presigned GET(s) + ranges (or single URL)
        C->>O: GET bytes (direct, presigned, Range)
        O-->>C: bytes
        C->>C: BLAKE3-verify each chunk/range ([04])
    end
```

Authz is resolved before any URL is issued; presigned URLs are scoped and short-TTL (ADR-0011). Bytes flow client ⇄ provider, not through our compute.
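
The branch at the end of the pipeline (cold tier versus all online) can be sketched as below. This is a minimal illustration, not the coordinator's actual code: `ChunkLocation`, `DownloadResponse`, `resolve_download`, and the `"hot"`/`"cold"` tier labels are hypothetical names, and both the authz check and URL presigning are assumed to happen outside this function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkLocation:
    chunk_hash: str
    object_key: str  # pack id or standalone object key (hypothetical field name)
    offset: int
    length: int
    tier: str        # "hot" or "cold" in this sketch


@dataclass(frozen=True)
class DownloadResponse:
    status: int                  # 200 = plan issued, 202 = rehydrating
    plan: Optional[list] = None  # [(object_key, first_byte, last_byte), ...]


def resolve_download(locations: list[ChunkLocation]) -> DownloadResponse:
    """Return 202 if any chunk is cold, otherwise a read plan in manifest order."""
    if any(loc.tier == "cold" for loc in locations):
        return DownloadResponse(status=202)
    # Byte positions are inclusive at both ends, matching HTTP Range semantics.
    plan = [
        (loc.object_key, loc.offset, loc.offset + loc.length - 1)
        for loc in locations
    ]
    return DownloadResponse(status=200, plan=plan)
```

In this sketch a single cold chunk turns the whole request into a 202, which mirrors the `alt any chunk on cold tier` branch of the diagram.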


2. Three download shapes (chosen by how the file was stored + who’s asking)

| File / client | How served | Reads |
|---|---|---|
| Small / whole-stored (≤ chunk threshold, 05) | single presigned GET | 1 |
| Large, chunked → smart client (CLI/sync/mobile) | client fetches only missing chunks, reconstructs locally per manifest | N (deduped) |
| Large, chunked → browser / simple | range reads over packs, reassembled (see §4) | N or streamed |

The small-file fast path is why we don’t chunk small files (05 §7): the majority of downloads become a single direct GET with zero reconstruction.
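
As a rough illustration of how the three shapes might be selected, and of what "N (deduped)" means for a smart client, consider the sketch below. The names `Shape`, `choose_shape`, and `chunks_to_fetch` are assumptions made for this example; the real selection logic may take more inputs.

```python
from enum import Enum


class Shape(Enum):
    SINGLE_GET = "single presigned GET"
    CLIENT_RECONSTRUCT = "client fetches missing chunks"
    RANGE_READS = "range reads over packs"


def choose_shape(stored_whole: bool, smart_client: bool) -> Shape:
    """Pick a download shape from how the file was stored and who is asking."""
    if stored_whole:
        return Shape.SINGLE_GET
    return Shape.CLIENT_RECONSTRUCT if smart_client else Shape.RANGE_READS


def chunks_to_fetch(manifest: list[str], local: set[str]) -> list[str]:
    """Chunk hashes a smart client still needs, each listed once, in manifest order."""
    seen = set(local)
    missing = []
    for chunk_hash in manifest:
        if chunk_hash not in seen:
            seen.add(chunk_hash)
            missing.append(chunk_hash)
    return missing
```

A chunk that appears several times in the manifest, or that the client already holds, is fetched at most once; the manifest still dictates where it is placed during reconstruction.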


3. Reading packed chunks (range reads)

When a chunk lives inside a ~1 GiB pack (02, 11), we don’t download the whole pack. We issue a presigned GET on the pack object, and the client sends a Range: bytes=offset-(offset+len-1) header covering exactly that chunk (HTTP byte ranges are inclusive at both ends, hence the -1). The Pack Index supplies (pack_id, offset, len).
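
The header arithmetic is easy to get off by one, so here is a small sketch of it. `range_header` and `build_chunk_request` are hypothetical helper names, and the URL in the usage note is a placeholder rather than a real presigned URL; only the standard library's `urllib.request.Request` is used, and no network call is made.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build the Range value for one chunk; the last byte position is inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def build_chunk_request(presigned_url: str, offset: int, length: int) -> urllib.request.Request:
    """Prepare (but do not send) a ranged GET for a chunk inside a pack object."""
    return urllib.request.Request(
        presigned_url,
        headers={"Range": range_header(offset, length)},
        method="GET",
    )
```

For example, `build_chunk_request("https://example.invalid/pack-123", 1024, 512)` carries `Range: bytes=1024-1535`, which covers exactly 512 bytes.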


4. The reconstruction-location decision (browser problem)

Reassembling many chunks is easy for a smart client (it has the bytes locally and wants chunks anyway). A browser downloading a large chunked file is the hard case. Options:

| Option | How | Bytes through our compute? | Verdict |
|---|---|---|---|
| Store small files whole | no reconstruction for the common case | no | ✅ default; eliminates most of the problem |
| Service-worker reassembly | browser fetches chunks via presigned URLs, a service worker concatenates into the download stream | no | ✅ preferred for large chunked files in modern browsers |
| Streaming reconstructor | a thin stateless endpoint streams pack ranges → client, concatenated server-side | yes (bounded, streamed) | ⚠️ fallback only; rate-limited, CDN-fronted |
| Pre-materialized whole object | keep a coalesced whole copy for hot/large files | no (extra storage) | ⚠️ for frequently browser-downloaded large files |

Recommendation / tradeoff: default to store-small-whole + service-worker reassembly; use the streaming reconstructor only as a compatibility fallback (it reintroduces compute egress, so it is bounded and metered). This keeps the dedup storage model without paying reconstruction cost on the common path. (Magic Pocket likewise reconstructs files from blocks; the key is to keep that off the hot, compute-bound path.)
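
Wherever reassembly runs (smart client, service worker, or streaming reconstructor), the core loop is the same: walk the manifest in order, verify each chunk, and emit it to the output stream. The sketch below shows that loop in Python for illustration only; it is not the service-worker code. `reassemble` and `default_digest` are hypothetical names, and the digest is BLAKE2b from the standard library, used here purely as a stand-in because BLAKE3 requires a third-party package.

```python
import hashlib
from typing import Callable, Iterable, Iterator


def default_digest(data: bytes) -> str:
    # Stand-in hash: the design specifies BLAKE3; BLAKE2b is used here only
    # because it ships with the Python standard library.
    return hashlib.blake2b(data).hexdigest()


def reassemble(
    manifest: Iterable[str],
    fetch: Callable[[str], bytes],
    digest: Callable[[bytes], str] = default_digest,
) -> Iterator[bytes]:
    """Yield verified chunk bytes in manifest order without buffering the whole file."""
    for expected_hash in manifest:
        data = fetch(expected_hash)
        if digest(data) != expected_hash:
            raise ValueError(f"chunk {expected_hash} failed verification")
        yield data
```

Because the function is a generator, the caller can write each chunk to the download stream as it arrives, which keeps memory bounded to roughly one chunk at a time.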


5. Cold-tier reads (rehydration)

A chunk on an archival tier (Glacier/Archive/Coldline) is not immediately readable — recall takes minutes to hours (10).

```mermaid
flowchart TB
    classDef c fill:#fde68a,stroke:#b45309,color:#111827;
    classDef w fill:#fed7aa,stroke:#c2410c,color:#111827;
    req["Download hits a cold chunk"]:::c --> r["Initiate provider restore<br/>(mark chunk rehydrating)"]:::w
    r --> p["Poll / await restore callback"]:::w
    p --> ready["Chunk temporarily on a hot tier"]:::c
    ready --> serve["Issue presigned GET"]:::c
```
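
The flow above amounts to a small per-chunk state machine. The following sketch is one way to model it, under several assumptions: `ChunkState`, `RehydrationTracker`, and the `start_restore` callback are hypothetical names, the tracker is only consulted for chunks already known to be on a cold tier, and a repeated request for a chunk that is already rehydrating is assumed not to trigger a second provider restore.

```python
from enum import Enum
from typing import Callable


class ChunkState(Enum):
    COLD = "cold"
    REHYDRATING = "rehydrating"
    HOT = "hot"


class RehydrationTracker:
    def __init__(self, start_restore: Callable[[str], None]):
        # start_restore is a hypothetical hook that calls the provider's restore API.
        self._start_restore = start_restore
        self._state: dict[str, ChunkState] = {}

    def request(self, chunk_hash: str) -> ChunkState:
        """Called when a download hits this chunk; starts a restore at most once."""
        state = self._state.get(chunk_hash, ChunkState.COLD)
        if state is ChunkState.COLD:
            self._start_restore(chunk_hash)
            state = ChunkState.REHYDRATING
            self._state[chunk_hash] = state
        return state

    def restore_completed(self, chunk_hash: str) -> None:
        """Called from the poll loop or restore callback once the chunk is readable."""
        self._state[chunk_hash] = ChunkState.HOT

    def can_serve(self, chunk_hash: str) -> bool:
        """True once a presigned GET may be issued for this chunk."""
        return self._state.get(chunk_hash) is ChunkState.HOT
```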

6. Caching & CDN (content addressing’s payoff)

Immutable, content-addressed objects are the ideal cache citizens:


7. Tradeoffs / Alternatives / Scaling

Tradeoffs. The manifest indirection adds one metadata lookup per download (cached, 08). Reconstruction adds client work for large chunked files — bounded by storing small files whole and coalescing pack ranges.
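
Coalescing pack ranges can be illustrated with a short sketch. It assumes reads are given as `(pack_id, offset, length)` tuples in manifest order and merges only consecutive entries that are byte-contiguous in the same pack, so output order is preserved and no extra bytes are fetched. The function name `coalesce` and the tuple layout are assumptions for this example; a production version might also tolerate small gaps.

```python
def coalesce(reads: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Merge consecutive, byte-contiguous reads within the same pack."""
    merged: list[tuple[str, int, int]] = []
    for pack_id, offset, length in reads:
        if merged:
            last_pack, last_offset, last_length = merged[-1]
            # Contiguous means this read starts exactly where the previous one ended.
            if pack_id == last_pack and offset == last_offset + last_length:
                merged[-1] = (last_pack, last_offset, last_length + length)
                continue
        merged.append((pack_id, offset, length))
    return merged
```

Chunks that were written to a pack together and appear in the same order in the manifest collapse into one larger sequential read, which is the access pattern downloads favor.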

Alternatives.

Scaling concerns.

References