ADR-0017 — Content-defined chunking (FastCDC) + packing

V1 Freeze (2026-06-12): Deferred. V1 uses whole-object blobs (ADR-0018); no content-defined chunking or packing. Re-opens when large-file delta/dedup efficiency is a demonstrated need (post-V1).

Context

To dedup and delta-sync, files are split into chunks named by content hash. The split strategy determines dedup quality, and the chunk size determines metadata index cardinality — which is the binding constraint at millions of files / billions of chunks (storage/08). Separately, storing each ~1 MiB chunk as its own provider object is untenable at scale (per-object overhead, request/list cost).

Decision

  1. Content-defined chunking via FastCDC (gear rolling hash, normalized chunking level 2), parameters min 256 KiB / avg ~1 MiB / max 4 MiB.
  2. Do not chunk small files (≤ ~1–4 MiB): store whole (one chunk = one object) — trivial downloads, no index bloat.
  3. Pack committed chunks into ~256 MiB–1 GiB pack objects, with a Pack Index mapping chunk_hash → (pack_id, offset, len); hot/large chunks may stay standalone. Packing is async, co-located maintenance; the user-facing transfer stays direct/presigned (ADR-0011, storage/11 §6).
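
The chunking rules in items 1 and 2 can be sketched as follows. This is a simplified illustration, not the FastCDC reference implementation: the gear table is generated from a fixed seed, the two masks are contiguous top-bit masks rather than the padded masks used in the FastCDC paper, and the names cut_point, chunk, and whole_file_limit are hypothetical. The small-file threshold is left as a parameter because the decision only bounds it to roughly 1–4 MiB; the default here (the average chunk size) is an assumption.

```python
import random

MASK64 = (1 << 64) - 1

# Gear table: 256 pseudo-random 64-bit values (assumption: fixed seed so
# that cut points are reproducible across runs and machines).
_rng = random.Random(17)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def _top_bits_mask(bits):
    """Mask selecting the top `bits` bits of a 64-bit fingerprint."""
    return ((1 << bits) - 1) << (64 - bits)


def cut_point(data, start, min_size, avg_size, max_size, nc_level=2):
    """Return the exclusive end offset of the chunk that begins at `start`.

    avg_size is assumed to be a power of two.
    """
    remaining = len(data) - start
    if remaining <= min_size:
        return len(data)
    bits = avg_size.bit_length() - 1           # log2(avg_size)
    mask_small = _top_bits_mask(bits + nc_level)  # stricter before avg_size
    mask_large = _top_bits_mask(bits - nc_level)  # looser after avg_size
    end = start + min(remaining, max_size)
    normal = start + min(remaining, avg_size)
    fp = 0
    i = start + min_size                        # skip hashing below min_size
    while i < end:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK64
        mask = mask_small if i < normal else mask_large
        if (fp & mask) == 0:
            return i + 1
        i += 1
    return end                                  # forced cut at max_size


def chunk(data, min_size=256 * 1024, avg_size=1024 * 1024,
          max_size=4 * 1024 * 1024, whole_file_limit=None):
    """Split `data` into chunks; small inputs are returned as one chunk."""
    if whole_file_limit is None:
        whole_file_limit = avg_size             # assumption, see lead-in
    if len(data) <= whole_file_limit:
        return [data]
    chunks = []
    start = 0
    while start < len(data):
        end = cut_point(data, start, min_size, avg_size, max_size)
        chunks.append(data[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Small sizes keep the pure-Python loop fast for a demonstration.
    demo = bytes(random.Random(1).getrandbits(8) for _ in range(32 * 1024))
    sizes = [len(c) for c in chunk(demo, 256, 1024, 4096, whole_file_limit=1024)]
    print(len(sizes), min(sizes), max(sizes))
```

Every chunk except possibly the last is longer than min_size and no chunk exceeds max_size. The stricter mask before the average size and the looser mask after it are what pull the size distribution toward the average, which is the purpose of normalized chunking.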

Consequences

Positive

Negative / costs

Alternatives considered

Scaling

Index cardinality is governed by chunk size (coarse CDC), and provider object count is governed by packing. These two levers are what let metadata (storage/08) and request cost (storage/01) scale.
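
A rough back-of-the-envelope calculation shows the effect of each lever. The 1 PiB corpus and the 64 KiB comparison chunk size are hypothetical figures chosen for illustration. The sketch ignores dedup savings, small files stored whole, and chunks left standalone, so the results are upper-bound estimates rather than measurements.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
PIB = 1024 * 1024 * GIB


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def index_entries(total_bytes, avg_chunk_bytes):
    """Approximate chunk-index rows: one per average-sized chunk."""
    return ceil_div(total_bytes, avg_chunk_bytes)


def pack_objects(total_bytes, pack_bytes):
    """Approximate provider objects if every chunk is packed."""
    return ceil_div(total_bytes, pack_bytes)


total = 1 * PIB  # hypothetical corpus size

print(index_entries(total, 64 * KIB))   # 17179869184 rows at a fine 64 KiB average
print(index_entries(total, 1 * MIB))    # 1073741824 rows at the ~1 MiB average
print(pack_objects(total, 256 * MIB))   # 4194304 objects with 256 MiB packs
print(pack_objects(total, 1 * GIB))     # 1048576 objects with 1 GiB packs
```

Under these assumptions, the ~1 MiB average keeps the index at about one billion rows instead of about seventeen billion, and packing reduces the provider object count from about one billion to between one and four million.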