ADR-0016 — BLAKE3 for content & chunk addressing (SHA-256 compliance mode)
- Status: Accepted
- Date: 2026-06-11
- Related: storage/02 content-addressing, storage/04 integrity, ADR-0014
Context
Content addressing (ADR-0005, storage/02) names every chunk and object by the hash of its content. That hash is on the ingest path for every byte stored, is the integrity authority, and underpins dedup correctness — so its collision resistance is load-bearing (a collision = one chunk’s bytes served for another = corruption + a security hole) and its throughput is a real cost at PB scale.
Decision
Address chunks and objects with BLAKE3 (256-bit). Offer SHA-256 as a
per-deployment compliance mode (FIPS/regulatory). Record hash_algo per chunk so
both can coexist during any future migration. Never use MD5/SHA-1 for addressing
(collision-broken). Non-cryptographic hashes (xxHash) are permitted only as cheap
pre-filters, never as addresses.
Consequences
Positive
- 4–10× faster than SHA-256 single-threaded and linearly parallel (Merkle tree) → cheaper ingest, parallel large-file hashing.
- Verified streaming: BLAKE3’s internal Merkle tree verifies arbitrary byte ranges without the whole chunk → efficient range reads (storage/06) and sampled scrubbing (storage/04).
- Collision-resistant → dedup is correctness-safe.
Negative / costs
- Shorter public-cryptanalysis history (~5 yrs vs SHA-256’s ~23). Mitigation: SHA-256 compliance mode; algorithm-id per row enables migration if BLAKE3 were ever broken.
- Not FIPS-validated → regulated tenants use the SHA-256 mode (losing verified streaming there).
Alternatives considered
- SHA-256 everywhere: maximum maturity/FIPS, but ~2–3× slower, sequential, no verified streaming. Kept as the opt-in mode, not the default.
- SHA-512/256, SHA-3: no compelling speed or feature edge over the BLAKE3/SHA-256 pair for our needs.
- MD5/SHA-1: rejected — broken collision resistance makes them a corruption vector in a dedup system.
Scaling
Hashing throughput scales with cores via BLAKE3’s tree; the per-row hash_algo field
keeps a future algorithm migration a lazy/background reindex rather than a flag day.