ADR-0016 — BLAKE3 for content & chunk addressing (SHA-256 compliance mode)

Context

Content addressing (ADR-0005, storage/02) names every chunk and object by the hash of its content. That hash sits on the ingest path for every byte stored, is the integrity authority, and underpins dedup correctness. Its collision resistance is therefore load-bearing: a collision means one chunk’s bytes are served for another, which is both data corruption and a security hole. Its throughput is also a real cost at PB scale.
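
As a rough sanity check on the collision-resistance requirement, the birthday bound estimates the chance of an accidental collision among n chunks addressed by an ideal 256-bit hash. The chunk count below is an illustrative assumption (about 1 EiB stored as 4 KiB chunks), not a figure from this ADR:

```latex
P_{\text{collision}} \;\lesssim\; \frac{n(n-1)}{2} \cdot 2^{-256} \;\approx\; \frac{n^{2}}{2^{257}},
\qquad
n = 2^{48} \;\Rightarrow\; P_{\text{collision}} \approx \frac{2^{96}}{2^{257}} = 2^{-161}.
```

This covers accidental collisions only. Against a deliberate attacker, the relevant figure is the roughly 2^128 work a generic collision search needs on a 256-bit hash, which is why collision-broken hashes are unacceptable as addresses regardless of digest length.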

Decision

Address chunks and objects with BLAKE3 (256-bit). Offer SHA-256 as a per-deployment compliance mode (FIPS/regulatory). Record hash_algo per chunk so both can coexist during any future migration. Never use MD5/SHA-1 for addressing (collision-broken). Non-cryptographic hashes (xxHash) are permitted only as cheap pre-filters, never as addresses.
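
A minimal sketch of how an address could carry its algorithm tag, assuming Python and a hypothetical `ChunkAddress` record. The names `HASHERS`, `FORBIDDEN`, `address_chunk`, and `verify_chunk` are illustrative and not taken from the storage code. BLAKE3 is not in the Python standard library, so the sketch registers it only when the third-party `blake3` package is importable; SHA-256 comes from `hashlib`.

```python
import hashlib
from dataclasses import dataclass

# Registry of permitted addressing algorithms (hypothetical name).
HASHERS = {"sha256": lambda data: hashlib.sha256(data).digest()}

try:
    import blake3  # third-party package; assumed present in BLAKE3 deployments

    HASHERS["blake3"] = lambda data: blake3.blake3(data).digest()
except ImportError:
    pass  # SHA-256 compliance mode remains available without it

# Explicitly rejected as addresses: collision-broken or non-cryptographic.
FORBIDDEN = {"md5", "sha1", "xxhash"}


@dataclass(frozen=True)
class ChunkAddress:
    hash_algo: str  # recorded per chunk so algorithms can coexist
    digest: bytes   # 32 bytes for both BLAKE3 and SHA-256


def address_chunk(data: bytes, hash_algo: str) -> ChunkAddress:
    """Compute the content address of a chunk under the given algorithm."""
    if hash_algo in FORBIDDEN:
        raise ValueError(f"{hash_algo} must never be used for addressing")
    if hash_algo not in HASHERS:
        raise ValueError(f"unsupported hash_algo: {hash_algo}")
    return ChunkAddress(hash_algo, HASHERS[hash_algo](data))


def verify_chunk(data: bytes, addr: ChunkAddress) -> bool:
    """Re-hash the bytes under the recorded algorithm and compare."""
    return address_chunk(data, addr.hash_algo) == addr
```

Because the algorithm tag is part of the address, a reader never has to guess which hash produced a digest, and chunks written under either mode can be verified side by side.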

Consequences

Positive

Negative / costs

Alternatives considered

Scaling

Hashing throughput scales with cores because BLAKE3’s internal tree structure lets separate parts of an input be hashed in parallel. The per-row hash_algo field keeps any future algorithm migration a lazy, background reindex rather than a flag day.
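
The lazy reindex could look roughly like the sketch below. It assumes a hypothetical `ChunkRow` that holds the chunk bytes inline, which a real store would fetch separately, and a hypothetical `reindex_batch` helper. So that it runs on the standard library alone, the usage example migrates from SHA-256 to BLAKE2s as a stand-in target; BLAKE3 itself would need a third-party package.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ChunkRow:
    hash_algo: str  # algorithm that produced `digest`
    digest: bytes
    data: bytes     # inline here for illustration only


def digest_with(algo: str, data: bytes) -> bytes:
    """Hash `data` with any algorithm name that hashlib knows."""
    return hashlib.new(algo, data).digest()


def reindex_batch(rows: list, target_algo: str, batch_size: int) -> int:
    """Re-address up to `batch_size` rows not yet on `target_algo`.

    Returns the number of rows migrated, so a background job can call
    this repeatedly until it returns 0.
    """
    migrated = 0
    for row in rows:
        if migrated >= batch_size:
            break
        if row.hash_algo == target_algo:
            continue  # already migrated; old and new rows coexist
        # Verify under the recorded algorithm before trusting the bytes.
        if digest_with(row.hash_algo, row.data) != row.digest:
            raise ValueError("integrity check failed during reindex")
        row.digest = digest_with(target_algo, row.data)
        row.hash_algo = target_algo
        migrated += 1
    return migrated


# Usage: three SHA-256 rows migrated two at a time.
rows = [
    ChunkRow("sha256", hashlib.sha256(d).digest(), d)
    for d in (b"a", b"b", b"c")
]
first = reindex_batch(rows, "blake2s", batch_size=2)
second = reindex_batch(rows, "blake2s", batch_size=2)
third = reindex_batch(rows, "blake2s", batch_size=2)
```

Between batches the table holds a mix of algorithms, which is safe only because every row records its own hash_algo.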