04 — Integrity & Checksums
Topics: checksums, data integrity verification. The promise behind a storage product: the bytes you get back are exactly the bytes you put in — forever, across five providers, despite bitrot, truncation, and provider bugs.
Integrity in BitVault is defense in depth: the content hash is the spine, and checksums are verified at every hop so a corruption is caught at the earliest possible point and attributed to the right layer.
1. The integrity stack (verify at every hop)
flowchart LR
classDef step fill:#fde68a,stroke:#b45309,color:#111827;
c["Client<br/>computes chunk_hash"]:::step --> t["In transit<br/>TLS + provider PUT checksum"]:::step
t --> commit["On commit<br/>server Head + verify size/checksum"]:::step
commit --> rest["At rest<br/>name == BLAKE3(content)"]:::step
rest --> scrub["Background scrub<br/>periodic read-verify"]:::step
scrub --> dl["On download<br/>client re-verifies hash"]:::step
| Hop | Mechanism | Catches |
|---|---|---|
| In transit (up) | TLS + provider checksum on PUT (CRC32C/SHA-256 where available) | network corruption, truncated upload |
| On commit | server Head (size) + checksum match before manifest write (SI-1) | missing/short/garbled chunk before it becomes referenceable |
| At rest | the object’s name is its BLAKE3 hash | any later mutation/bitrot is detectable by recompute |
| Background | scrubber read-verifies on a schedule | silent bitrot, provider-side loss/corruption, missing objects |
| On download | client recomputes hash over received bytes | corruption anywhere downstream incl. cache/CDN |
End-to-end property: the client both produces and verifies the hash, so no intermediate layer (our compute, the provider, a CDN) is trusted to be correct — it is checked.
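The client-side half of this property, plus the commit gate from the table, can be sketched as follows. This is a minimal illustration, not BitVault's actual API: the function names are hypothetical, and SHA-256 from Python's standard library stands in for BLAKE3, which requires a third-party package.

```python
import hashlib


def chunk_name(data: bytes) -> str:
    """Content address of a chunk: hex digest of its bytes.

    Assumption: SHA-256 is used here as a stand-in for BLAKE3.
    """
    return hashlib.sha256(data).hexdigest()


def commit_allowed(expected_size: int, expected_checksum: str,
                   head_size: int, head_checksum: str) -> bool:
    """Commit gate: the manifest may reference the chunk only if the
    provider's Head response matches what the client declared."""
    return head_size == expected_size and head_checksum == expected_checksum


def verify_download(name: str, received: bytes) -> bytes:
    """Recompute the hash over the received bytes; never return bad bytes."""
    if chunk_name(received) != name:
        raise ValueError(f"chunk hash mismatch for {name}")
    return received
```

Because the same function produces the name on upload and checks it on download, a corruption introduced by any layer in between surfaces as a `ValueError` rather than as silently wrong data.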
2. Checksums: ours + the provider’s (belt and suspenders)
- Ours (authoritative): BLAKE3 over chunk content = the chunk’s name (02). This is the integrity authority; it is provider-independent and survives migration between providers.
- Provider’s (corroborating): we also send/verify the provider’s native checksum on PUT (S3 CRC32C/SHA-256, GCS CRC32C, Azure CRC64) so the provider rejects a corrupted transfer at ingest and so we can detect provider-side corruption without downloading (compare stored provider checksum to expected).
Why both: our hash proves content; the provider checksum lets the provider reject bad transfers and lets us cheaply audit via Head/metadata without egress.
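A rough sketch of the cheap audit path, assuming the provider stores a SHA-256 checksum as the base64 encoding of the raw digest (the encoding S3 uses for its SHA-256 checksum). The `head_metadata` dict and its `checksum_sha256` key are hypothetical placeholders for whatever a provider's Head call returns; CRC32C and CRC64 are omitted because they are not in Python's standard library.

```python
import base64
import hashlib
from typing import Mapping, Optional


def provider_sha256_checksum(data: bytes) -> str:
    """Base64 of the raw SHA-256 digest, computed client-side before PUT."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def audit_without_egress(expected_checksum: str,
                         head_metadata: Mapping[str, str]) -> str:
    """Compare the checksum recorded at upload with what the provider
    reports on Head. No object bytes are downloaded.

    Returns "ok", "suspect", or "unknown" (provider stored no checksum,
    so the caller would have to fall back to a full read-verify).
    """
    stored: Optional[str] = head_metadata.get("checksum_sha256")
    if stored is None:
        return "unknown"
    return "ok" if stored == expected_checksum else "suspect"
```

A "suspect" result here is only corroborating evidence; the content hash remains the authority, so a suspect object would still be confirmed by recomputing the hash over its bytes.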
Verified streaming (BLAKE3 superpower)
Because BLAKE3 is a Merkle tree internally, we can verify a byte range without the whole chunk, given the intermediate tree hashes for that range (as in Bao-style verified streaming). This makes range downloads (06) and partial scrubbing verifiable, and lets the scrubber sample-verify large packs efficiently. SHA-256 (compliance mode) loses this; there we verify whole chunks.
3. The scrubber (background data integrity verification)
A continuous worker that reads stored bytes and confirms they still hash correctly. It is the only defense against silent corruption (bitrot, firmware bugs, provider incidents) in data that nobody is currently reading, which request-path checks cannot notice until the next download.
- Schedule: every chunk/pack verified at least every N days (policy; e.g. 30). Prioritized by age, tier (cold media bitrots more), provider incident signals, and last-verified watermark stored in the chunk index.
- Method: stream pack/chunk bytes, recompute BLAKE3, compare to the name; use provider checksums + BLAKE3 partial verification to sample huge packs cheaply, escalating to full verification on any mismatch.
- Cost control: scrubbing reads bytes = egress/IO cost. Mitigations: run co-located with storage (in-region/in-cluster compute, minimal egress), verify via provider-stored checksums where trustworthy, sample then full-verify, and throttle to a configured IO budget. (Scrub bandwidth vs detection latency is the core tradeoff — more frequent = faster detection, higher cost.)
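The schedule, method, and throttling points above might look roughly like this. It is a sketch under stated assumptions: `IndexEntry`, `due_for_scrub`, `stream_verify`, and `pace_seconds` are hypothetical names, prioritization is reduced to oldest-first, and SHA-256 again stands in for BLAKE3.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterable, List


@dataclass
class IndexEntry:
    name: str                # content hash (hex), which is also the object name
    last_verified_at: float  # Unix seconds of the last successful scrub


def due_for_scrub(entries: Iterable[IndexEntry], now: float,
                  max_age_days: int = 30) -> List[IndexEntry]:
    """Entries whose last verification is older than the policy window,
    oldest first. Tier and incident signals are left out of this sketch."""
    cutoff = now - max_age_days * 86400
    due = [e for e in entries if e.last_verified_at <= cutoff]
    return sorted(due, key=lambda e: e.last_verified_at)


def stream_verify(name: str, stream: BinaryIO, buf_size: int = 1 << 20) -> bool:
    """Recompute the hash incrementally so large packs never sit in memory."""
    h = hashlib.sha256()
    while True:
        block = stream.read(buf_size)
        if not block:
            break
        h.update(block)
    return h.hexdigest() == name


def pace_seconds(bytes_read: int, io_budget_bps: float) -> float:
    """Minimum wall-clock time a read of this size should take to stay
    within the configured IO budget (bytes per second)."""
    return bytes_read / io_budget_bps
```

A worker loop would call `due_for_scrub` on its index range, run `stream_verify` on each entry, and sleep for whatever part of `pace_seconds` the read did not already consume.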
4. Durability & repair
BitVault leans on provider durability (S3-class ≈ 11 nines) as the baseline and adds detection + optional cross-provider redundancy on top.
Redundancy options (policy-driven, 09)
| Mode | How | Durability | Cost |
|---|---|---|---|
| Single provider (default) | rely on provider’s internal erasure coding | provider’s (≈11 nines) | 1× |
| Cross-provider replication | important data mirrored to a 2nd provider/region | survives a whole-provider loss | ~2× + egress |
| Erasure across providers | (advanced) split with parity across providers | high, lower overhead than 2× | complex; deferred |
We do not reimplement intra-datacenter erasure coding — providers already do it far better than we could. Our value-add is detection (scrub) + policy-driven cross-provider redundancy for data that warrants it, plus the content hash that makes any copy verifiable.
Repair flow
flowchart TB
classDef bad fill:#fecaca,stroke:#b91c1c,color:#111827;
classDef ok fill:#bbf7d0,stroke:#15803d,color:#111827;
d["Scrub/Download detects<br/>hash mismatch or missing"]:::bad --> q["Quarantine chunk<br/>mark suspect in index"]:::bad
q --> f{"another good copy?<br/>(replica / other provider / other tier)"}
f -- yes --> r["Re-copy good bytes →<br/>rewrite object, clear suspect"]:::ok
f -- no --> l["Mark chunk LOST<br/>flag affected manifests/versions degraded<br/>alert + audit"]:::bad
- A mismatch never silently returns bad bytes: the chunk is quarantined, repair is attempted from any verified copy, and if none exists the affected versions are flagged degraded (not silently broken) and surfaced to admins. Because content is addressed by hash, any located copy is provably the correct bytes.
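The repair flow can be sketched as a small state transition. The names here (`ChunkState`, `repair`, the `index` dict, the `rewrite` callback) are hypothetical, and SHA-256 stands in for BLAKE3; the sketch only shows the order of operations: quarantine first, accept a copy only if it hashes to the name, otherwise mark the chunk lost.

```python
import hashlib
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class ChunkState(Enum):
    OK = "ok"
    SUSPECT = "suspect"  # quarantined: must not be served
    LOST = "lost"        # no verified copy exists anywhere


def find_good_copy(name: str, candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the first candidate whose hash equals the chunk name.
    Content addressing means a match is provably the correct bytes."""
    for data in candidates:
        if hashlib.sha256(data).hexdigest() == name:
            return data
    return None


def repair(name: str, index: Dict[str, ChunkState],
           candidates: Iterable[bytes],
           rewrite: Callable[[str, bytes], None]) -> ChunkState:
    """Quarantine, then repair from any verified copy or mark LOST."""
    index[name] = ChunkState.SUSPECT
    good = find_good_copy(name, candidates)
    if good is None:
        # Caller would flag affected manifests as degraded and alert.
        index[name] = ChunkState.LOST
    else:
        rewrite(name, good)
        index[name] = ChunkState.OK
    return index[name]
```

In this sketch `candidates` would be fed from replicas, other providers, or other tiers, in whatever order is cheapest to read.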
5. Tradeoffs / Alternatives / Scaling
Tradeoffs. Verifying at every hop costs CPU (hashing) and IO (scrub egress). We accept ingest hashing (cheap with BLAKE3) and budget/throttle scrubbing. Skipping scrub would save cost but reintroduce silent-bitrot risk — unacceptable for a storage product.
Alternatives.
- Trust provider durability alone, no scrub: cheaper, but provides no detection of cross-provider/tiering/our-own-bug corruption and no early warning. Rejected as default; scrub frequency is tunable down to near-zero for cost-sensitive self-host.
- Whole-system Merkle tree (every object under one root, à la Certificate Transparency): enables global tamper-evidence but is heavy; per-chunk BLAKE3 + per-object manifest root already gives verifiable integrity without a global log.
Scaling.
- Scrub throughput must keep pace with stored volume to hold the N-day SLO → scrubbing is horizontally partitioned by key-prefix/pack and runs as an extractable worker pool (README §4).
- last_verified_at lives in the chunk index; the scrubber walks an index range, not a provider List, so it scales with the (sharded) index, not with bucket listing latency.
- Repair load is bounded by setting cross-provider redundancy only on data whose durability class warrants it (policy), keeping the common case at 1× cost.
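As a back-of-the-envelope check on the first scaling point, the sustained read rate needed to hold an N-day SLO is simply stored volume divided by the window. The figures below (1 PiB stored, 30 days, 100 MB/s per worker) are illustrative assumptions, not BitVault measurements, and the function names are hypothetical.

```python
import math


def required_scrub_throughput(stored_bytes: int, n_days: int) -> float:
    """Bytes per second that must be read-verified, on average, so every
    byte is scrubbed at least once per n_days."""
    return stored_bytes / (n_days * 86400)


def workers_needed(stored_bytes: int, n_days: int,
                   per_worker_bps: float) -> int:
    """Worker count for the pool, assuming work partitions evenly by
    key prefix and each worker sustains per_worker_bps."""
    return math.ceil(required_scrub_throughput(stored_bytes, n_days)
                     / per_worker_bps)


if __name__ == "__main__":
    rate = required_scrub_throughput(2**50, 30)   # 1 PiB over 30 days
    print(f"{rate / 1e6:.0f} MB/s sustained")     # roughly 434 MB/s
    print(workers_needed(2**50, 30, 100e6), "workers at 100 MB/s each")
```

Halving N doubles the required rate, which is the scrub bandwidth versus detection latency tradeoff expressed in numbers.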
References
- BLAKE3 verified streaming / Bao: https://github.com/oconnor663/bao
- S3 additional checksums (CRC32C/SHA-256): https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
- S3 durability model: https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html