04 — Integrity & Checksums

Topics: checksums, data integrity verification. The promise behind a storage product: the bytes you get back are exactly the bytes you put in — forever, across five providers, despite bitrot, truncation, and provider bugs.

Integrity in BitVault is defense in depth: the content hash is the spine, and checksums are verified at every hop so a corruption is caught at the earliest possible point and attributed to the right layer.

1. The integrity stack (verify at every hop)

```mermaid
flowchart LR
    classDef step fill:#fde68a,stroke:#b45309,color:#111827;
    c["Client<br/>computes chunk_hash"]:::step --> t["In transit<br/>TLS + provider PUT checksum"]:::step
    t --> commit["On commit<br/>server Head + verify size/checksum"]:::step
    commit --> rest["At rest<br/>name == BLAKE3(content)"]:::step
    rest --> scrub["Background scrub<br/>periodic read-verify"]:::step
    scrub --> dl["On download<br/>client re-verifies hash"]:::step
```

| Hop | Mechanism | Catches |
| --- | --- | --- |
| In transit (up) | TLS + provider checksum on PUT (CRC32C/SHA-256 where available) | network corruption, truncated upload |
| On commit | server `Head` (size) + checksum match before manifest write (SI-1) | missing/short/garbled chunk before it becomes referenceable |
| At rest | the object’s name is its BLAKE3 hash | any later mutation/bitrot is detectable by recompute |
| Background | scrubber read-verifies on a schedule | silent bitrot, provider-side loss/corruption, missing objects |
| On download | client recomputes hash over received bytes | corruption anywhere downstream, including cache/CDN |

End-to-end property: the client both produces and verifies the hash, so no intermediate layer (our compute, the provider, a CDN) is trusted to be correct — it is checked.
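The end-to-end property can be illustrated with a minimal sketch. It assumes SHA-256 from the Python standard library as a stand-in for BLAKE3 (which needs a third-party binding); `chunk_name`, `verify_download`, and `IntegrityError` are hypothetical names, not BitVault APIs.

```python
import hashlib
from typing import Callable

# Assumption: SHA-256 stands in for BLAKE3 so the sketch has no dependencies.
# Any function mapping bytes to a hex digest can be passed as `hash_fn`.
HashFn = Callable[[bytes], str]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class IntegrityError(Exception):
    """Raised when received bytes do not hash to the expected name."""


def chunk_name(data: bytes, hash_fn: HashFn = sha256_hex) -> str:
    """Content address computed by the client before upload."""
    return hash_fn(data)


def verify_download(name: str, received: bytes,
                    hash_fn: HashFn = sha256_hex) -> bytes:
    """Recompute the hash over received bytes; never return bad bytes."""
    actual = hash_fn(received)
    if actual != name:
        raise IntegrityError(f"expected {name}, got {actual}")
    return received
```

Because the same party computes the name on upload and checks it on download, a fault in any layer in between surfaces as an `IntegrityError` rather than as silently wrong data.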

2. Checksums: ours + the provider’s (belt and suspenders)

Ours (authoritative): BLAKE3 over chunk content = the chunk’s name (02). This is the integrity authority; it is provider-independent and survives migration between providers.
Provider’s (corroborating): we also send the provider’s native checksum with each PUT (S3 CRC32C/SHA-256, GCS CRC32C, Azure CRC64). The provider can then reject a corrupted transfer at ingest, and we can later detect provider-side corruption without downloading by comparing the stored provider checksum to the expected value.

Why both: our hash proves content; the provider checksum lets the provider reject bad transfers and lets us cheaply audit via Head/metadata without egress.
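As a rough sketch of the corroborating checksum, the following computes CRC32C in pure Python and encodes it as base64 over the big-endian 32-bit value, which is the form S3 and GCS use for this checksum as far as we know. The `audit_from_metadata` helper and the `checksum_crc32c` metadata key are hypothetical; a real deployment would read the field from the provider's Head response and would likely use a hardware-accelerated CRC32C library.

```python
import base64
import struct


def _make_crc32c_table() -> list:
    poly = 0x82F63B78  # reflected Castagnoli polynomial
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC32C (slow, but dependency-free)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_b64(data: bytes) -> str:
    """Base64 of the big-endian 32-bit checksum."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


def audit_from_metadata(expected_b64: str, head_metadata: dict) -> bool:
    """Compare the expected checksum to the one the provider stored.

    `head_metadata` is a hypothetical dict built from a Head response;
    no object bytes are downloaded for this check.
    """
    return head_metadata.get("checksum_crc32c") == expected_b64
```

The expected checksum would be recorded at upload time next to the content hash, so a later audit only needs the metadata call.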

Verified streaming (BLAKE3 superpower)

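Because BLAKE3 is a Merkle tree internally, a byte range can be verified against the root hash without reading the whole chunk, provided the intermediate tree hashes are available (for example as a Bao encoding or outboard tree). This makes range downloads (06) and partial scrubbing verifiable, and lets the scrubber sample-verify large packs efficiently. SHA-256 (compliance mode) has no such tree structure, so in that mode we verify whole chunks.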
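The idea behind range verification can be shown with a toy Merkle tree. This is not BLAKE3 or Bao: it is a simplified binary tree built with SHA-256 over 1 KiB leaf blocks, and every function name here is illustrative. It only demonstrates that one block plus a short list of sibling hashes is enough to check that block against the root.

```python
import hashlib

BLOCK = 1024  # leaf size in bytes for this toy tree


def _h(prefix: bytes, payload: bytes) -> bytes:
    # Distinct prefixes keep leaf hashes and parent hashes in separate domains.
    return hashlib.sha256(prefix + payload).digest()


def leaf_hashes(data: bytes) -> list:
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)] or [b""]
    return [_h(b"\x00", block) for block in blocks]


def build_levels(leaves: list) -> list:
    """Return all tree levels, leaves first, root level last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(_h(b"\x01", cur[i] + cur[i + 1]))
            else:
                nxt.append(cur[i])  # odd node is promoted unchanged
        levels.append(nxt)
    return levels


def proof_for(levels: list, index: int) -> list:
    """Sibling hashes from leaf to root as (hash, sibling_is_left) pairs."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_block(root: bytes, block: bytes, proof: list) -> bool:
    """Check one block against the root using only its sibling hashes."""
    node = _h(b"\x00", block)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01", pair)
    return node == root
```

A reader holding only the root can verify block `i` after fetching that block and its proof, which is logarithmic in the number of blocks rather than proportional to the chunk size.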

3. The scrubber (background data integrity verification)

A continuous worker that reads stored bytes and confirms they still hash correctly. It is the only defense against silent corruption (bitrot, firmware bugs, provider incidents) in data that nobody is currently reading, which no request-path check would notice until the next download.

Schedule: every chunk/pack verified at least every N days (policy; e.g. 30). Prioritized by age, tier (cold media bitrots more), provider incident signals, and last-verified watermark stored in the chunk index.
Method: stream pack/chunk bytes, recompute BLAKE3, compare to the name; use provider checksums + BLAKE3 partial verification to sample huge packs cheaply, escalating to full verification on any mismatch.
Cost control: scrubbing reads bytes, which costs egress and IO. Mitigations: run co-located with storage (in-region/in-cluster compute, minimal egress), verify via provider-stored checksums where trustworthy, sample then full-verify, and throttle to a configured IO budget. Scrub bandwidth versus detection latency is the core tradeoff: more frequent scrubbing means faster detection at higher cost.
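The schedule, method, and cost control above might combine into a single scrub pass along these lines. This is a sketch under stated assumptions: `IndexEntry`, `due_entries`, `scrub_pass`, and the `read_chunk` callback are hypothetical names, SHA-256 stands in for BLAKE3, and prioritization uses only the last-verified watermark (tier and incident signals are omitted).

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IndexEntry:
    name: str                # content hash (hex), also the object key
    size: int                # stored size in bytes
    last_verified_at: float  # unix seconds


def due_entries(index: List[IndexEntry], now: float,
                max_age_days: int) -> List[IndexEntry]:
    """Entries not verified within the window, oldest first."""
    cutoff = now - max_age_days * 86400
    due = [e for e in index if e.last_verified_at <= cutoff]
    return sorted(due, key=lambda e: e.last_verified_at)


def scrub_pass(index: List[IndexEntry],
               read_chunk: Callable[[str], Optional[bytes]],
               now: float,
               max_age_days: int = 30,
               io_budget_bytes: int = 1 << 30) -> List[str]:
    """Verify due chunks until the IO budget is spent; return suspects."""
    suspects, spent = [], 0
    for entry in due_entries(index, now, max_age_days):
        if spent + entry.size > io_budget_bytes:
            break  # throttle: remaining entries wait for the next pass
        data = read_chunk(entry.name)  # None means the object is missing
        spent += len(data) if data is not None else 0
        if data is None or hashlib.sha256(data).hexdigest() != entry.name:
            suspects.append(entry.name)
        else:
            entry.last_verified_at = now  # advance the watermark
    return suspects
```

Returning suspects instead of repairing inline keeps the scrubber read-only, so quarantine and repair can be handled by a separate step.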

4. Durability & repair

BitVault leans on provider durability as the baseline (S3-class storage is designed for roughly 11 nines) and adds detection plus optional cross-provider redundancy on top.

Redundancy options (policy-driven, 09)

We do not reimplement intra-datacenter erasure coding, because providers already do it far better than we could. Our value-add is detection (scrub) + policy-driven cross-provider redundancy for data that warrants it, plus the content hash that makes any copy verifiable.

Repair flow

```mermaid
flowchart TB
    classDef bad fill:#fecaca,stroke:#b91c1c,color:#111827;
    classDef ok fill:#bbf7d0,stroke:#15803d,color:#111827;
    d["Scrub/Download detects<br/>hash mismatch or missing"]:::bad --> q["Quarantine chunk<br/>mark suspect in index"]:::bad
    q --> f{"another good copy?<br/>(replica / other provider / other tier)"}
    f -- yes --> r["Re-copy good bytes →<br/>rewrite object, clear suspect"]:::ok
    f -- no --> l["Mark chunk LOST<br/>flag affected manifests/versions degraded<br/>alert + audit"]:::bad
```

A mismatch never silently returns bad bytes: the chunk is quarantined, repair is attempted from any verified copy, and if none exists the affected versions are flagged degraded (not silently broken) and surfaced to admins. Because content is addressed by hash, any located copy can be proven correct before it is used for repair: its bytes either hash to the chunk's name or they do not.
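The repair flow can be sketched as a small state transition. The names below (`ChunkState`, `repair_chunk`, the `replicas` fetch callbacks, and `rewrite`) are hypothetical, SHA-256 again stands in for BLAKE3, and flagging affected manifests and alerting are left out.

```python
import hashlib
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class ChunkState(Enum):
    OK = "ok"
    SUSPECT = "suspect"
    LOST = "lost"


Fetch = Callable[[str], Optional[bytes]]


def repair_chunk(name: str,
                 replicas: Iterable[Fetch],
                 rewrite: Callable[[str, bytes], None],
                 state: Dict[str, ChunkState]) -> ChunkState:
    """Quarantine, then repair from the first copy that hashes to `name`."""
    state[name] = ChunkState.SUSPECT  # quarantine before doing anything else
    for fetch in replicas:
        candidate = fetch(name)
        # A copy is only trusted after its bytes hash to the chunk name.
        if candidate is not None and \
                hashlib.sha256(candidate).hexdigest() == name:
            rewrite(name, candidate)
            state[name] = ChunkState.OK
            return ChunkState.OK
    state[name] = ChunkState.LOST  # caller flags versions degraded and alerts
    return ChunkState.LOST
```

Each replica source is tried in order and verified independently, so a second corrupted copy cannot overwrite the quarantined object.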

5. Tradeoffs / Alternatives / Scaling

Tradeoffs. Verifying at every hop costs CPU (hashing) and IO (scrub egress). We accept ingest hashing (cheap with BLAKE3) and budget/throttle scrubbing. Skipping scrub would save cost but reintroduce silent-bitrot risk — unacceptable for a storage product.

Alternatives.

Trust provider durability alone, no scrub: cheaper, but provides no detection of cross-provider/tiering/our-own-bug corruption and no early warning. Rejected as default; scrub frequency is tunable down to near-zero for cost-sensitive self-hosted deployments.
Whole-system Merkle tree (every object under one root, à la Certificate Transparency): enables global tamper-evidence but is heavy; per-chunk BLAKE3 + per-object manifest root already gives verifiable integrity without a global log.

Scaling.

Scrub throughput must keep pace with stored volume to hold the N-day SLO → scrubbing is horizontally partitioned by key-prefix/pack and runs as an extractable worker pool (README §4).
last_verified_at lives in the chunk index; the scrubber walks an index range, not a provider List, so it scales with the (sharded) index, not with bucket listing latency.
Repair load is bounded by setting cross-provider redundancy only on data whose durability class warrants it (policy), keeping the common case at 1× cost.
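The relationship between stored volume, the N-day window, and the worker pool size can be made concrete with a little arithmetic. The function names and the per-worker throughput figure are illustrative assumptions, and partitioning on the first four hex characters of the chunk name is only one possible key-prefix scheme.

```python
import math


def required_scrub_rate(stored_bytes: int, window_days: int) -> float:
    """Sustained read rate (bytes/s) needed to touch everything once per window."""
    return stored_bytes / (window_days * 86400)


def workers_needed(stored_bytes: int, window_days: int,
                   per_worker_bytes_per_s: float) -> int:
    """Pool size, assuming workers scale linearly (an optimistic assumption)."""
    rate = required_scrub_rate(stored_bytes, window_days)
    return math.ceil(rate / per_worker_bytes_per_s)


def partition_for(chunk_name_hex: str, partitions: int) -> int:
    """Assign a chunk to a scrub partition by its key prefix."""
    return int(chunk_name_hex[:4], 16) % partitions
```

For example, 1 PB (10^15 bytes) on a 30-day window needs roughly 386 MB/s of sustained verified reads, so the detection-latency policy translates directly into a bandwidth and cost figure.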

References

BLAKE3 verified streaming / Bao: https://github.com/oconnor663/bao
S3 additional checksums (CRC32C/SHA-256): https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
S3 durability model: https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html