ADR-0019 — Safe whole-object GC: grace period + atomic re-confirmation
- Status: Accepted
- Date: 2026-06-11 · Revised: 2026-06-12 (Architecture Freeze V1)
- Related: ADR-0017, ADR-0018, 08 §2 I2, 06 §4
V1 Freeze (2026-06-12): Accepted, whole-object. Blocker-2 resolution: deletion is authorized by grace + atomic zero-reference re-confirmation (CAS), never by a bare refcount — closing the dedup-vs-delete race. The chunk/pack state-machine and pack-compaction described pre-freeze are deferred with ADR-0017; V1 runs this simpler whole-object GC.
Context
With dedup, deletion is not the inverse of writing: a blob’s bytes may be deleted only when no version references them, and a new reference (a dedup hit on re-upload of identical bytes) can appear at the instant GC decides to delete. Pure refcounting hits this race and drifts under crashes/retries. The asymmetry is brutal: deleting live data is unrecoverable; leaking space is merely costly.
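The race can be made concrete with a deterministic interleaving. The following is a minimal sketch, assuming a hypothetical in-memory refcount table and object store (the names `refcount`, `object_store` and `blob-a` are illustrative and do not come from this ADR). It shows how a GC decision taken on a bare refcount goes stale when a dedup hit lands before the delete executes.

```python
# Hypothetical in-memory model; all names here are illustrative assumptions.
refcount = {"blob-a": 1}
object_store = {"blob-a": b"payload"}

# Step 1: the last version referencing blob-a is removed.
refcount["blob-a"] -= 1

# Step 2: naive GC reads the refcount and decides to delete.
gc_decided_to_delete = refcount["blob-a"] == 0

# Step 3: before GC acts, an upload of identical bytes dedups against blob-a.
if "blob-a" in object_store:   # dedup hit: the bytes are already stored
    refcount["blob-a"] += 1    # a new live reference appears

# Step 4: GC acts on its stale decision and removes the bytes.
if gc_decided_to_delete:
    del object_store["blob-a"]

# The blob is now referenced but its bytes are gone.
live_but_missing = refcount["blob-a"] > 0 and "blob-a" not in object_store
print(live_but_missing)
```

The failure is in the gap between steps 2 and 4: the check and the delete are not one atomic step, and nothing gives a late reference time to arrive.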
The pre-freeze review (review §3.2) found that data-model invariant I2 specified the
exact naive rule — “GC deletes the object when refcount = 0” — that this ADR’s
own prior text called unsafe. The freeze reconciles them: I2 now points here, and
this ADR is scoped to the whole-object model V1 actually builds.
Decision
Authorize deletion by state + grace + atomic re-confirmation, not by refcount alone:
- Refcount is a hint, never the authority. It identifies candidates for collection; it never directly triggers a delete.
- Per-blob state machine staging → committed → orphaned → deleting → deleted, guarded by compare-and-swap:
  - A blob enters orphaned when its refcount reaches 0 (or a staging blob exceeds its upload TTL without a commit).
  - orphaned → deleting requires the grace period elapsed and refcount re-confirmed 0 atomically in the same transaction that flips the state. A dedup hit during the grace window flips the blob back to committed (the new reference wins).
  - The commit protocol treats a blob in orphaned/deleting as absent → the client re-uploads. A blob is only physically removed from the object store after it is durably deleting with zero references.
- Incremental candidate sweep for day-to-day reclamation (touches only recently-orphaned blobs) plus a low-frequency, per-tenant mark-sweep backstop that reconciles any refcount drift from crashes. Both are idempotent and resumable via the state machine + tombstones.
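The guarded transitions above can be sketched as conditional updates, where the WHERE clause plays the role of the compare-and-swap. This is a minimal sketch using Python's standard `sqlite3` module and not the V1 implementation: the `blobs` table, its columns, the function names and the one-hour `GRACE_SECONDS` are all assumptions made for illustration. It also assumes one reading of the list above, namely that an orphaned blob can be revived by a dedup hit while a deleting blob is treated as absent.

```python
import sqlite3

GRACE_SECONDS = 3600.0  # assumed value; the ADR only says "hours (tunable)"


def open_store():
    """Create a hypothetical single-table blob store in memory."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE blobs (hash TEXT PRIMARY KEY, state TEXT NOT NULL,"
        " refcount INTEGER NOT NULL, orphaned_at REAL)"
    )
    return db


def mark_orphaned(db, blob_hash, now):
    """committed -> orphaned, only if the refcount hint is 0."""
    with db:  # one transaction; commits on success, rolls back on error
        cur = db.execute(
            "UPDATE blobs SET state='orphaned', orphaned_at=?"
            " WHERE hash=? AND state='committed' AND refcount=0",
            (now, blob_hash),
        )
    return cur.rowcount == 1


def try_begin_delete(db, blob_hash, now):
    """orphaned -> deleting: state, grace and zero refs checked in one statement."""
    with db:
        cur = db.execute(
            "UPDATE blobs SET state='deleting'"
            " WHERE hash=? AND state='orphaned' AND refcount=0"
            " AND orphaned_at<=?",
            (blob_hash, now - GRACE_SECONDS),
        )
    return cur.rowcount == 1  # False means the swap lost; do not delete


def dedup_hit(db, blob_hash):
    """A new reference revives an orphaned blob; a deleting blob is absent."""
    with db:
        cur = db.execute(
            "UPDATE blobs SET state='committed', refcount=refcount+1,"
            " orphaned_at=NULL"
            " WHERE hash=? AND state IN ('committed', 'orphaned')",
            (blob_hash,),
        )
    return cur.rowcount == 1  # False: caller must re-upload the bytes
```

In this sketch the physical delete from the object store would run only after `try_begin_delete` returns `True`. Because each transition is a single conditional update, a dedup hit and the collector cannot both win on the same row.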
No chunk packing/compaction in V1 (one blob = one object); that machinery returns only if ADR-0017 (chunking) is un-deferred.
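The mark-sweep backstop named in the list above might look like the following for a single tenant. This is a sketch under stated assumptions: blobs are modelled as a plain dict, `version_refs` is a hypothetical list of the blob hashes that live versions point at, and `reconcile_refcounts` is an illustrative name. A real implementation would also stamp the time a blob became orphaned so the grace period can be measured.

```python
from collections import Counter


def reconcile_refcounts(blobs, version_refs):
    """Recount references for one tenant and repair drifted refcounts.

    blobs: dict of hash -> {"state": str, "refcount": int} (assumed shape)
    version_refs: iterable of blob hashes, one entry per live reference
    Returns the hashes whose stored refcount was wrong.
    """
    actual = Counter(version_refs)  # mark: count the references that exist
    drifted = []
    for blob_hash, row in blobs.items():  # sweep: compare against the hint
        if row["state"] in ("deleting", "deleted"):
            continue  # already committed to deletion; leave untouched
        true_count = actual.get(blob_hash, 0)
        if row["refcount"] != true_count:
            drifted.append(blob_hash)
            row["refcount"] = true_count
        if true_count == 0 and row["state"] == "committed":
            row["state"] = "orphaned"  # becomes a candidate; grace still applies
        elif true_count > 0 and row["state"] == "orphaned":
            row["state"] = "committed"  # a live reference wins
    return drifted
```

The backstop only corrects the hint and moves blobs between committed and orphaned. It never deletes, so it relies on the same grace period and atomic re-confirmation as the incremental sweep. Running it twice over unchanged data makes no further changes, which is the idempotence the ADR asks for.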
Consequences
Positive
- The dedup-vs-GC race cannot delete live data: the grace window + atomic re-confirm + flip-back guarantee a referenced blob is never collected.
- Refcount drift from crashes is reconciled by the periodic backstop — refcount can be wrong without being dangerous.
- Online — never locks the store; uploads/downloads proceed during GC.
- Simple enough for a solo developer to implement and test correctly (a single state machine on one table), unlike the chunk/pack/compactor design.
Negative / costs
- The grace period leaves dead bytes around for hours (tunable) — accepted given the asymmetric risk.
- A blob orphaned then re-referenced within grace incurs a state flip-back rather than a no-op — negligible.
- The backstop is a periodic per-tenant scan — cheap at V1 scale; revisit only when a tenant’s blob count makes full scans expensive (a forcing function for ADR-0017’s candidate-based chunk GC).
Alternatives considered
- Immediate delete at zero refs (no grace): exposes the race directly → unrecoverable data loss. Rejected (this was the I2 bug).
- Pure periodic full mark-sweep only: correct but expensive; kept only as the backstop, not the primary path.
- Chunk/pack state machine + compaction (pre-freeze): correct but heavy; unjustified without chunking. Deferred with ADR-0017.
Verification
Dual-write / GC chaos test (P0 acceptance, review §1 done-criteria), in two scenarios:
- Kill the process between the object PUT and the commit → assert no committed version references the orphan and the orphan is reclaimed after grace.
- Re-upload identical bytes during the grace window → assert the blob flips back to committed and is not deleted.
(I1, I2.)
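The shape of these assertions can be sketched against a single-threaded fake. This is an assumption-laden model and not the chaos test itself, which would kill a real process: `FakeStore`, its methods, the explicit `now` clock argument and the `GRACE` value are all hypothetical, and the upload TTL is assumed equal to the grace period for brevity.

```python
GRACE = 3600.0  # assumed grace period in seconds


class FakeStore:
    """Hypothetical in-memory stand-in for the object store plus blob table."""

    def __init__(self):
        self.objects = {}  # hash -> bytes (the object store)
        self.blobs = {}    # hash -> {"state", "refs", "since"}

    def put(self, h, data, now):
        self.objects[h] = data
        self.blobs.setdefault(h, {"state": "staging", "refs": 0, "since": now})

    def commit(self, h):
        row = self.blobs[h]
        if row["state"] in ("deleting", "deleted"):
            return False  # treated as absent; the caller re-uploads
        row["state"], row["refs"] = "committed", row["refs"] + 1
        return True

    def unreference(self, h, now):
        row = self.blobs[h]
        row["refs"] -= 1
        if row["refs"] == 0:
            row["state"], row["since"] = "orphaned", now

    def gc(self, now, upload_ttl=GRACE):
        for h, row in list(self.blobs.items()):
            if row["state"] == "staging" and now - row["since"] >= upload_ttl:
                row["state"], row["since"] = "orphaned", now
            elif (row["state"] == "orphaned" and row["refs"] == 0
                  and now - row["since"] >= GRACE):
                row["state"] = "deleting"
                self.objects.pop(h, None)  # physical removal
                row["state"] = "deleted"


def test_crash_between_put_and_commit():
    s = FakeStore()
    s.put("h", b"x", now=0.0)   # the process "dies" here: no commit
    s.gc(now=GRACE)             # upload TTL expired -> orphaned
    assert "h" in s.objects     # still inside the grace window
    s.gc(now=2 * GRACE)         # grace elapsed -> reclaimed
    assert "h" not in s.objects


def test_reupload_during_grace():
    s = FakeStore()
    s.put("h", b"x", now=0.0)
    s.commit("h")
    s.unreference("h", now=0.0)  # last reference gone -> orphaned
    assert s.commit("h")         # dedup hit inside the grace window
    s.gc(now=2 * GRACE)
    assert s.blobs["h"]["state"] == "committed"
    assert "h" in s.objects


test_crash_between_put_and_commit()
test_reupload_during_grace()
```

Passing time in as an argument keeps the grace window testable without sleeping. The real test would still need concurrent writers and process kills to exercise the atomicity that this single-threaded model takes for granted.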