ADR-0019 — Safe whole-object GC: grace period + atomic re-confirmation

V1 Freeze (2026-06-12): Accepted, whole-object. Blocker-2 resolution: deletion is authorized by grace + atomic zero-reference re-confirmation (CAS), never by a bare refcount — closing the dedup-vs-delete race. The chunk/pack state-machine and pack-compaction described pre-freeze are deferred with ADR-0017; V1 runs this simpler whole-object GC.

Context

With dedup, deletion is not the inverse of writing: a blob’s bytes may be deleted only when no version references them, and a new reference (a dedup hit on re-upload of identical bytes) can appear at the instant GC decides to delete. Pure refcounting hits this race and drifts under crashes/retries. The asymmetry is brutal: deleting live data is unrecoverable; leaking space is merely costly.

The pre-freeze review (review §3.2) found that data-model invariant I2 specified the exact naive rule — “GC deletes the object when refcount = 0” — that this ADR’s own prior text called unsafe. The freeze reconciles them: I2 now points here, and this ADR is scoped to the whole-object model V1 actually builds.

Decision

Authorize deletion by state + grace + atomic re-confirmation, not by refcount alone:

  1. Refcount is a hint, never the authority. It identifies candidates for collection; it never directly triggers a delete.
  2. Per-blob state machine: staging → committed → orphaned → deleting → deleted, with every transition guarded by compare-and-swap:
    • A blob enters orphaned when its refcount reaches 0 (or when a staging blob exceeds its upload TTL without a commit).
    • orphaned → deleting requires that the grace period has elapsed and that the refcount is re-confirmed as 0 atomically, in the same transaction that flips the state. A dedup hit during the grace window flips the blob back to committed (the new reference wins).
    • The commit protocol treats a blob in orphaned or deleting as absent, so the client re-uploads. A blob is physically removed from the object store only after its deleting state has been durably recorded with zero references.
  3. Incremental candidate sweep for day-to-day reclamation (touches only recently-orphaned blobs) plus a low-frequency, per-tenant mark-sweep backstop that reconciles any refcount drift from crashes. Both are idempotent and resumable via the state machine + tombstones.
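
The guarded transitions in item 2 can be sketched as follows. This is a minimal in-memory model, not the V1 implementation: BlobTable, its method names, and the grace_seconds parameter are hypothetical, and a threading.Lock stands in for the metadata-store transaction that makes the check and the state flip atomic. Staging TTL expiry, tombstones, and the mark-sweep backstop are not modeled, and the re-upload of bytes is folded into dedup_hit.

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    STAGING = "staging"
    COMMITTED = "committed"
    ORPHANED = "orphaned"
    DELETING = "deleting"
    DELETED = "deleted"


@dataclass
class Blob:
    state: State = State.STAGING
    refcount: int = 0
    orphaned_at: Optional[float] = None


class BlobTable:
    """Hypothetical metadata table; the lock stands in for a transaction."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        self._blobs: Dict[str, Blob] = {}
        self._lock = threading.Lock()

    def state(self, blob_id: str) -> State:
        with self._lock:
            return self._blobs[blob_id].state

    def commit_new(self, blob_id: str) -> None:
        """Record a freshly uploaded blob as committed with its first reference."""
        with self._lock:
            self._blobs[blob_id] = Blob(State.COMMITTED, refcount=1)

    def dedup_hit(self, blob_id: str) -> bool:
        """Add a reference for identical bytes; False means 'treat the blob as absent'."""
        with self._lock:
            blob = self._blobs.get(blob_id)
            if blob is None or blob.state not in (State.COMMITTED, State.ORPHANED):
                return False
            # Committed, or orphaned inside the grace window: the new reference wins.
            blob.state = State.COMMITTED
            blob.orphaned_at = None
            blob.refcount += 1
            return True

    def release_reference(self, blob_id: str, now: float) -> None:
        """Drop one reference; reaching zero marks a candidate and deletes nothing."""
        with self._lock:
            blob = self._blobs[blob_id]
            blob.refcount -= 1
            if blob.refcount == 0 and blob.state is State.COMMITTED:
                blob.state = State.ORPHANED
                blob.orphaned_at = now

    def try_begin_delete(self, blob_id: str, now: float) -> bool:
        """orphaned -> deleting only if grace elapsed and refcount re-read as 0."""
        with self._lock:
            blob = self._blobs[blob_id]
            if blob.state is not State.ORPHANED or blob.orphaned_at is None:
                return False
            if now - blob.orphaned_at < self.grace_seconds or blob.refcount != 0:
                return False
            blob.state = State.DELETING
            return True

    def finish_delete(self, blob_id: str) -> bool:
        """Record deleted once the object-store delete is done; valid only from deleting."""
        with self._lock:
            blob = self._blobs[blob_id]
            if blob.state is not State.DELETING:
                return False
            blob.state = State.DELETED
            return True


table = BlobTable(grace_seconds=60.0)
table.commit_new("blob-a")
table.release_reference("blob-a", now=0.0)              # refcount 0 -> orphaned
assert not table.try_begin_delete("blob-a", now=10.0)   # grace has not elapsed
assert table.dedup_hit("blob-a")                        # re-upload during grace wins
assert not table.try_begin_delete("blob-a", now=120.0)  # blob is committed again
```

The point of the sketch is that release_reference never deletes anything: the only path to deleting is try_begin_delete, which re-reads the refcount under the same lock that flips the state.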

No chunk packing/compaction in V1 (one blob = one object); that machinery returns only if ADR-0017 (chunking) is un-deferred.

Consequences

Positive

Negative / costs

Alternatives considered

Verification

Dual-write / GC chaos test (P0 acceptance, review §1 done-criteria), two scenarios:

  1. Kill the process between the object PUT and the commit. Assert that no committed version references the orphan and that the orphan is reclaimed after grace.
  2. Re-upload identical bytes during the grace window. Assert that the blob flips back to committed and is not deleted.

(I1, I2.)
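
The two assertions of the chaos test could be expressed against a toy model like the one below. It is a single-threaded sketch under stated assumptions, not the real harness: Model, its methods, and the GRACE and UPLOAD_TTL values are hypothetical, the "crash" is simulated by never calling commit, and the atomicity of the re-confirmation is not exercised here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

GRACE = 60.0        # hypothetical grace period, seconds
UPLOAD_TTL = 30.0   # hypothetical staging TTL, seconds


@dataclass
class Model:
    objects: Set[str] = field(default_factory=set)          # keys in the object store
    state: Dict[str, str] = field(default_factory=dict)     # blob id -> state
    since: Dict[str, float] = field(default_factory=dict)   # blob id -> time of last timed transition
    versions: Dict[str, str] = field(default_factory=dict)  # version id -> blob id

    def refs(self, blob: str) -> int:
        """Count references from the version table rather than a cached counter."""
        return sum(1 for b in self.versions.values() if b == blob)

    def put_object(self, blob: str, now: float) -> None:
        self.objects.add(blob)
        if blob not in self.state or self.state[blob] == "deleted":
            self.state[blob], self.since[blob] = "staging", now

    def commit(self, version: str, blob: str) -> None:
        self.versions[version] = blob
        self.state[blob] = "committed"

    def delete_version(self, version: str, now: float) -> None:
        blob = self.versions.pop(version)
        if self.refs(blob) == 0:
            self.state[blob], self.since[blob] = "orphaned", now

    def sweep(self, now: float) -> None:
        """One GC pass; each blob makes at most one timed transition per pass."""
        for blob, st in list(self.state.items()):
            age = now - self.since[blob]
            if st == "staging" and age > UPLOAD_TTL:
                self.state[blob], self.since[blob] = "orphaned", now
            elif st == "orphaned" and age >= GRACE and self.refs(blob) == 0:
                self.state[blob] = "deleting"   # recorded before the physical delete
                self.objects.discard(blob)
                self.state[blob] = "deleted"


def test_crash_between_put_and_commit() -> None:
    m = Model()
    m.put_object("b", now=0.0)          # process dies here; commit never runs
    assert "b" not in m.versions.values()
    m.sweep(now=31.0)                   # staging TTL exceeded -> orphaned
    m.sweep(now=91.0)                   # grace elapsed, zero refs -> reclaimed
    assert m.state["b"] == "deleted" and "b" not in m.objects


def test_reupload_during_grace() -> None:
    m = Model()
    m.put_object("b", now=0.0)
    m.commit("v1", "b")
    m.delete_version("v1", now=10.0)    # last reference gone -> orphaned
    m.put_object("b", now=20.0)         # identical bytes re-uploaded during grace
    m.commit("v2", "b")                 # new reference wins -> committed
    m.sweep(now=1000.0)
    assert m.state["b"] == "committed" and "b" in m.objects


test_crash_between_put_and_commit()
test_reupload_during_grace()
```

A real chaos test would kill an actual process and run against the real metadata store and object store; the model only pins down what each assertion is meant to check.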