05 — State Reconciliation & the Planner
Topic: state reconciliation. The planner is the brain of the engine: a pure function that takes the three trees and emits the operations to converge them. Decision in ADR-0022.
1. The reconciliation principle
For every node, compare its three snapshots — Synced (S), Remote (R), Local (L) — by identity (node ID) and content (hash + presence). The Synced snapshot is the merge base: it tells us which side moved.
R == S→ remote didn’t change since last sync.L == S→ local didn’t change since last sync.- The side(s) that differ from S are the side(s) that changed. If both differ and they differ from each other, it’s a conflict.
This is a per-node three-way merge — the same logic as a VCS merge, applied to a filesystem tree.
2. The decision matrix (the heart of the planner)
P = present, ∅ = absent; hashes compared when both present.
| S | R | L | Condition | Meaning | Operation |
|---|---|---|---|---|---|
| P | =S | =S | nothing differs | already synced | none |
| P | =S | ≠S | local changed only | local edit | Upload (L→R) |
| P | ≠S | =S | remote changed only | remote edit | Download (R→L) |
| P | ≠S | ≠S | both changed, R==L | converged independently | advance S only (no transfer) |
| P | ≠S | ≠S | both changed, R≠L | divergent edits | Conflict → conflicted copy (09) |
| ∅ | P | ∅ | new remote node | remote create | Download (+ mkdir parents) |
| ∅ | ∅ | P | new local node | local create | Upload (+ remote mkdir parents) |
| ∅ | P | P | both created same name | create/create | hash-equal → adopt; else Conflict |
| P | ∅ | =S | remote deleted, local unchanged | remote delete | LocalDelete |
| P | =S | ∅ | local deleted, remote unchanged | local delete | RemoteDelete |
| P | ∅ | ≠S | remote deleted, local edited | delete vs edit | edit wins → resurrect as upload + surface (09) |
| P | ≠S | ∅ | remote edited, local deleted | edit vs delete | edit wins → download + surface |
| P | ∅ | ∅ | both deleted | converged delete | advance S (drop node) |
Two safety rules embedded above: “R==L ⇒ no transfer, just record” (a convergent change costs nothing), and “edit always beats delete” (never let a delete destroy an edit; ADR-0026). Renames are detected upstream (04 §5) and represented as a single node-ID move, not delete+create, so they don’t hit the create/delete rows.
3. Operations & the dependency DAG
The planner emits typed operations; the scheduler (10) runs them respecting dependencies:
| Op | Notes | Depends on |
|---|---|---|
Mkdir (local/remote) |
create parent before child | — |
Download |
fetch chunks + atomic apply (08) | parent Mkdir |
Upload |
negotiate + presigned PUT + commit | parent remote Mkdir |
Rename/Move |
single node-ID move | target parent exists |
LocalDelete / RemoteDelete |
bottom-up | children deleted first |
CreateConflictedCopy |
materialize both sides (09) | — |
flowchart TB
classDef a fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef o fill:#bbf7d0,stroke:#15803d,color:#111827;
obs["Observation drained<br/>(Local and/or Remote tree changed)"]:::a --> dirty["Dirty set = nodes where S≠R or S≠L"]:::a
dirty --> plan["Planner: per-node decision matrix"]:::a
plan --> ops["Operation set + dependency DAG"]:::a
ops --> topo["Topological order<br/>(mkdir → content; deletes bottom-up)"]:::a
topo --> exec["Scheduler executes (parallel where independent)"]:::a
exec -->|each op succeeds| adv["Advance Synced tree for that node"]:::o
adv -->|trees converged?| done{"converged?"}:::a
done -- no --> plan
done -- yes --> idle["Idle / watch"]:::o
Ordering invariants: parents before children for creates; children before parents for deletes; renames before dependent content ops. Independent subtrees run in parallel.
4. Re-entrancy & convergence (why this is robust)
- The planner is pure: same
(R,L,S)→ same ops. No hidden state, no I/O. It can be property-tested by generating arbitrary tree triples and asserting convergence (Dropbox’s CanopyCheck; “drive any three trees to equality”). - It runs in a loop: execute ops → advance S → re-plan → until
R == L == S. Partial progress is always valid, so crashes/offline/interruptions just resume the loop. - Convergence guarantee: every executed op strictly reduces the divergence between S and (R,L) for at least one node and never increases it elsewhere → the loop terminates when no node disagrees. Conflicts terminate by materializing both sides as concrete (now-agreeing) nodes.
5. Incremental planning (scale)
Planning the whole tree on every tick is O(tree). Instead, observations mark a dirty set (nodes whose Local or Remote snapshot just changed); the planner only visits dirty nodes + their structural dependencies. A full plan runs only after a full rescan or cursor reset. At millions of files this is the difference between “instant” and “unusable.”
6. Tradeoffs / Alternatives / Scaling
Tradeoffs. The three-tree model stores ~3× the per-node metadata of an “ops queue” design, and requires maintaining the Synced tree on every completion. That cost buys crash-safe, re-entrant, testable reconciliation — a trade every serious sync engine (post-Nucleus) makes.
Alternatives considered.
- Operation-log replay (apply a journal of local/remote ops): fragile across crashes, reordering, and offline divergence; no clean merge base. Rejected (ADR-0022).
- Two-way diff (local vs remote, no Synced base): cannot distinguish “remote added X” from “local deleted X” — the ambiguity that causes data loss. The Synced base resolves it. This is precisely why Dropbox introduced the Synced tree.
- Vector clocks per file: needed for P2P (Syncthing) but redundant given a central total order; adds per-file causal metadata for no benefit here.
Scaling concerns.
- Dirty-set planning keeps work proportional to changes, not tree size.
- Subtree parallelism: independent subtrees plan/execute concurrently (the executor parallelizes; the planner stays single-threaded and fast since it’s metadata-only).
- Big folder moves: O(1) via node-ID move, not O(descendants).
- Conflict-heavy workloads: bounded by materializing conflicts immediately (they become normal nodes), so the loop can’t livelock on a perpetually-divergent node.