BitVault Sync Engine — Design
Audience: distributed-systems engineers. Scope: the design of BitVault’s Dropbox-style synchronization engine, the native-client component that keeps a local folder and the cloud namespace converged: correctly, across offline periods, and without ever losing data.
This is the realization of ADR-0008 (change-journal sync + conflicted copies) and builds directly on the storage subsystem (content-addressed chunks, manifests, per-tenant dedup) and the namespace model (nodes, versions, the change journal). It is the headline competency the original review demanded (01 §1.4).
No implementation code — architecture and documentation only. Decisions carry Tradeoffs / Alternatives / Scaling; the contested ones are ADRs 0022–0027.
0. Reading order & questions answered
| # | Doc | Brief topic(s) | Answers |
|---|---|---|---|
| 01 | Prior art: Dropbox & others | — | How does Dropbox sync work? |
| 02 | Client architecture | (detailed architecture) | How should BitVault sync work? |
| 03 | Local database design | local database, metadata caching | |
| 04 | File watching & scanning | local file watching | |
| 05 | Reconciliation & the planner | state reconciliation | |
| 06 | Sync state machine | (sync state machine) | |
| 07 | Sync protocol | sync protocols, multi-device | |
| 08 | Delta sync & large files | delta sync, incremental uploads | How are large files synced? |
| 09 | Conflict resolution | conflict resolution | How are conflicts handled? |
| 10 | Offline-first & sync queue | offline-first, sync queues | |
| 11 | Failure modes & analysis | — | What are common failure modes? |
1. The mental model: three trees and a planner
After four years rewriting its sync engine (codename Nucleus), Dropbox converged on a model we adopt wholesale because it is correct by construction: represent state, not activity, as three trees, and compute sync as a three-way merge between them.
flowchart TB
classDef t fill:#fde68a,stroke:#b45309,color:#111827;
classDef p fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef o fill:#bbf7d0,stroke:#15803d,color:#111827;
R["REMOTE tree (R)<br/>server's current namespace<br/>(learned via cursor delta, ADR-0008)"]:::t
L["LOCAL tree (L)<br/>last observed on-disk state<br/>(watcher + scanner)"]:::t
S["SYNCED tree (S)<br/>last state where R and L agreed<br/>= the merge base"]:::t
P{{"Planner<br/>diff(S→R) = remote changes<br/>diff(S→L) = local changes<br/>3-way merge → operations"}}:::p
R --> P
L --> P
S --> P
P --> apL["apply-local ops<br/>download / delete / rename"]:::o
P --> apR["apply-remote ops<br/>upload / delete / rename"]:::o
P --> cf["conflict ops<br/>conflicted copy (never overwrite)"]:::o
apL -.->|on success advance| S
apR -.->|on success advance| S
- Remote tree (R): the cloud’s current state for the synced subtree, learned via cursor-based delta from the change journal (07, ADR-0008).
- Local tree (L): the last observed on-disk state, from the watcher + scanner (04), persisted in the local DB.
- Synced tree (S): the last state where R and L were equal — the merge base. This is the keystone: it lets the planner unambiguously decide, per node, whether a difference came from the local side, the remote side, or both (a conflict).
The Planner (05) consumes the three trees and emits operations that converge them; as each operation completes, the Synced tree advances. Everything else in this design — watching, the queue, the protocol, conflict handling — exists to feed and execute this loop reliably.
Why this beats representing “pending operations” directly: the legacy approach (a queue of in-flight actions) loses the ability to reason about correctness after crashes, reorderings, and offline edits. Persisting observed state and recomputing the plan is idempotent and re-entrant — you can re-run it from any point and get the right answer. This is the single most important design decision (ADR-0022).
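To make the three-way merge concrete, here is a minimal, non-normative Go sketch of the per-node decision. The names `NodeState`, `Op`, and `Plan` are hypothetical and not taken from the BitVault codebase, and the sketch assumes a node is summarized by an existence flag plus a content hash; the real trees would also carry name, parent, and version.

```go
package main

import "fmt"

// NodeState is a hypothetical per-node snapshot. The zero value means
// "the node does not exist in this tree".
type NodeState struct {
	Exists bool
	Hash   string // content hash; empty when Exists is false
}

// Op names the kind of operation the planner emits for one node.
type Op string

const (
	OpNone        Op = "none"            // all three trees agree
	OpApplyRemote Op = "apply-remote"    // only the local side changed: push it
	OpApplyLocal  Op = "apply-local"     // only the remote side changed: pull it
	OpAdvance     Op = "advance-synced"  // both sides made the same change
	OpConflict    Op = "conflicted-copy" // both sides changed differently
)

// Plan is pure: it inspects the three observed states of one node and
// returns the operation that moves the trees toward convergence.
func Plan(synced, local, remote NodeState) Op {
	localChanged := local != synced   // diff(S→L)
	remoteChanged := remote != synced // diff(S→R)
	switch {
	case !localChanged && !remoteChanged:
		return OpNone
	case localChanged && !remoteChanged:
		return OpApplyRemote
	case remoteChanged && !localChanged:
		return OpApplyLocal
	case local == remote:
		return OpAdvance
	default:
		return OpConflict
	}
}

func main() {
	base := NodeState{Exists: true, Hash: "h1"}
	edited := NodeState{Exists: true, Hash: "h2"}
	other := NodeState{Exists: true, Hash: "h3"}
	deleted := NodeState{}

	fmt.Println(Plan(base, edited, base))   // local edit only
	fmt.Println(Plan(base, base, deleted))  // remote delete only
	fmt.Println(Plan(base, edited, other))  // divergent edits
	fmt.Println(Plan(base, edited, edited)) // identical edits on both sides
}
```

Because `Plan` reads only persisted state and has no side effects, re-running it after a crash yields the same answer, which is the idempotence property argued for above. In this sketch, an edit on one side combined with a delete on the other also falls through to the conflict case, so nothing is silently discarded.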
2. Node identity: IDs, not paths (renames are O(1))
Every file/folder is a node with a stable server-assigned ID (08),
not keyed by path. A move/rename is a single node mutation, not a delete+create of the
whole subtree. The local DB maps node_id ↔ path ↔ inode. This makes renames cheap,
move-detection possible, and the three trees comparable by identity rather than by
fragile paths. (Dropbox’s Nucleus made exactly this change; the legacy engine keyed by
path and exploded a rename into O(n) deletes/adds.)
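The following Go sketch illustrates why keying by ID makes a rename a single mutation. It is an assumption-laden illustration, not the local DB schema: `NodeID`, `Node`, `Tree`, `Path`, and `Move` are hypothetical names, the tree is an in-memory map, and cycle checks (moving a folder into its own descendant) are omitted.

```go
package main

import "fmt"

// NodeID stands in for the stable server-assigned identifier.
type NodeID string

// Node stores its parent and name; its path is derived, never stored as a key.
type Node struct {
	ID     NodeID
	Parent NodeID // empty only for the root
	Name   string
}

// Tree indexes nodes by ID.
type Tree map[NodeID]*Node

// Path derives a node's path by walking parent links up to the root.
// The root (and any unknown ID) contributes no path segment.
func (t Tree) Path(id NodeID) string {
	n, ok := t[id]
	if !ok || n.Parent == "" {
		return ""
	}
	return t.Path(n.Parent) + "/" + n.Name
}

// Move renames and/or reparents one node. Descendants are untouched:
// their IDs and parent links stay valid, so the cost is O(1).
func (t Tree) Move(id, newParent NodeID, newName string) error {
	n, ok := t[id]
	if !ok {
		return fmt.Errorf("unknown node %q", id)
	}
	if _, ok := t[newParent]; !ok {
		return fmt.Errorf("unknown parent %q", newParent)
	}
	n.Parent, n.Name = newParent, newName
	return nil
}

// demoTree builds a small example: /docs/a.txt and /docs/b.txt.
func demoTree() Tree {
	return Tree{
		"root": {ID: "root"},
		"n1":   {ID: "n1", Parent: "root", Name: "docs"},
		"n2":   {ID: "n2", Parent: "n1", Name: "a.txt"},
		"n3":   {ID: "n3", Parent: "n1", Name: "b.txt"},
	}
}

func main() {
	tree := demoTree()
	fmt.Println(tree.Path("n2"))
	if err := tree.Move("n1", "root", "archive"); err != nil {
		fmt.Println("move failed:", err)
		return
	}
	// Only node n1 was mutated, yet every descendant path follows it.
	fmt.Println(tree.Path("n2"))
}
```

A path-keyed design would have to rewrite one row per descendant for the same rename; here the children's records are never touched.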
3. Design tenets
- Never lose data. Concurrent divergence produces a conflicted copy, never a silent overwrite; every version is recoverable (09, ADR-0008).
- State, not activity. Persist observed trees; recompute the plan. Idempotent, re-entrant, crash-safe (ADR-0022).
- The watcher is a hint; rescan + content hash is truth. OS watchers drop events under load; correctness never depends on them (04, ADR-0025).
- The server is the authority and the serialization point. A single total order (the journal) means clients need no per-file vector clocks (ADR-0022/0024).
- Atomic local application. Downloads land via temp-file + fsync + atomic rename; a partial file is never visible (08, ADR-0027).
- Offline-first. Every intent is durable in the local DB and replayed on reconnect; arbitrary offline duration is fine (10).
- Move bytes the cheap way. Delta sync over content-addressed chunks; only new chunks cross the wire (08).
- Determinism & testability. Single-threaded control logic; I/O and hashing on workers; the planner is pure (trees in → ops out) so it can be exhaustively tested (Dropbox’s CanopyCheck/Trinity approach).
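The "atomic local application" tenet above (temp file, fsync, atomic rename) can be sketched in Go as follows. This is a minimal illustration under stated assumptions: `WriteFileAtomic`, `syncDir`, and `demoAtomicWrite` are hypothetical names, the temp-file prefix is invented, the whole payload is held in memory rather than reconstructed from chunks, and the directory fsync assumes a POSIX-style filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// WriteFileAtomic writes data to a temp file in the destination's directory,
// flushes it to stable storage, then renames it over dst. Readers see either
// the old file or the complete new one, never a partial write.
func WriteFileAtomic(dst string, data []byte) error {
	dir := filepath.Dir(dst)
	// Same directory as dst so the rename stays on one filesystem.
	tmp, err := os.CreateTemp(dir, ".sync-tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	renamed := false
	defer func() {
		if !renamed {
			tmp.Close() // harmless if already closed
			os.Remove(tmpName)
		}
	}()
	if _, err := tmp.Write(data); err != nil {
		return err
	}
	if err := tmp.Sync(); err != nil { // fsync the file contents
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Rename(tmpName, dst); err != nil {
		return err
	}
	renamed = true
	return syncDir(dir) // make the rename itself durable
}

// syncDir fsyncs a directory so a preceding rename survives a crash.
func syncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

// demoAtomicWrite overwrites one file twice and returns its final contents.
func demoAtomicWrite() (string, error) {
	dir, err := os.MkdirTemp("", "bitvault-atomic-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	dst := filepath.Join(dir, "report.txt")
	for _, v := range []string{"v1", "v2"} {
		if err := WriteFileAtomic(dst, []byte(v)); err != nil {
			return "", err
		}
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	if len(entries) != 1 {
		return "", fmt.Errorf("expected only the destination, found %d entries", len(entries))
	}
	data, err := os.ReadFile(dst)
	return string(data), err
}

func main() {
	got, err := demoAtomicWrite()
	fmt.Println(got, err)
}
```

Creating the temp file next to the destination matters: a rename across filesystems is not atomic, so a system-wide temp directory would not give the same guarantee.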
4. Scope & boundaries
- In scope: the native sync clients — the Go daemon (shared engine library) behind the desktop app and CLI, and (via bindings) the future mobile app. The engine is one reusable Go library.
- Out of scope here: the web app does not sync a local filesystem; it talks to the REST API directly. The server-side Sync service (journal projection, cursor service, notification fan-out) is specified in 07 protocol and 05 service boundaries; it owns the journal, the clients own their local trees.
- Data ownership: the server owns the namespace + journal + version history (authoritative); each client owns its local DB (the three trees + queue + cursor), which is a rebuildable cache — losing it triggers a full re-scan + re-list, not data loss (03).
flowchart LR
classDef c fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef s fill:#fde68a,stroke:#b45309,color:#111827;
subgraph DEV["Device (native client = the engine)"]
fs[("Local filesystem")]:::c
eng["Sync engine<br/>watcher · scanner · planner · scheduler · transfer"]:::c
db[("Local SQLite DB<br/>3 trees + queue + cursor")]:::c
end
subgraph SRV["BitVault server"]
gw["Gateway / Sync API"]:::s
sy["Sync service<br/>journal · cursor · notify"]:::s
st["Storage<br/>chunks · manifests (storage/)"]:::s
end
fs <--> eng <--> db
eng <-->|"cursor pull + commit (gRPC/REST)"| gw
eng -. "presigned chunk PUT/GET" .-> st
gw <--> sy
sy -. "notify: namespace advanced" .-> eng
5. How a change flows (the 10-second tour)
- Local edit → watcher fires (hint) → scanner confirms via hash → Local tree updates → planner sees diff(S→L) → schedules an upload → CDC-chunk, dedup-negotiate, upload new chunks, commit manifest with base version → server journal advances → Synced tree advances.
- Remote edit (another device) → server notifies “namespace advanced” → client pulls cursor delta → Remote tree updates → planner sees diff(S→R) → schedules a download → fetch missing chunks, reconstruct atomically → Synced tree advances.
- Both edited the same file → planner sees diff(S→R) and diff(S→L) for one node → conflict → keep both as a conflicted copy → both propagate as versions.
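The upload half of the first flow (chunk, dedup-negotiate, upload only new chunks, commit a manifest) can be sketched as below. Everything here is a simplifying assumption for illustration: `Chunk`, `splitFixed`, `planUpload`, and `demoUpload` are hypothetical names; fixed-size splitting stands in for content-defined chunking (ADR 0017); SHA-256 stands in for BLAKE3 (ADR 0016) only because it ships in the Go standard library; and the `serverHas` set stands in for the dedup negotiation round trip.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk is a content-addressed piece of a file.
type Chunk struct {
	Hash string
	Data []byte
}

// splitFixed cuts data into fixed-size chunks and hashes each one.
// Assumption: a real engine would use content-defined boundaries instead,
// so that an insertion does not shift every later chunk.
func splitFixed(data []byte, size int) []Chunk {
	var out []Chunk
	for off := 0; off < len(data); off += size {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end])
		out = append(out, Chunk{Hash: hex.EncodeToString(sum[:]), Data: data[off:end]})
	}
	return out
}

// planUpload returns the ordered manifest (every chunk hash) and the subset
// of chunks the server does not already hold. Only that subset is uploaded.
func planUpload(chunks []Chunk, serverHas map[string]bool) (manifest []string, toUpload []Chunk) {
	queued := map[string]bool{}
	for _, c := range chunks {
		manifest = append(manifest, c.Hash)
		if serverHas[c.Hash] || queued[c.Hash] {
			continue // already stored, or already queued in this upload
		}
		queued[c.Hash] = true
		toUpload = append(toUpload, c)
	}
	return manifest, toUpload
}

// demoUpload syncs v1 in full, then edits the middle of the file and
// reports what a second sync would have to send.
func demoUpload() (manifestLen int, uploaded []string) {
	serverHas := map[string]bool{}
	for _, c := range splitFixed([]byte("aaaabbbbcccc"), 4) {
		serverHas[c.Hash] = true // pretend v1 was fully uploaded earlier
	}
	manifest, toUpload := planUpload(splitFixed([]byte("aaaaXXXXcccc"), 4), serverHas)
	for _, c := range toUpload {
		uploaded = append(uploaded, string(c.Data))
	}
	return len(manifest), uploaded
}

func main() {
	n, uploaded := demoUpload()
	fmt.Println(n, uploaded)
}
```

The manifest still lists all three chunks, so the server can reconstruct the new version, but only the one changed chunk crosses the wire.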
Related ADRs
| ADR | Decision |
|---|---|
| 0008 | (foundational) change-journal sync + conflicted copies |
| 0022 | Three-tree (local/remote/synced) reconciliation |
| 0023 | Local SQLite sync DB as a rebuildable cache |
| 0024 | Cursor-based delta pull + push notification |
| 0025 | Watcher-as-hint + authoritative rescan + hash truth |
| 0026 | Conflict taxonomy & resolution policy |
| 0027 | Sync safety guards (atomic apply, self-write, bulk-delete brake) |
Inherited: 0006 outbox/NATS, 0011 presigned, 0016 BLAKE3, 0017 chunking, 0018 dedup scope.
References (research grounding)
- Dropbox, “Rewriting the heart of our sync engine” (Nucleus, three trees): https://dropbox.tech/infrastructure/rewriting-the-heart-of-our-sync-engine
- Dropbox, “Testing our new sync engine” (Canopy, planner, CanopyCheck/Trinity): https://dropbox.tech/infrastructure/-testing-our-new-sync-engine
- Dropbox, Detecting Changes (cursor + longpoll + 409 reset): https://developers.dropbox.com/detecting-changes-guide
- Syncthing Block Exchange Protocol (P2P, version vectors, conflict naming): https://docs.syncthing.net/specs/bep-v1.html
- rsync algorithm (rolling checksum delta — the alternative): https://www.samba.org/rsync/tech_report/node2.html
- inotify limits / IN_Q_OVERFLOW: https://man7.org/linux/man-pages/man7/inotify.7.html