01 — Prior Art: How Dropbox (and Others) Sync
Answers: How does Dropbox sync work? Plus the lessons we take from Syncthing (P2P) and rsync (delta), and what BitVault adopts vs rejects. Designing a sync engine without studying these is malpractice; the failure modes are subtle and hard-won.
1. Dropbox: the legacy engine and why it was rewritten
Dropbox’s original “Sync Engine Classic” represented sync as outstanding activity — a set of in-flight operations (uploads, downloads) keyed by path. Two structural flaws drove a four-year, ground-up rewrite (codename Nucleus, shipped ~2020, written in Rust):
- Path-keyed nodes: a rename of a folder became O(n) deletes + adds for every descendant — slow, and a correctness minefield under concurrency.
- Activity, not state: representing what’s in flight makes it nearly impossible to reason about correctness after crashes, reorderings, and offline edits. There was no canonical “what should be true” to recompute from.
2. Nucleus: three trees + a planner (the model we adopt)
Nucleus persists state, as three trees, each an internally-consistent filesystem snapshot:
| Tree | Meaning |
|---|---|
| Remote | latest state of the user’s Dropbox in the cloud |
| Local | last observed state on disk |
| Synced | last state known to be fully synced between Remote and Local (the merge base) |
“The Synced Tree is the key innovation that lets the system unambiguously derive the correct sync result.” By comparing Local and Remote each against Synced, you know whether a node changed locally, remotely, or both.
The Planner is the core algorithm: it takes the three trees (collectively the “Canopy”) and outputs a series of operations that incrementally converge them. After each operation, the Synced tree advances toward the others. Other notable choices:
- Nodes keyed by stable ID, not path → renames are O(1) (move the node).
- Single-threaded control logic, with I/O and hashing offloaded to background threads → deterministic, testable.
- Testing as a first-class concern: CanopyCheck drives three arbitrary trees to convergence to verify the planner; Trinity stress-tests concurrent edits to the same file. The planner being a pure function (trees → ops) is what makes this exhaustive testing possible.
3. Dropbox: how clients learn about remote changes (the protocol)
Dropbox’s change-detection protocol (public API mirrors the internal design):
- Cursor: an opaque pointer into the change stream for a path/namespace.
list_folderreturns entries + a cursor;get_latest_cursorreturns “changes from now on.” - Longpoll: the client calls
list_folder/longpollwith its cursor; the call blocks until a change (or timeout) — plus up to 90 s of random jitter to avoid the thundering herd when many clients reconnect at once. - Continue: on signal,
list_folder/continuereturns the actual delta (add/modify/ delete) since the cursor, advancing it. - Cursor expiry / reset: an old/invalid cursor returns 409 reset → the client must re-list from scratch and rebuild its cursor.
- Webhooks for server-side integrations (vs longpoll for clients).
We adopt this almost verbatim (07, ADR-0024): cursor over our change journal (ADR-0008), a lightweight notification to trigger pulls, jitter against thundering herds, and a 409-style cursor-reset recovery path.
4. Dropbox: how file content moves (delta)
Dropbox splits files into ~4 MiB content blocks, hashes each, and uploads only blocks whose hashes are new (“Streaming File Synchronization”), with block-level dedup across a user’s files. We do the same with content-defined chunking (storage/02) — CDC is more robust to inserts than fixed blocks — and per-tenant dedup (storage/03).
5. Syncthing: the P2P contrast (and why we’re not P2P)
Syncthing synchronizes a cluster of peers with no central authority, via the Block Exchange Protocol (BEP):
- Each device advertises a local model (file metadata + block hashes, blocks 128 KiB–16 MiB). The global model = union of all peers’ files, selecting the highest “version” per file.
- Versioning uses version vectors (per-device counters) to establish causal order and detect concurrent edits — necessary because there is no single serialization point.
- Conflicts: simultaneous differing edits → the loser is renamed
name.sync-conflict-<date>-<time>-<deviceID>.ext; older mtime loses, tie broken by device ID. Modify-vs-delete: if delete would win, the file is still preserved as a conflict copy (data is never destroyed). - Scanning: periodic (hourly, or every minute without a watcher) — i.e. even Syncthing treats the watcher as an optimization over authoritative scanning.
What we take: the conflicted-copy naming convention, the edit-beats-delete safety, and “scan is authoritative.” What we reject: version vectors and a global-model merge — BitVault has a central authority (the server journal), which gives a total order for free, so we use the simpler, stronger three-tree + cursor model and avoid per-file vector clocks. (Version vectors would be mandatory if BitVault were peer-to-peer; it is not.)
6. rsync: the delta alternative we don’t need (but should understand)
rsync computes a delta when only the receiver knows its own bytes: the receiver splits its file into fixed blocks and sends a weak rolling checksum (Adler-32) + strong checksum (MD4) per block; the sender rolls the weak checksum over every byte offset, and on a weak match verifies the strong checksum, then sends references + literal data.
- When rsync’s approach wins: you can’t re-chunk identically on both ends, or one side is “dumb” storage.
- Why we don’t use it: BitVault controls both ends and stores content-addressed CDC chunks, so each side hashes independently and the delta is simply the set difference of chunk hashes — no rolling-checksum exchange round-trip, and the dedup index already answers “which chunks exist.” rsync is the right tool for a different constraint; CDC + CAS is strictly better here (08).
7. Lessons → BitVault decisions
| Lesson (source) | BitVault decision |
|---|---|
| State (3 trees) beats activity (Dropbox) | Three-tree planner (ADR-0022) |
| Node IDs, not paths (Dropbox) | Stable node IDs; O(1) renames (08) |
| Cursor + longpoll + 409 reset (Dropbox) | Cursor over journal + notify + reset (ADR-0024) |
| Block-level dedup delta (Dropbox) | CDC + per-tenant dedup (storage/03) |
| Conflicted copies, edit-beats-delete (Syncthing) | Same policy (ADR-0026) |
| Scan is authoritative (both) | Watcher-as-hint + rescan (ADR-0025) |
| Central authority avoids vector clocks (vs Syncthing) | Server journal = total order (ADR-0022) |
| Pure planner enables exhaustive testing (Dropbox) | Planner is a pure function; property-tested |
References
- Dropbox, “Rewriting the heart of our sync engine”: https://dropbox.tech/infrastructure/rewriting-the-heart-of-our-sync-engine
- Dropbox, “Testing our new sync engine”: https://dropbox.tech/infrastructure/-testing-our-new-sync-engine
- Dropbox, “Streaming File Synchronization” (block delta): https://dropbox.tech/infrastructure/streaming-file-synchronization
- Dropbox, Detecting Changes Guide: https://developers.dropbox.com/detecting-changes-guide
- Syncthing BEP & syncing semantics: https://docs.syncthing.net/specs/bep-v1.html · https://docs.syncthing.net/users/syncing.html
- rsync technical report: https://www.samba.org/rsync/tech_report/