03 — Local Database Design
Topics: local database design, metadata caching. Decision in ADR-0023. The local DB is the engine’s memory — it holds the three trees, the durable queue, the cursor, and the metadata caches that make scanning fast. It is a rebuildable cache, not a source of truth.
1. Engine choice: SQLite (embedded, transactional)
Decision: SQLite in WAL mode, one database file per sync root.
- Embedded (no daemon), single-file, ACID transactions, ubiquitous, battle-tested for exactly this job (Dropbox’s client uses SQLite). Transactions are what make crash-consistent state transitions possible — the whole point.
- WAL mode → concurrent readers (scanner, planner) with a single writer (control thread), good for our threading model (02).
2. Schema (one row per node carries all three trees)
Rather than three physical tables, we keep one node row per node_id holding the
synced / remote / local snapshots side-by-side. The planner then reconciles with a
single indexed scan of rows where the three snapshots disagree.
erDiagram
NODE ||--o{ QUEUE_OP : "has pending"
NODE ||--o{ CONFLICT : "may produce"
CHUNK_CACHE }o--|| NODE : "reconstructs"
NODE {
text node_id PK "stable server id (null until first commit)"
text parent_id
text name
text type "file|dir"
bool synced_present
text synced_hash
int synced_version "server version at last sync"
bool remote_present
text remote_hash
int remote_version
bool local_present
text local_hash "observed content hash"
int local_inode
int local_mtime
int local_size
int scan_gen "generation of last scan that saw it"
text state "node sync state (see 06)"
}
QUEUE_OP {
int id PK
text node_id FK
text op_type "upload|download|rename|delete|mkdir|conflict"
text state "ready|running|blocked|failed|done"
int priority
int attempts
int next_retry_at
text depends_on "parent op id (ordering)"
text last_error
}
CHUNK_CACHE {
text chunk_hash PK
text location "node_id+offset or local CAS path"
int size
int last_used
}
CONFLICT {
int id PK
text node_id FK
text kind "edit/edit|edit/delete|name|type|case"
text resolution "conflicted-copy path"
int created_at
bool acknowledged
}
META {
text key PK "cursor|device_id|schema_version|bulk_delete_watermark|settings"
text value
}
- Three snapshots in one row → planner =
WHERE synced_hash IS DISTINCT FROM remote_hash OR synced_hash IS DISTINCT FROM local_hash(plus presence flags). Cheap and index-friendly. QUEUE_OPis the durable sync queue (10) — survives restart, carries dependencies, retries, priority.CHUNK_CACHEmaps locally-available chunk hashes to where to read them, so delta downloads reuse chunks the device already has (08).META.cursoris the journal position (07);bulk_delete_watermarkbacks the delete circuit-breaker (11, ADR-0027).
3. Metadata caching: the hashing fast-path (critical for scale)
Re-hashing every file on every scan is unaffordable at millions of files. The cache keys observed content by cheap stat metadata:
Fast-path rule: if a file’s
(inode, mtime, size)matches the storedlocal_*, assume the content is unchanged and skip hashing. Only stat-changed files are hashed.
This turns a rescan from “hash everything” into “stat everything, hash the few that
changed” — orders of magnitude cheaper. mtime is a hint, not truth (it can be
forged/reset); a periodic deep scan (configurable, infrequent) re-hashes regardless
to catch silent mismatches and bitrot. Remote metadata is likewise cached to avoid
re-listing.
4. Crash consistency
- Every state transition is a transaction. Advancing the Synced tree for a node, enqueuing ops, and updating the cursor each commit atomically; a crash leaves the DB at a consistent prior point and the control loop re-plans from there (02).
- Apply-then-record ordering for downloads: the file is atomically renamed into place before the DB records it as synced; if we crash between, the next scan sees the file and reconciles (idempotent), never a DB-says-synced-but-file-missing lie.
- WAL checkpointing bounds the WAL; periodic
VACUUM/incremental-vacuum bounds file growth.
5. The DB is a rebuildable cache (ADR-0023)
If the local DB is lost or corrupted, no user data is lost: the engine rebuilds it by (a) full local scan → Local tree, (b) full server list → Remote tree, (c) setting Synced = ∅ and letting the planner reconcile. Reconciling from an empty Synced tree is safe but conservative — it treats every divergence as a potential conflict and errs toward conflicted copies rather than overwrites (never destructive). This is the durability backstop that lets us treat the DB as a performance cache.
6. Tradeoffs / Alternatives / Scaling
Tradeoffs. SQLite’s single-writer limits write concurrency, which suits our single control thread but means the control thread must keep transactions short (batch, don’t hold). One DB per sync root keeps each DB bounded and isolates corruption blast radius.
Alternatives considered.
- Embedded KV (BoltDB/LevelDB/Badger): faster simple writes, but we want relational queries (the planner scan, dependency joins) and transactions across tables; SQLite fits better. Considered for the chunk cache only.
- Flat files / JSON manifests: no transactions, no crash-consistency, no efficient queries — a non-starter at scale.
- Reuse the server DB model: the client is offline-first and embedded; it needs a local store, not a network round-trip per lookup.
Scaling concerns.
- Millions of nodes → the
nodetable is indexed on(parent_id),(state), and the disagreement predicate; SQLite handles millions of rows fine with proper indexes. - DB size grows with file count, not file bytes (metadata only) → bounded; chunk
cache is capped/LRU-evicted (
CHUNK_CACHE.last_used). - Write amplification from rapid edits → batch tree updates per control-loop tick, not per event.
- Multiple sync roots → independent DBs, independent engines, no cross-root locking.