03 — Local Database Design

Topics: local database design, metadata caching. Decision in ADR-0023. The local DB is the engine’s memory — it holds the three trees, the durable queue, the cursor, and the metadata caches that make scanning fast. It is a rebuildable cache, not a source of truth.

1. Engine choice: SQLite (embedded, transactional)

Decision: SQLite in WAL mode, one database file per sync root.

Embedded (no daemon), single-file, ACID transactions, ubiquitous, battle-tested for exactly this job (Dropbox’s client uses SQLite). Transactions are what make crash-consistent state transitions possible — the whole point.
WAL mode → concurrent readers (scanner, planner) with a single writer (control thread), good for our threading model (02).

2. Schema (one row per node carries all three trees)

Rather than three physical tables, we keep one node row per node_id holding the synced / remote / local snapshots side-by-side. The planner then reconciles with a single indexed scan of rows where the three snapshots disagree.

erDiagram
    NODE ||--o{ QUEUE_OP : "has pending"
    NODE ||--o{ CONFLICT : "may produce"
    CHUNK_CACHE }o--|| NODE : "reconstructs"
    NODE {
        text node_id PK "stable server id (null until first commit)"
        text parent_id
        text name
        text type "file|dir"
        bool synced_present
        text synced_hash
        int  synced_version "server version at last sync"
        bool remote_present
        text remote_hash
        int  remote_version
        bool local_present
        text local_hash "observed content hash"
        int  local_inode
        int  local_mtime
        int  local_size
        int  scan_gen "generation of last scan that saw it"
        text state "node sync state (see 06)"
    }
    QUEUE_OP {
        int  id PK
        text node_id FK
        text op_type "upload|download|rename|delete|mkdir|conflict"
        text state "ready|running|blocked|failed|done"
        int  priority
        int  attempts
        int  next_retry_at
        text depends_on "parent op id (ordering)"
        text last_error
    }
    CHUNK_CACHE {
        text chunk_hash PK
        text location "node_id+offset or local CAS path"
        int  size
        int  last_used
    }
    CONFLICT {
        int  id PK
        text node_id FK
        text kind "edit/edit|edit/delete|name|type|case"
        text resolution "conflicted-copy path"
        int  created_at
        bool acknowledged
    }
    META {
        text key PK "cursor|device_id|schema_version|bulk_delete_watermark|settings"
        text value
    }

Three snapshots in one row → planner = WHERE synced_hash IS DISTINCT FROM remote_hash OR synced_hash IS DISTINCT FROM local_hash (plus presence flags). Cheap and index-friendly.
QUEUE_OP is the durable sync queue (10) — survives restart, carries dependencies, retries, priority.
CHUNK_CACHE maps locally-available chunk hashes to where to read them, so delta downloads reuse chunks the device already has (08).
META.cursor is the journal position (07); bulk_delete_watermark backs the delete circuit-breaker (11, ADR-0027).

3. Metadata caching: the hashing fast-path (critical for scale)

Re-hashing every file on every scan is unaffordable at millions of files. The cache keys observed content by cheap stat metadata:

Fast-path rule: if a file’s (inode, mtime, size) matches the stored local_*, assume the content is unchanged and skip hashing. Only stat-changed files are hashed.

This turns a rescan from “hash everything” into “stat everything, hash the few that changed” — orders of magnitude cheaper. mtime is a hint, not truth (it can be forged/reset); a periodic deep scan (configurable, infrequent) re-hashes regardless to catch silent mismatches and bitrot. Remote metadata is likewise cached to avoid re-listing.

4. Crash consistency

Every state transition is a transaction. Advancing the Synced tree for a node, enqueuing ops, and updating the cursor each commit atomically; a crash leaves the DB at a consistent prior point and the control loop re-plans from there (02).
Apply-then-record ordering for downloads: the file is atomically renamed into place before the DB records it as synced; if we crash between, the next scan sees the file and reconciles (idempotent), never a DB-says-synced-but-file-missing lie.
WAL checkpointing bounds the WAL; periodic VACUUM/incremental-vacuum bounds file growth.

5. The DB is a rebuildable cache (ADR-0023)

If the local DB is lost or corrupted, no user data is lost: the engine rebuilds it by (a) full local scan → Local tree, (b) full server list → Remote tree, (c) setting Synced = ∅ and letting the planner reconcile. Reconciling from an empty Synced tree is safe but conservative — it treats every divergence as a potential conflict and errs toward conflicted copies rather than overwrites (never destructive). This is the durability backstop that lets us treat the DB as a performance cache.

6. Tradeoffs / Alternatives / Scaling

Tradeoffs. SQLite’s single-writer limits write concurrency, which suits our single control thread but means the control thread must keep transactions short (batch, don’t hold). One DB per sync root keeps each DB bounded and isolates corruption blast radius.

Alternatives considered.

Embedded KV (BoltDB/LevelDB/Badger): faster simple writes, but we want relational queries (the planner scan, dependency joins) and transactions across tables; SQLite fits better. Considered for the chunk cache only.
Flat files / JSON manifests: no transactions, no crash-consistency, no efficient queries — a non-starter at scale.
Reuse the server DB model: the client is offline-first and embedded; it needs a local store, not a network round-trip per lookup.

Scaling concerns.

Millions of nodes → the node table is indexed on (parent_id), (state), and the disagreement predicate; SQLite handles millions of rows fine with proper indexes.
DB size grows with file count, not file bytes (metadata only) → bounded; chunk cache is capped/LRU-evicted (CHUNK_CACHE.last_used).
Write amplification from rapid edits → batch tree updates per control-loop tick, not per event.
Multiple sync roots → independent DBs, independent engines, no cross-root locking.