03 — Local Database Design

Topics: local database design, metadata caching. Decision in ADR-0023. The local DB is the engine’s memory — it holds the three trees, the durable queue, the cursor, and the metadata caches that make scanning fast. It is a rebuildable cache, not a source of truth.


1. Engine choice: SQLite (embedded, transactional)

Decision: SQLite in WAL mode, one database file per sync root.


2. Schema (one row per node carries all three trees)

Rather than three physical tables, we keep one node row per node_id holding the synced / remote / local snapshots side-by-side. The planner then reconciles with a single indexed scan of rows where the three snapshots disagree.

erDiagram
    NODE ||--o{ QUEUE_OP : "has pending"
    NODE ||--o{ CONFLICT : "may produce"
    CHUNK_CACHE }o--|| NODE : "reconstructs"
    NODE {
        text node_id PK "stable server id (null until first commit)"
        text parent_id
        text name
        text type "file|dir"
        bool synced_present
        text synced_hash
        int  synced_version "server version at last sync"
        bool remote_present
        text remote_hash
        int  remote_version
        bool local_present
        text local_hash "observed content hash"
        int  local_inode
        int  local_mtime
        int  local_size
        int  scan_gen "generation of last scan that saw it"
        text state "node sync state (see 06)"
    }
    QUEUE_OP {
        int  id PK
        text node_id FK
        text op_type "upload|download|rename|delete|mkdir|conflict"
        text state "ready|running|blocked|failed|done"
        int  priority
        int  attempts
        int  next_retry_at
        text depends_on "parent op id (ordering)"
        text last_error
    }
    CHUNK_CACHE {
        text chunk_hash PK
        text location "node_id+offset or local CAS path"
        int  size
        int  last_used
    }
    CONFLICT {
        int  id PK
        text node_id FK
        text kind "edit/edit|edit/delete|name|type|case"
        text resolution "conflicted-copy path"
        int  created_at
        bool acknowledged
    }
    META {
        text key PK "cursor|device_id|schema_version|bulk_delete_watermark|settings"
        text value
    }

3. Metadata caching: the hashing fast-path (critical for scale)

Re-hashing every file on every scan is unaffordable at millions of files. The cache keys observed content by cheap stat metadata:

Fast-path rule: if a file’s (inode, mtime, size) matches the stored local_*, assume the content is unchanged and skip hashing. Only stat-changed files are hashed.

This turns a rescan from “hash everything” into “stat everything, hash the few that changed” — orders of magnitude cheaper. mtime is a hint, not truth (it can be forged/reset); a periodic deep scan (configurable, infrequent) re-hashes regardless to catch silent mismatches and bitrot. Remote metadata is likewise cached to avoid re-listing.


4. Crash consistency


5. The DB is a rebuildable cache (ADR-0023)

If the local DB is lost or corrupted, no user data is lost: the engine rebuilds it by (a) full local scan → Local tree, (b) full server list → Remote tree, (c) setting Synced = ∅ and letting the planner reconcile. Reconciling from an empty Synced tree is safe but conservative — it treats every divergence as a potential conflict and errs toward conflicted copies rather than overwrites (never destructive). This is the durability backstop that lets us treat the DB as a performance cache.


6. Tradeoffs / Alternatives / Scaling

Tradeoffs. SQLite’s single-writer limits write concurrency, which suits our single control thread but means the control thread must keep transactions short (batch, don’t hold). One DB per sync root keeps each DB bounded and isolates corruption blast radius.

Alternatives considered.

Scaling concerns.