04 — File Watching & Scanning

Topic: local file watching. Decision in ADR-0025. The hard truth up front: every OS file watcher is lossy. Correctness cannot depend on the watcher; it depends on authoritative scanning + content hashing. The watcher only makes sync fast.


1. The backends and their (severe) limitations

| OS | API | Documented failure mode |
| --- | --- | --- |
| Linux | inotify | queue default 16,384 events → on overflow emits IN_Q_OVERFLOW and drops events; recovery = close fd, rescan. One watch per directory (memory cost at scale). Cannot identify the triggering process. No NFS/SMB. |
| macOS | FSEvents | coalesces events in a few-second window (ORs the flag bits together); can set MUSTSCANSUBDIRS when dropping → “rescan this subtree”; ~4096 watched paths / ~450 streams limit; historically directory-granular. |
| Windows | ReadDirectoryChangesW | fixed per-handle buffer; on overflow returns 0 bytes and all pending events are lost; a race window exists between processing batches. |
| BSD | kqueue | one fd per file (worse at scale) |
| any | polling | last-resort fallback for network mounts where no native events exist |

The universal lesson (the inotify, FSEvents, and ReadDirectoryChangesW documentation all say it): a watcher can and will silently miss events under load, on network filesystems, and across races. Therefore the watcher is an optimization, and the scanner is the source of truth (ADR-0025). Even Syncthing, with a watcher attached, still runs a full rescan hourly by default.
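
The rule "overflow means rescan" can be sketched as a small queue that treats ordinary watcher events as hints and escalates any overflow signal to a scan request. This is a minimal illustration, not the design from ADR-0025: `ObservationQueue`, `WatchEvent`, and `Kind` are hypothetical names, and the path-prefix check assumes POSIX-style `/` separators.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Kind(Enum):
    CHANGED = auto()   # ordinary per-path hint from the watcher
    OVERFLOW = auto()  # backend reported dropped events (e.g. IN_Q_OVERFLOW)


@dataclass
class WatchEvent:
    kind: Kind
    path: str = ""     # for OVERFLOW: the affected subtree, or "" if unknown


@dataclass
class ObservationQueue:
    """Collects watcher hints; any overflow escalates to a scan request."""
    root: str
    hint_paths: set = field(default_factory=set)
    rescan_roots: set = field(default_factory=set)

    def on_event(self, ev: WatchEvent) -> None:
        if ev.kind is Kind.OVERFLOW:
            # Events were lost, so fall back to the scanner. If the backend
            # did not name a subtree, rescan the whole sync root.
            self.rescan_roots.add(ev.path or self.root)
        else:
            self.hint_paths.add(ev.path)

    def drain(self):
        """Return (rescan_roots, hint_paths) and reset the queue.

        Hints under a rescan root are dropped because the scan will
        visit those paths anyway.
        """
        roots = self.rescan_roots
        hints = {
            p for p in self.hint_paths
            if not any(p == r or p.startswith(r.rstrip("/") + "/") for r in roots)
        }
        self.rescan_roots, self.hint_paths = set(), set()
        return roots, hints
```

The point of the shape is that an overflow never produces an error path: it only widens the set of paths the scanner will look at next.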


2. The observation pipeline

flowchart TB
    classDef a fill:#c7d2fe,stroke:#3730a3,color:#111827;
    classDef d fill:#fde68a,stroke:#b45309,color:#111827;
    classDef o fill:#bbf7d0,stroke:#15803d,color:#111827;
    w["Watcher events (hints)"]:::a --> deb["Debounce + coalesce<br/>(settle period; wait for file to stop changing)"]:::a
    sc["Scanner (startup / scheduled / overflow-triggered)"]:::a --> cand
    deb --> ig{"ignore filter?<br/>(.tmp, lockfiles, node_modules, .git, OUR writes)"}:::d
    ig -- ignore --> drop["drop"]:::d
    ig -- keep --> cand["candidate paths"]:::a
    cand --> st["stat each"]:::a
    st --> fp{"(inode,mtime,size) == cached?"}:::d
    fp -- yes --> skip["skip (assume unchanged)"]:::o
    fp -- no --> h["hash (BLAKE3 + CDC)"]:::a
    h --> mv{"hash matches a deleted node's? (move)"}:::d
    mv -- yes --> rn["emit RENAME (don't re-upload)"]:::o
    mv -- no --> upd["update Local tree → wake planner"]:::o
    ov["IN_Q_OVERFLOW / MUSTSCANSUBDIRS / 0-byte buffer"]:::d --> sc
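
The stat-then-hash stages of the flowchart can be sketched as follows. This is an illustration under stated assumptions: the function names and the in-memory `cache` dict are hypothetical, and `hashlib.blake2b` from the standard library stands in for the BLAKE3 + CDC hashing named in the diagram, since BLAKE3 is not in the Python standard library.

```python
import hashlib
import os
from typing import Dict, NamedTuple, Optional, Tuple


class Fingerprint(NamedTuple):
    inode: int
    mtime_ns: int
    size: int


def fingerprint(path: str) -> Fingerprint:
    """Cheap identity check: one stat() call, no file reads."""
    st = os.stat(path)
    return Fingerprint(st.st_ino, st.st_mtime_ns, st.st_size)


def content_hash(path: str) -> str:
    """Stream the file through a hash (BLAKE2b here as a stand-in for BLAKE3)."""
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def observe(path: str, cache: Dict[str, Tuple[Fingerprint, str]]) -> Optional[str]:
    """Return the content hash if the path needs attention, else None.

    None means (inode, mtime, size) matched the cache, so the file is
    assumed unchanged and is never opened.
    """
    fp = fingerprint(path)
    cached = cache.get(path)
    if cached is not None and cached[0] == fp:
        return None
    digest = content_hash(path)
    cache[path] = (fp, digest)
    return digest
```

The same `observe` call serves both inputs to the pipeline: a watcher hint and a scanner-discovered path go through an identical check, which is what lets the scanner stay authoritative.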

Stages that matter:


3. Scanning strategy


4. Self-write suppression (avoiding the feedback loop)

When the downloader writes a file, the very next watcher event is our own write — if we treated it as a user change we’d re-upload it forever. Because inotify (and others) cannot tell us which process caused an event, we suppress explicitly:

Before applying a download, record (path, expected_inode, expected_hash) in a short-lived “recently applied” set. Watcher events whose post-state matches the expected hash are absorbed (they confirm our own write) rather than treated as user edits. A mismatch means the user changed it again → handled as a real local change.

This, plus the content-hash comparison against the Synced tree, makes the loop self-correcting: an event that doesn’t change content vs Synced produces no operation.
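
A minimal sketch of the "recently applied" set, assuming a time-to-live per entry. The class name, the default TTL, and the injectable clock are hypothetical choices made for illustration; the inode is recorded as the text describes, but this sketch matches on the content hash only.

```python
import time
from typing import Callable, Dict, Tuple


class RecentlyApplied:
    """Short-lived record of writes the downloader is about to make."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (expected_inode, expected_hash, expiry time)
        self._entries: Dict[str, Tuple[int, str, float]] = {}

    def expect(self, path: str, inode: int, content_hash: str) -> None:
        """Call before applying a download to `path`."""
        self._entries[path] = (inode, content_hash, self._clock() + self._ttl)

    def absorbs(self, path: str, observed_hash: str) -> bool:
        """True if a watcher event on `path` is just our own write."""
        entry = self._entries.get(path)
        if entry is None:
            return False
        _inode, expected_hash, expires_at = entry
        if self._clock() > expires_at:
            del self._entries[path]
            return False
        if observed_hash == expected_hash:
            # Keep the entry until it expires: one write can raise
            # several watcher events.
            return True
        # Post-state differs: the user changed the file again, so this
        # and later events are real local changes.
        del self._entries[path]
        return False
```

Expiring entries keeps the set small and bounds how long a stale expectation could mask a genuine edit that happens to restore the same content.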


5. Move / rename detection

A move often appears as delete(old) + create(new). Re-uploading a 1 GiB moved file would be wasteful. Detection:

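Combined with stable node IDs (README §2), a detected move becomes a metadata-only rename: no content is re-uploaded, so the transfer cost of a move stays O(1) regardless of file size.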
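
The hash-matching step from the pipeline ("hash matches a deleted node's?") can be sketched like this. It is an illustration only: `Node`, `Op`, and `plan_creates` are hypothetical names, and matching purely on content hash is an assumption, since a real implementation might also consult size or inode.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: str        # stable ID that survives the move
    path: str
    content_hash: str


@dataclass
class Op:
    kind: str                      # "RENAME" or "UPLOAD"
    path: str                      # the new path
    node_id: Optional[str] = None  # set for RENAME
    old_path: Optional[str] = None # set for RENAME


def plan_creates(deleted: List[Node], created: Dict[str, str]) -> List[Op]:
    """Pair newly created paths with recently deleted nodes by content hash.

    `created` maps new path -> content hash. A match reuses the deleted
    node's ID and emits a RENAME; anything else is a genuine UPLOAD.
    """
    by_hash: Dict[str, List[Node]] = {}
    for node in deleted:
        by_hash.setdefault(node.content_hash, []).append(node)

    ops: List[Op] = []
    for path, digest in sorted(created.items()):
        candidates = by_hash.get(digest)
        if candidates:
            node = candidates.pop(0)  # each deleted node is consumed once
            ops.append(Op("RENAME", path, node.node_id, node.path))
        else:
            ops.append(Op("UPLOAD", path))
    return ops
```

Deleted nodes left unmatched after this pass are real deletions. Consuming each candidate once prevents two identical new files from both claiming the same deleted node.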


6. Tradeoffs / Alternatives / Scaling

Tradeoffs. The watcher reduces latency from “next scan” to “near-instant” but adds the complexity of debouncing, overflow handling, and self-write suppression. We accept it because latency matters for UX; correctness is still owned by the scanner so a buggy watcher can never corrupt state — only slow it down.

Alternatives considered.

Scaling concerns.

References