07 — Sync Protocol

Topics: sync protocols, multi-device synchronization. Deliverables: protocol design, event flows, sequence diagrams. Decision in ADR-0024. Builds on the change journal (ADR-0008) and the storage commit protocol (storage/05).


1. Two channels: notify (cheap) + cursor pull (authoritative)

The protocol deliberately separates a lossy, low-latency signal from a reliable, ordered data pull — the Dropbox pattern.

| Channel | Purpose | Reliability | Transport |
|---|---|---|---|
| Notification | “namespace advanced past your cursor, come pull” | may be lossy (it only triggers a pull) | gRPC server-stream / WebSocket; long-poll fallback with jitter |
| Cursor pull | the actual ordered delta of changes | authoritative, exactly resumable | gRPC unary/stream (REST for third parties) |

Because the notification carries no content and the pull is authoritative, a dropped or duplicated notification is harmless: a dropped one only adds latency until the periodic poll catches up, and a duplicated one triggers a pull that finds nothing new past the cursor. This is what makes the realtime tier cheap and scalable.
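
A minimal sketch of why lossy notifications are safe, assuming an in-memory journal in place of the real Sync service. The names `Journal`, `Device`, `get_changes`, and `on_notify` are hypothetical and only illustrate the idea: the signal carries nothing, and the cursor pull decides what is applied.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Hypothetical in-memory stand-in for the server-side change journal."""
    entries: list = field(default_factory=list)  # entry at index i has seq i + 1

    def append(self, change: str) -> int:
        self.entries.append(change)
        return len(self.entries)  # seq assigned to the new entry

    def get_changes(self, cursor: int):
        """Return every change with seq > cursor, in order, plus the new cursor."""
        return self.entries[cursor:], len(self.entries)


@dataclass
class Device:
    journal: Journal
    cursor: int = 0
    applied: list = field(default_factory=list)

    def on_notify(self) -> None:
        """A notification carries no content; it only triggers a pull."""
        delta, new_cursor = self.journal.get_changes(self.cursor)
        self.applied.extend(delta)
        self.cursor = new_cursor  # advanced only after the delta is applied


journal = Journal()
device = Device(journal)

journal.append("create a.txt")
device.on_notify()
device.on_notify()              # duplicated notification: empty delta, no effect

journal.append("update a.txt")  # assume the notification for this one is dropped
journal.append("delete a.txt")
device.on_notify()              # the next signal (or a periodic poll) catches up

print(device.applied, device.cursor)
```

Each change is applied once and in journal order, regardless of how many signals arrived or went missing.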


2. The cursor


3. Pull flow (notification + cursor delta)

sequenceDiagram
    autonumber
    participant D as Device engine
    participant GW as Gateway / Sync API
    participant SY as Sync service (journal+cursor)
    participant N as Notifier (NATS fan-out)
    D->>GW: Longpoll/Subscribe(cursor=C)
    GW->>N: register device stream for namespace
    Note over D,N: blocks until change (+ random jitter vs thundering herd)
    SY-->>N: journal advanced (from outbox/NATS, ADR-0006)
    N-->>D: signal "changes available"
    D->>GW: GetChanges(cursor=C)
    GW->>SY: read journal (seq > C), ordered
    SY-->>GW: [create/update/move/delete ...] + cursor=C'
    GW-->>D: delta + C'
    D->>D: apply to Remote tree → plan ([05]) → download ([08])
    D->>D: persist cursor C'

Delta entries are node-keyed (node_id, op, new version, content_hash, parent, name), so renames/moves arrive as one move entry, not a delete+create storm.
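
As an illustration of node-keyed entries, the sketch below applies a delta to a local copy of the remote tree. The field names follow the list above; the `DeltaEntry` and `Node` classes, the `apply_delta` and `path_of` helpers, and the sample hash string are assumptions made for the example, not the engine's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaEntry:
    node_id: str
    op: str                       # "create" | "update" | "move" | "delete"
    version: int
    content_hash: Optional[str]   # None for folders in this sketch
    parent: Optional[str]         # node_id of the parent, None at the root
    name: str


@dataclass
class Node:
    version: int
    content_hash: Optional[str]
    parent: Optional[str]
    name: str


def apply_delta(tree: dict, entries: list) -> None:
    """Apply ordered entries to a node_id -> Node map."""
    for e in entries:
        if e.op == "delete":
            tree.pop(e.node_id, None)
        else:
            # create, update and move all overwrite the node's metadata
            tree[e.node_id] = Node(e.version, e.content_hash, e.parent, e.name)


def path_of(tree: dict, node_id: str) -> str:
    """Derive a path by walking parent links; paths are never stored."""
    parts = []
    current = node_id
    while current is not None:
        node = tree[current]
        parts.append(node.name)
        current = node.parent
    return "/".join(reversed(parts))


tree = {}
apply_delta(tree, [
    DeltaEntry("n1", "create", 1, None, None, "docs"),
    DeltaEntry("n2", "create", 1, None, None, "archive"),
    DeltaEntry("n3", "create", 1, "hash-abc", "n1", "report.txt"),
])
before = path_of(tree, "n3")

# A rename plus move is one entry on the same node_id
apply_delta(tree, [DeltaEntry("n3", "move", 2, "hash-abc", "n2", "report-2023.txt")])
after = path_of(tree, "n3")
print(before, "->", after)
```

Because the identity is the `node_id` and the `content_hash` is unchanged, the move needs no re-download and no delete/create pair.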


4. Push flow (commit with optimistic concurrency)

Upload reuses the storage commit protocol; the sync-specific part is the base version for conflict detection.

sequenceDiagram
    autonumber
    participant D as Device engine
    participant GW as Gateway
    participant ST as Storage (chunks/manifest)
    participant FM as File & Metadata (journal)
    D->>D: CDC chunk + BLAKE3 ([08])
    D->>GW: NegotiateChunks([hashes]) (per-tenant dedup, ADR-0018)
    GW->>ST: which chunks missing?
    ST-->>GW: missing subset
    GW-->>D: upload only missing (presigned PUT)
    D->>ST: PUT new chunks (direct, ADR-0011)
    D->>GW: Commit(node_id, base_version=V, manifest)
    GW->>FM: compare-and-set on base_version
    alt base_version current
        FM-->>GW: committed → journal seq++
        GW-->>D: ok (new version V+1) → advance Synced
        FM-->>FM: emit NodeChanged → notifies OTHER devices (§5)
    else base_version stale
        FM-->>GW: conflict (remote moved since V)
        GW-->>D: 409 conflict → resolve ([09])
    end
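
The compare-and-set step in the diagram can be sketched as follows. This is an in-memory approximation under stated assumptions: `MetadataStore`, `Conflict`, and the lock-based serialization are hypothetical stand-ins for whatever transactional mechanism File & Metadata actually uses.

```python
import threading
from dataclasses import dataclass, field


class Conflict(Exception):
    """Raised when base_version is stale; corresponds to the 409 in the diagram."""


@dataclass
class MetadataStore:
    versions: dict = field(default_factory=dict)    # node_id -> current version
    manifests: dict = field(default_factory=dict)   # node_id -> chunk hash list
    journal: list = field(default_factory=list)     # (seq, node_id, version)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def commit(self, node_id: str, base_version: int, manifest: list) -> int:
        # The lock makes check-and-update atomic, so commits are serialized.
        with self._lock:
            current = self.versions.get(node_id, 0)
            if base_version != current:
                raise Conflict(f"{node_id}: base {base_version}, current {current}")
            new_version = current + 1
            self.versions[node_id] = new_version
            self.manifests[node_id] = manifest
            seq = len(self.journal) + 1             # journal seq++
            self.journal.append((seq, node_id, new_version))
            return new_version


store = MetadataStore()
v1 = store.commit("n1", base_version=0, manifest=["h1"])
v2 = store.commit("n1", base_version=1, manifest=["h1", "h2"])

try:
    store.commit("n1", base_version=1, manifest=["h3"])  # stale base
    conflicted = False
except Conflict:
    conflicted = True

print(v1, v2, conflicted)
```

A rejected commit leaves the version, manifest, and journal untouched, so the device can resolve the conflict and retry against the current version.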

5. Multi-device propagation (the server is the serialization point)

sequenceDiagram
    autonumber
    participant A as Device A
    participant S as Server (journal = total order)
    participant B as Device B
    participant C as Device C
    A->>S: Commit edit (base V) → seq=101, version V+1
    S-->>A: ok
    S-->>B: notify (namespace advanced)
    S-->>C: notify
    B->>S: GetChanges(cursor=100) → [node@101] + cursor=101
    C->>S: GetChanges(cursor=100) → [node@101] + cursor=101
    B->>B: download delta → converge
    C->>C: download delta → converge

Every device converges to the same server state because the journal imposes a single total order. Concurrent commits from two devices are serialized at the compare-and-set (§4): the first wins the version; the second gets a 409 and resolves to a conflicted copy ([09]), which then propagates to all devices as a new node. No vector clocks are needed (contrast Syncthing, [01] §5).
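
A small simulation of this convergence argument, under simplifying assumptions: `Server` and `Replica` are hypothetical in-memory stand-ins, and the conflicted copy is modeled as a commit of a new node with the made-up id `n1-conflict` (the real resolution policy is defined elsewhere).

```python
class Server:
    def __init__(self):
        self.versions = {}   # node_id -> current version
        self.journal = []    # (node_id, version, content); seq = index + 1

    def commit(self, node_id, base_version, content):
        """Compare-and-set; returns the new version, or None for a 409."""
        if self.versions.get(node_id, 0) != base_version:
            return None
        version = base_version + 1
        self.versions[node_id] = version
        self.journal.append((node_id, version, content))
        return version

    def get_changes(self, cursor):
        return self.journal[cursor:], len(self.journal)


class Replica:
    def __init__(self):
        self.cursor = 0
        self.state = {}      # node_id -> (version, content)

    def pull(self, server):
        delta, self.cursor = server.get_changes(self.cursor)
        for node_id, version, content in delta:
            self.state[node_id] = (version, content)


server = Server()
a, b, c = Replica(), Replica(), Replica()

server.commit("n1", 0, "original")
for replica in (a, b, c):
    replica.pull(server)

# A and B both edit from base version 1; the server serializes them.
winner = server.commit("n1", 1, "A's edit")   # accepted as version 2
loser = server.commit("n1", 1, "B's edit")    # rejected: base is stale
if loser is None:
    # B keeps its bytes by committing them as a new node.
    server.commit("n1-conflict", 0, "B's edit")

for replica in (a, b, c):
    replica.pull(server)

print(a.state == b.state == c.state, a.state)
```

All three replicas replay the same ordered journal from their own cursor, so they end in the same state without comparing clocks with each other.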


6. Event flow (server side)

flowchart LR
    classDef s fill:#fde68a,stroke:#b45309,color:#111827;
    classDef b fill:#fbcfe8,stroke:#be185d,color:#111827;
    classDef d fill:#c7d2fe,stroke:#3730a3,color:#111827;
    FM["File & Metadata commit<br/>(outbox, ADR-0006)"]:::s --> J["Change journal (seq++)"]:::s
    J --> BUS{{"NATS JetStream<br/>subject: namespace.{tenant}"}}:::b
    BUS --> NT["Notifier tier<br/>(device stream registry)"]:::s
    NT --> D1["Device A stream"]:::d
    NT --> D2["Device B stream"]:::d
    NT --> D3["Device C longpoll"]:::d

The journal is the source of truth; NATS fans out the signal; the notifier maps namespace → connected devices and pushes “go pull.” Devices then pull authoritatively.
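
The notifier's role can be sketched as a registry from namespace to connected device streams. This is an assumption-laden illustration: the `Notifier` class, its method names, and the use of plain callbacks in place of gRPC/WebSocket streams are all hypothetical, and the NATS subscription itself is left out.

```python
class Notifier:
    def __init__(self):
        self._streams = {}   # namespace -> {device_id: callback}

    def register(self, namespace, device_id, on_signal):
        self._streams.setdefault(namespace, {})[device_id] = on_signal

    def unregister(self, namespace, device_id):
        self._streams.get(namespace, {}).pop(device_id, None)

    def journal_advanced(self, namespace):
        """Handle one bus message; returns how many devices were signalled."""
        delivered = 0
        # Copy the items so broken streams can be dropped while iterating.
        for device_id, on_signal in list(self._streams.get(namespace, {}).items()):
            try:
                on_signal()          # content-free "go pull"
                delivered += 1
            except ConnectionError:
                # A lost signal is acceptable; the device pulls on reconnect.
                self.unregister(namespace, device_id)
        return delivered


def broken_stream():
    raise ConnectionError("device went away")


signals = []
notifier = Notifier()
notifier.register("tenant-1", "A", lambda: signals.append("A"))
notifier.register("tenant-1", "B", lambda: signals.append("B"))
notifier.register("tenant-1", "C", broken_stream)

first = notifier.journal_advanced("tenant-1")
second = notifier.journal_advanced("tenant-1")   # C was dropped after failing
other = notifier.journal_advanced("tenant-2")    # no devices registered

print(first, second, other, signals)
```

The notifier holds no change data and no cursors, so losing its state costs only latency: devices re-register and pull from the journal.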


7. Selective sync & online-only files


8. Tradeoffs / Alternatives / Scaling

Tradeoffs. Splitting notify (lossy) from pull (authoritative) adds a channel but makes the realtime tier cheap and failure-tolerant; folding them into one reliable push of content would not scale and would couple correctness to delivery.

Alternatives considered.

Scaling concerns.