06 — High-Level Architecture

Covers tasks 8 & 11 (diagrams). This document presents the system in layers, the canonical request flows, and the deployment topology. Diagrams are C4-style flowcharts (portable across Mermaid renderers), with sequence diagrams for the flows.

Reminder: in v1 the “services” below are modules in one bitvaultd binary (ADR-0001). The diagrams show the logical architecture, which is identical whether deployed as one process or many — that is the point of the seam design.

Architecture Freeze V1 (2026-06-12): diagrams updated for the freeze — the change journal is written in the commit transaction by File & Metadata (source of truth; Sync reads it, not a bus projection), the event bus is in-process in V1 (NATS at P3), and search is Postgres-FTS in V1. See data model §6.


1. System Context (C4 Level 1)

Who/what uses BitVault and what BitVault depends on.

flowchart TB
    classDef person fill:#dbeafe,stroke:#1e40af,color:#111827;
    classDef system fill:#fde68a,stroke:#b45309,color:#111827;
    classDef ext fill:#e5e7eb,stroke:#6b7280,color:#111827;

    user["End User<br/>(web / mobile)"]:::person
    dev["Developer / Operator<br/>(CLI, automation)"]:::person
    admin["Tenant Admin"]:::person

    bv["BitVault Platform<br/>file storage · sync · sharing · search"]:::system

    obj["Object Storage<br/>MinIO / S3 / R2 / GCS / Azure"]:::ext
    idp["External IdP<br/>(OIDC / SSO)"]:::ext
    smtp["Email / SMTP"]:::ext
    hook["Tenant Webhook Endpoints"]:::ext

    user -->|"REST / HTTPS"| bv
    dev -->|"REST + gRPC (CLI)"| bv
    admin -->|"REST / HTTPS"| bv
    bv -->|"presigned PUT/GET"| obj
    bv -->|"authn"| idp
    bv -->|"notifications"| smtp
    bv -->|"signed events"| hook
    user <-. "direct byte transfer (presigned)" .-> obj

Note the dashed line: users transfer bytes directly to/from object storage; BitVault issues the URL but the payload never traverses its compute (R5/ADR-0011).


2. Container View (C4 Level 2)

The internal building blocks and the stores they own. (Logical; physically one binary in v1.)

flowchart TB
    classDef edge fill:#c7d2fe,stroke:#3730a3,color:#111827;
    classDef svc fill:#fde68a,stroke:#b45309,color:#111827;
    classDef worker fill:#fed7aa,stroke:#c2410c,color:#111827;
    classDef store fill:#bbf7d0,stroke:#15803d,color:#111827;
    classDef bus fill:#fbcfe8,stroke:#be185d,color:#111827;

    subgraph clients[Clients]
        web["Next.js Web"]:::edge
        cli["Go CLI"]:::edge
        mob["React Native<br/>(future)"]:::edge
    end

    gw["API Gateway / BFF<br/>REST edge · authn · rate-limit · REST↔gRPC"]:::edge

    subgraph control[Control plane services - gRPC]
        id["Identity"]:::svc
        fm["File & Metadata<br/>(source of truth + outbox)"]:::svc
        st["Storage<br/>(presign · multipart · commit)"]:::svc
        sh["Sharing"]:::svc
        sy["Sync"]:::svc
        srch["Search (query)"]:::svc
    end

    subgraph workers[Async workers]
        idx["Search Indexer"]:::worker
        ntf["Notifier"]:::worker
        gc["Finalizer / GC"]:::worker
        mtr["Meter"]:::worker
        aud["Audit Sink"]:::worker
    end

    bus{{"NATS JetStream<br/>(in-proc bus in v1)"}}:::bus

    pg[("PostgreSQL<br/>metadata · outbox")]:::store
    rd[("Redis<br/>cache · sessions · locks · rate-limit")]:::store
    os[("Search index<br/>(PG-FTS in v1; OpenSearch at P3)")]:::store
    obj[("Object Storage<br/>blobs")]:::store

    web & cli & mob --> gw
    gw --> id & fm & st & sh & sy & srch

    id --> pg
    fm --> pg
    sh --> pg
    sy --> pg
    st --> obj
    id & gw --> rd
    srch --> os

    fm -->|outbox→publish| bus
    st -->|outbox→publish| bus
    sh -->|outbox→publish| bus
    bus --> idx & ntf & gc & mtr & aud
    idx --> os
    gc --> obj
    mtr --> pg
    aud --> pg

    web -. presigned .-> obj
    cli -. presigned .-> obj

3. The layers (textual)

  1. Clients — Next.js web, Go CLI, future RN mobile, and third parties. All speak REST to the gateway and transfer bytes directly to object storage.
  2. Edge / Gateway — TLS, authn, per-tenant rate limiting, REST↔gRPC translation, BFF aggregation. Stateless; the primary horizontal-scale unit.
  3. Control-plane services (gRPC) — Identity, File & Metadata, Storage, Sharing, Sync, Search-query. Strong consistency; own their Postgres tables.
  4. Event backbone — transactional outbox → NATS JetStream (in-proc bus in v1). The Published Language between core and consumers.
  5. Async workers — indexer, notifier, GC/finalizer, meter, audit, previews. Eventually consistent; independently scalable; failures isolated.
  6. Data stores — Postgres (truth + outbox), Redis (cache/sessions/locks/rate-limit), OpenSearch (optional derived index), object storage (blobs).
  7. Cross-cutting — OpenTelemetry (trace/metric/log), config (12-factor), secrets/KMS, health/readiness.
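
As an illustration of the per-tenant rate limiting named in layer 2, the sketch below uses a token bucket. The algorithm choice, the class name `TenantRateLimiter`, and its parameters are assumptions made for this example; the document only states that rate-limit state lives in Redis. The in-memory dictionary stands in for that shared store, so a stateless gateway replica would not keep this state locally.

```python
import time


class TenantRateLimiter:
    """Hypothetical per-tenant token bucket (in-memory stand-in for Redis)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second
        self.burst = burst         # maximum tokens a bucket can hold
        self.clock = clock         # injectable clock, useful for tests
        self.buckets = {}          # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id):
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(tenant_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[tenant_id] = (tokens, now)
        return allowed
```

Each tenant draws from its own bucket, so one tenant exhausting its burst does not affect another tenant's requests.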

4. Canonical flow: Upload (the commit protocol)

This flow is the heart of correctness. It defeats the dual-write problem (R2): the namespace row and its event are written in one transaction, and bytes are verified before commit. An object with no committed metadata simply does not exist and is GC’d.

sequenceDiagram
    autonumber
    participant C as Client
    participant GW as API Gateway
    participant F as File & Metadata
    participant S as Storage
    participant O as Object Store
    participant B as Event Bus
    participant IX as Indexer

    C->>GW: POST /v1/files (init upload: path, size, hash)
    GW->>F: CreateUpload(node draft) [gRPC]
    F->>S: PresignPut(staging key, size-range, ttl)
    S-->>F: presigned URL (+ uploadId if multipart)
    F-->>GW: uploadId + presigned URL(s)
    GW-->>C: 201 {uploadId, url}

    C->>O: PUT bytes (direct, presigned)
    O-->>C: 200 (ETag)

    C->>GW: POST /v1/files/{uploadId}/commit (etag, hash)
    GW->>F: CommitUpload(uploadId, hash)
    F->>S: HeadObject(staging key) — verify size/etag/hash
    S->>O: HEAD staging key
    O-->>S: metadata (size, etag)
    S-->>F: verified ✔ (+ blob refcount++)
    Note over F: BEGIN TX<br/>insert node version<br/>append CHANGE(seq)<br/>insert outbox(NodeChanged)<br/>COMMIT
    F-->>GW: 200 (node, version)
    GW-->>C: 200 committed

    F->>B: publish NodeChanged (from outbox)
    B->>IX: NodeChanged
    IX->>IX: index name+metadata (eventual)
    Note over C,IX: UI may show "indexing…" until IX catches up
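
The commit step in the diagram above can be sketched as a single database transaction. This is a minimal illustration using an in-memory SQLite database, not the BitVault schema: the table and column names, the `verified` flag (standing in for the HEAD check on the staging key), and the single global `seq` counter (the document describes a per-tenant journal) are all assumptions made for the example.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE node_version (node_id TEXT, version INTEGER, blob_hash TEXT,
                           PRIMARY KEY (node_id, version));
CREATE TABLE change (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                     tenant_id TEXT, node_id TEXT, kind TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     event_type TEXT, node_id TEXT, published INTEGER DEFAULT 0);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def commit_upload(db, tenant_id, node_id, version, blob_hash, verified):
    """Write the node version, journal entry and outbox row atomically."""
    if not verified:
        raise ValueError("staged object failed size/etag/hash verification")
    with db:  # one transaction: commits on success, rolls back on any exception
        db.execute("INSERT INTO node_version VALUES (?, ?, ?)",
                   (node_id, version, blob_hash))
        cur = db.execute(
            "INSERT INTO change (tenant_id, node_id, kind) VALUES (?, ?, 'updated')",
            (tenant_id, node_id))
        db.execute("INSERT INTO outbox (event_type, node_id) VALUES ('NodeChanged', ?)",
                   (node_id,))
    return cur.lastrowid  # the journal sequence number of this change


def pull_deltas(db, tenant_id, cursor):
    """Read journal entries with seq greater than the caller's cursor."""
    rows = db.execute(
        "SELECT seq, node_id, kind FROM change "
        "WHERE tenant_id = ? AND seq > ? ORDER BY seq",
        (tenant_id, cursor)).fetchall()
    new_cursor = rows[-1][0] if rows else cursor
    return rows, new_cursor
```

Because all three rows are written in the same transaction, a failed commit leaves no journal entry and no outbox event behind, which is the property the dual-write argument relies on. The `pull_deltas` helper shows how a reader such as Sync could consume the same journal by cursor.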

Failure handling:


5. Canonical flow: Download

sequenceDiagram
    autonumber
    participant C as Client
    participant GW as API Gateway
    participant SH as Sharing
    participant F as File & Metadata
    participant S as Storage
    participant O as Object Store

    C->>GW: GET /v1/files/{id}/content
    GW->>SH: CheckAccess(principal, node, read)
    SH-->>GW: allow
    GW->>F: ResolveVersion(node) → blob hash/key
    F->>S: PresignGet(key, ttl, response-headers)
    S-->>F: presigned GET URL
    F-->>GW: redirect URL
    GW-->>C: 302 (or {url})
    C->>O: GET bytes (direct, presigned)
    O-->>C: 200 bytes

Authz happens before URL issuance; the URL is scoped (exact key, short TTL). Bytes never touch BitVault compute.
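
The ordering and scoping described above can be illustrated with a generic HMAC-signed URL. This is a sketch of the idea only, not the signing scheme of S3, MinIO, or any other object store: the secret, the host `objects.example.invalid`, the query parameter names, and the functions `presign_get`, `verify`, and `issue_download_url` are hypothetical, and keys are assumed to be URL-safe.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical key; never hard-code in practice


def _sign(method, key, expires):
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_get(key, ttl_seconds, now=None):
    """Return a URL valid for GET on exactly `key` until the TTL elapses."""
    issued = now if now is not None else time.time()
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign("GET", key, expires)})
    return f"https://objects.example.invalid/{key}?{query}"


def verify(url, method, now=None):
    """What the storage side would check: signature first, then expiry."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    key = parts.path.lstrip("/")
    expires = int(query["expires"][0])
    expected = _sign(method, key, expires)
    if not hmac.compare_digest(expected, query["sig"][0]):
        return False  # wrong key, wrong method, or tampered expiry
    current = now if now is not None else time.time()
    return current <= expires


def issue_download_url(principal, key, can_read, ttl_seconds=60, now=None):
    """Authorize first; a URL is only minted after the access check passes."""
    if not can_read(principal, key):
        raise PermissionError("access denied; no URL issued")
    return presign_get(key, ttl_seconds, now)
```

The signature covers the method, the exact key, and the expiry, so a URL issued for one object cannot be reused for another object, for a write, or after its TTL.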


6. Canonical flow: Sync (delta pull + conflict)

sequenceDiagram
    autonumber
    participant D as Device
    participant GW as API Gateway
    participant SY as Sync
    participant F as File & Metadata
    participant S as Storage

    Note over F,SY: File writes the per-tenant journal (CHANGE.seq) in the commit tx;<br/>Sync reads seq gt cursor — no event projection (ADR-0008)
    D->>GW: GET /v1/sync/changes?cursor=N
    GW->>SY: PullDeltas(cursor=N)
    SY->>F: GetChanges(seq gt N) [reads journal]
    F-->>SY: changes[N+1..M] + new cursor M
    SY-->>GW: deltas + cursor M
    GW-->>D: deltas (created/updated/moved/deleted)

    loop for each changed file to fetch
        D->>GW: GET content URL (download flow §5)
    end

    Note over D: local edit while offline →
    D->>GW: POST /v1/sync/push (node, baseVersion, hash)
    GW->>SY: Push(node, baseVersion)
    SY->>F: CompareAndCommit(baseVersion)
    alt baseVersion is current
        F-->>SY: committed new version
        SY-->>D: ok (no conflict)
    else baseVersion stale (concurrent change)
        F-->>SY: conflict (current != base)
        SY->>F: CreateConflictedCopy(node, device, ts)
        SY-->>D: conflict resolved as copy (both versions kept)
    end

The invariant: a stale base version never overwrites — it becomes a conflicted copy. Both histories survive; the user reconciles. (FR C5 / ADR-0008.)
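
A minimal in-memory sketch of that invariant follows. The `Namespace` and `Node` classes, the return tuples, and the conflicted-copy naming pattern are assumptions for illustration; the document specifies only that a stale base version produces a conflicted copy identified by node, device, and timestamp.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: int = 0
    content_hash: str = ""


class Namespace:
    """Hypothetical stand-in for File & Metadata's compare-and-commit."""

    def __init__(self):
        self.nodes = {}

    def push(self, name, base_version, content_hash, device, ts):
        node = self.nodes.setdefault(name, Node(name))
        if base_version == node.version:
            # Base is current: commit a new version of the same node.
            node.version += 1
            node.content_hash = content_hash
            return ("committed", node.name, node.version)
        # Base is stale: never overwrite; keep the pushed bytes as a sibling.
        copy_name = f"{name} (conflicted copy, {device}, {ts})"
        self.nodes[copy_name] = Node(copy_name, 1, content_hash)
        return ("conflict_copy", copy_name, 1)
```

For example, if a laptop and a phone both edit from version 1, the first push becomes version 2 and the second lands as a separate conflicted copy, leaving version 2 untouched.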


7. Event flow (async derivation)

flowchart LR
    classDef src fill:#fde68a,stroke:#b45309,color:#111827;
    classDef bus fill:#fbcfe8,stroke:#be185d,color:#111827;
    classDef cons fill:#fed7aa,stroke:#c2410c,color:#111827;
    classDef sink fill:#bbf7d0,stroke:#15803d,color:#111827;

    FM["File & Metadata<br/>(outbox)"]:::src
    ST["Storage<br/>(outbox)"]:::src
    SH["Sharing<br/>(outbox)"]:::src

    J{{"event bus (in-proc v1; NATS P3)<br/>subjects: node.*, blob.*, share.*"}}:::bus

    JR[("change journal<br/>written at commit · source of truth")]:::sink
    IX["Search indexer"]:::cons
    NT["Notifier"]:::cons
    MT["Meter"]:::cons
    AU["Audit"]:::cons
    GC["GC / finalizer"]:::cons

    OS[("search index<br/>(PG-FTS v1; OpenSearch P3)")]:::sink
    PG[("Postgres: meters / audit")]:::sink
    HK[("Webhooks / Email")]:::sink
    OB[("Object Store")]:::sink

    FM & ST & SH -->|"at-least-once (derived consumers)"| J
    FM -->|"commit tx (source of truth)"| JR
    J --> IX --> OS
    J --> NT --> HK
    J --> MT --> PG
    J --> AU --> PG
    J --> GC --> OB

All bus consumers are idempotent (dedup on event id), per-aggregate ordered, with a DLQ for poison messages. Derived stores (search index, meters, notifications) are rebuildable by replay (R7/I6). The change journal is written in the commit transaction (source of truth, ADR-0008) — it is not a bus projection.
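
The consumer contract above (dedup on event id, dead-lettering of poison messages) can be sketched as follows. The class name, the `max_attempts` threshold, and the in-memory `seen` set are assumptions for this example. A real consumer would persist the dedup record together with its side effect, and per-aggregate ordering is not modeled here.

```python
class IdempotentConsumer:
    """Hypothetical consumer wrapper for an at-least-once event stream."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.seen = set()        # ids of events already handled or parked
        self.attempts = {}       # event id -> number of failed attempts
        self.dead_letters = []   # poison messages set aside for inspection

    def deliver(self, event):
        event_id = event["id"]
        if event_id in self.seen:
            return "duplicate"   # redelivery: skip, the effect already applied
        try:
            self.handler(event)
        except Exception:
            failed = self.attempts.get(event_id, 0) + 1
            self.attempts[event_id] = failed
            if failed >= self.max_attempts:
                self.dead_letters.append(event)
                self.seen.add(event_id)  # stop retrying a poison message
                return "dead_lettered"
            return "retry"
        self.seen.add(event_id)
        return "processed"
```

Because a redelivered event is recognized by its id and skipped, replaying the stream to rebuild a derived store does not apply the same change twice.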


8. Deployment topology (Kubernetes)

flowchart TB
    classDef ext fill:#e5e7eb,stroke:#6b7280,color:#111827;
    classDef k8s fill:#c7d2fe,stroke:#3730a3,color:#111827;
    classDef state fill:#bbf7d0,stroke:#15803d,color:#111827;

    internet(("Internet")):::ext
    cdn["CDN (optional)"]:::ext

    subgraph cluster["Kubernetes Cluster"]
        ing["Ingress / LB<br/>(TLS termination)"]:::k8s
        subgraph ns["Namespace: bitvault"]
            gwd["Deployment: gateway<br/>(HPA)"]:::k8s
            apid["Deployment: bitvaultd<br/>(control-plane, HPA)"]:::k8s
            wkr["Deployment: workers<br/>(indexer/notifier/gc, HPA)"]:::k8s
        end
        subgraph data["Stateful (operators / managed)"]
            pg[("PostgreSQL<br/>primary + replica")]:::state
            rd[("Redis")]:::state
            nats[("NATS JetStream")]:::state
            osd[("OpenSearch<br/>(optional)")]:::state
        end
    end

    obj[("Object Storage<br/>MinIO in-cluster or<br/>S3/R2/GCS/Azure")]:::ext

    internet --> ing --> gwd --> apid
    apid --> pg & rd & nats
    wkr --> nats & osd & pg & obj
    apid --> obj
    internet -. presigned .-> obj
    internet --> cdn -. cache .-> obj

9. Cross-cutting architecture

See 08-data-model for the data shapes referenced here and 09-evolution-roadmap for how this topology is reached in phases.