06 — High-Level Architecture
Covers tasks 8 & 11 (diagrams). The system in layers, the canonical request flows, and the deployment topology. Diagrams use C4-style flowcharts (portable across Mermaid renderers) plus sequence diagrams for flows.
Reminder: in v1 the “services” below are modules in one bitvaultd binary (ADR-0001). The diagrams show the logical architecture, which is identical whether deployed as one process or many; that is the point of the seam design.
Architecture Freeze V1 (2026-06-12): diagrams updated for the freeze — the change journal is written in the commit transaction by File & Metadata (source of truth; Sync reads it, not a bus projection), the event bus is in-process in V1 (NATS at P3), and search is Postgres-FTS in V1. See data model §6.
1. System Context (C4 Level 1)
Who/what uses BitVault and what BitVault depends on.
flowchart TB
classDef person fill:#dbeafe,stroke:#1e40af,color:#111827;
classDef system fill:#fde68a,stroke:#b45309,color:#111827;
classDef ext fill:#e5e7eb,stroke:#6b7280,color:#111827;
user["End User<br/>(web / mobile)"]:::person
dev["Developer / Operator<br/>(CLI, automation)"]:::person
admin["Tenant Admin"]:::person
bv["BitVault Platform<br/>file storage · sync · sharing · search"]:::system
obj["Object Storage<br/>MinIO / S3 / R2 / GCS / Azure"]:::ext
idp["External IdP<br/>(OIDC / SSO)"]:::ext
smtp["Email / SMTP"]:::ext
hook["Tenant Webhook Endpoints"]:::ext
user -->|"REST / HTTPS"| bv
dev -->|"REST + gRPC (CLI)"| bv
admin -->|"REST / HTTPS"| bv
bv -->|"presigned PUT/GET"| obj
bv -->|"authn"| idp
bv -->|"notifications"| smtp
bv -->|"signed events"| hook
user <-. "direct byte transfer (presigned)" .-> obj
Note the dashed line: users transfer bytes directly to/from object storage; BitVault issues the URL but the payload never traverses its compute (R5/ADR-0011).
2. Container View (C4 Level 2)
The internal building blocks and the stores they own. (Logical; physically one binary in v1.)
flowchart TB
classDef edge fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef svc fill:#fde68a,stroke:#b45309,color:#111827;
classDef worker fill:#fed7aa,stroke:#c2410c,color:#111827;
classDef store fill:#bbf7d0,stroke:#15803d,color:#111827;
classDef bus fill:#fbcfe8,stroke:#be185d,color:#111827;
subgraph clients[Clients]
web["Next.js Web"]:::edge
cli["Go CLI"]:::edge
mob["React Native<br/>(future)"]:::edge
end
gw["API Gateway / BFF<br/>REST edge · authn · rate-limit · REST↔gRPC"]:::edge
subgraph control[Control plane services - gRPC]
id["Identity"]:::svc
fm["File & Metadata<br/>(source of truth + outbox)"]:::svc
st["Storage<br/>(presign · multipart · commit)"]:::svc
sh["Sharing"]:::svc
sy["Sync"]:::svc
srch["Search (query)"]:::svc
end
subgraph workers[Async workers]
idx["Search Indexer"]:::worker
ntf["Notifier"]:::worker
gc["Finalizer / GC"]:::worker
mtr["Meter"]:::worker
aud["Audit Sink"]:::worker
end
bus{{"NATS JetStream<br/>(in-proc bus in v1)"}}:::bus
pg[("PostgreSQL<br/>metadata · outbox")]:::store
rd[("Redis<br/>cache · sessions · locks · rate-limit")]:::store
os[("OpenSearch<br/>(optional; PG-FTS fallback)")]:::store
obj[("Object Storage<br/>blobs")]:::store
web & cli & mob --> gw
gw --> id & fm & st & sh & sy & srch
id --> pg
fm --> pg
sh --> pg
sy --> pg
st --> obj
id & gw --> rd
srch --> os
fm -->|outbox→publish| bus
st -->|outbox→publish| bus
sh -->|outbox→publish| bus
bus --> idx & ntf & gc & mtr & aud
idx --> os
gc --> obj
mtr --> pg
aud --> pg
web -. presigned .-> obj
cli -. presigned .-> obj
3. The layers (textual)
- Clients — Next.js web, Go CLI, future RN mobile, and third parties. All speak REST to the gateway and transfer bytes directly to object storage.
- Edge / Gateway — TLS, authn, per-tenant rate limiting, REST↔gRPC translation, BFF aggregation. Stateless; the primary horizontal-scale unit.
- Control-plane services (gRPC) — Identity, File & Metadata, Storage, Sharing, Sync, Search-query. Strong consistency; own their Postgres tables.
- Event backbone — transactional outbox → NATS JetStream (in-proc bus in v1). The Published Language between core and consumers.
- Async workers — indexer, notifier, GC/finalizer, meter, audit, previews. Eventually consistent; independently scalable; failures isolated.
- Data stores — Postgres (truth + outbox), Redis (cache/sessions/locks/rate-limit), OpenSearch (optional derived index), object storage (blobs).
- Cross-cutting — OpenTelemetry (trace/metric/log), config (12-factor), secrets/KMS, health/readiness.
4. Canonical flow: Upload (the commit protocol)
This flow is the heart of correctness. It defeats the dual-write problem (R2): the namespace row and its event are written in one transaction, and bytes are verified before commit. An object with no committed metadata simply does not exist and is GC’d.
sequenceDiagram
autonumber
participant C as Client
participant GW as API Gateway
participant F as File & Metadata
participant S as Storage
participant O as Object Store
participant B as Event Bus
participant IX as Indexer
C->>GW: POST /v1/files (init upload: path, size, hash)
GW->>F: CreateUpload(node draft) [gRPC]
F->>S: PresignPut(staging key, size-range, ttl)
S-->>F: presigned URL (+ uploadId if multipart)
F-->>GW: uploadId + presigned URL(s)
GW-->>C: 201 {uploadId, url}
C->>O: PUT bytes (direct, presigned)
O-->>C: 200 (ETag)
C->>GW: POST /v1/files/{uploadId}/commit (etag, hash)
GW->>F: CommitUpload(uploadId, hash)
F->>S: HeadObject(staging key) — verify size/etag/hash
S->>O: HEAD staging key
O-->>S: metadata (size, etag)
S-->>F: verified ✔ (+ blob refcount++)
Note over F: BEGIN TX<br/>insert node version<br/>append CHANGE(seq)<br/>insert outbox(NodeChanged)<br/>COMMIT
F-->>GW: 200 (node, version)
GW-->>C: 200 committed
F->>B: publish NodeChanged (from outbox)
B->>IX: NodeChanged
IX->>IX: index name+metadata (eventual)
Note over C,IX: UI may show "indexing…" until IX catches up
Failure handling:
- Client uploads bytes but never commits → staging blob has refcount 0 → GC reclaims it after TTL. No dangling reference, no orphan leak.
- Commit fails verification (hash/size mismatch) → no metadata written → safe retry.
- Outbox decouples publish from the transaction → at-least-once delivery; consumers are idempotent (ADR-0006).
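The verify-then-commit half of the flow can be sketched as follows. This is a minimal illustration, not the BitVault implementation: an in-memory SQLite database stands in for Postgres, a plain dict stands in for the object store's HEAD response, and the table names (`node_version`, `change_journal`, `outbox`) and the `commit_upload` function are hypothetical. The blob refcount step is omitted.

```python
import json
import sqlite3

# Hypothetical schema; the real one lives in the data model document.
SCHEMA = """
CREATE TABLE node_version (node_id TEXT, version INTEGER, blob_hash TEXT,
                           PRIMARY KEY (node_id, version));
CREATE TABLE change_journal (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                             node_id TEXT, version INTEGER);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     event_type TEXT, payload TEXT);
"""


class VerificationError(Exception):
    """The staged bytes do not match what the client declared."""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def commit_upload(db, staged, node_id, expected_size, expected_hash):
    """Verify the staged object, then write version + journal + outbox atomically."""
    # Verification happens BEFORE any metadata is written (the HEAD step).
    if (staged is None or staged["size"] != expected_size
            or staged["hash"] != expected_hash):
        raise VerificationError(node_id)

    # `with db:` commits on success and rolls back if any statement raises,
    # so the three rows become visible together or not at all.
    with db:
        row = db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM node_version WHERE node_id = ?",
            (node_id,)).fetchone()
        version = row[0] + 1
        db.execute("INSERT INTO node_version VALUES (?, ?, ?)",
                   (node_id, version, expected_hash))
        seq = db.execute(
            "INSERT INTO change_journal (node_id, version) VALUES (?, ?)",
            (node_id, version)).lastrowid
        event = {"node_id": node_id, "version": version, "seq": seq}
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("NodeChanged", json.dumps(event)))
    return version, seq
```

A failed verification raises before the transaction opens, which is why a retry after a hash or size mismatch is safe: no row was ever written. A separate relay (not shown) would read `outbox` and publish to the bus.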
5. Canonical flow: Download
sequenceDiagram
autonumber
participant C as Client
participant GW as API Gateway
participant SH as Sharing
participant F as File & Metadata
participant S as Storage
participant O as Object Store
C->>GW: GET /v1/files/{id}/content
GW->>SH: CheckAccess(principal, node, read)
SH-->>GW: allow
GW->>F: ResolveVersion(node) → blob hash/key
F->>S: PresignGet(key, ttl, response-headers)
S-->>F: presigned GET URL
F-->>GW: redirect URL
GW-->>C: 302 (or {url})
C->>O: GET bytes (direct, presigned)
O-->>C: 200 bytes
Authz happens before URL issuance; the URL is scoped (exact key, short TTL). Bytes never touch BitVault compute.
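To make "scoped (exact key, short TTL)" concrete, here is a simplified sketch of a signed URL that is only valid for one method, one key, and a bounded time window. It uses a plain HMAC from the standard library as a stand-in; it is not the S3 signing scheme, and in practice the object store's SDK produces the presigned URL. `SECRET`, `sign_url`, `verify_url`, and the host name are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, quote, unquote, urlparse

SECRET = b"demo-only-secret"  # assumption: real deployments use object-store credentials


def _signature(method, key, expires):
    # The signature binds method, exact object key and expiry together.
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_url(method, key, ttl_seconds, now=None):
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    sig = _signature(method, key, expires)
    return f"https://objects.example.invalid/{quote(key)}?expires={expires}&sig={sig}"


def verify_url(method, url, now=None):
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    key = unquote(parsed.path.lstrip("/"))
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False  # TTL elapsed
    # Constant-time comparison; fails if method, key or expiry were altered.
    return hmac.compare_digest(sig, _signature(method, key, expires))
```

For example, a URL signed with `sign_url("GET", "tenant-a/blobs/report.pdf", ttl_seconds=60)` stops verifying once 60 seconds have passed, and never verifies for `PUT` or for a different key.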
6. Canonical flow: Sync (delta pull + conflict)
sequenceDiagram
autonumber
participant D as Device
participant GW as API Gateway
participant SY as Sync
participant F as File & Metadata
participant S as Storage
Note over F,SY: File writes the per-tenant journal (CHANGE.seq) in the commit tx;<br/>Sync reads seq gt cursor — no event projection (ADR-0008)
D->>GW: GET /v1/sync/changes?cursor=N
GW->>SY: PullDeltas(cursor=N)
SY->>F: GetChanges(seq gt N) [reads journal]
F-->>SY: changes[N+1..M] + new cursor M
SY-->>GW: deltas + cursor M
GW-->>D: deltas (created/updated/moved/deleted)
loop for each changed file to fetch
D->>GW: GET content URL (download flow §5)
end
Note over D: local edit while offline →
D->>GW: POST /v1/sync/push (node, baseVersion, hash)
GW->>SY: Push(node, baseVersion)
SY->>F: CompareAndCommit(baseVersion)
alt baseVersion is current
F-->>SY: committed new version
SY-->>D: ok (no conflict)
else baseVersion stale (concurrent change)
F-->>SY: conflict (current != base)
SY->>F: CreateConflictedCopy(node, device, ts)
SY-->>D: conflict resolved as copy (both versions kept)
end
The invariant: a stale base version never overwrites — it becomes a conflicted copy. Both histories survive; the user reconciles. (FR C5 / ADR-0008.)
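A minimal in-memory sketch of the push branch, assuming a dict stands in for the namespace. The class and function names and the conflicted-copy naming format are hypothetical; the source only specifies that the copy is derived from the node, device, and timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    version: int = 0
    content_hash: str = ""


@dataclass
class Namespace:
    nodes: dict = field(default_factory=dict)

    def compare_and_commit(self, name, base_version, content_hash):
        """Commit only if the caller's base version is still current."""
        node = self.nodes.get(name)
        current = node.version if node else 0
        if current != base_version:
            return None  # stale base: refuse to overwrite
        if node is None:
            node = self.nodes[name] = Node(name)
        node.version += 1
        node.content_hash = content_hash
        return node


def conflicted_copy_name(name, device, ts):
    # Hypothetical naming scheme: insert a label before the file extension.
    label = f"(conflicted copy {device} {ts})"
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        return f"{name} {label}"
    return f"{stem} {label}.{ext}"


def push(ns, name, base_version, content_hash, device, ts):
    committed = ns.compare_and_commit(name, base_version, content_hash)
    if committed is not None:
        return ("ok", committed.name, committed.version)
    # Stale base: keep the server version untouched and store the
    # device's bytes under a new name so both histories survive.
    copy_name = conflicted_copy_name(name, device, ts)
    ns.nodes[copy_name] = Node(copy_name, 1, content_hash)
    return ("conflict", copy_name, 1)
```

The server's current version is never modified on the conflict path; only a new node is created, which is what makes the invariant hold.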
7. Event flow (async derivation)
flowchart LR
classDef src fill:#fde68a,stroke:#b45309,color:#111827;
classDef bus fill:#fbcfe8,stroke:#be185d,color:#111827;
classDef cons fill:#fed7aa,stroke:#c2410c,color:#111827;
classDef sink fill:#bbf7d0,stroke:#15803d,color:#111827;
FM["File & Metadata<br/>(outbox)"]:::src
ST["Storage<br/>(outbox)"]:::src
SH["Sharing<br/>(outbox)"]:::src
J{{"event bus (in-proc v1; NATS P3)<br/>subjects: node.*, blob.*, share.*"}}:::bus
JR[("change journal<br/>written at commit · source of truth")]:::sink
IX["Search indexer"]:::cons
NT["Notifier"]:::cons
MT["Meter"]:::cons
AU["Audit"]:::cons
GC["GC / finalizer"]:::cons
OS[("search index<br/>(PG-FTS v1; OpenSearch P3)")]:::sink
PG[("Postgres: meters / audit")]:::sink
HK[("Webhooks / Email")]:::sink
OB[("Object Store")]:::sink
FM & ST & SH -->|"at-least-once (derived consumers)"| J
FM -->|"commit tx (source of truth)"| JR
J --> IX --> OS
J --> NT --> HK
J --> MT --> PG
J --> AU --> PG
J --> GC --> OB
All bus consumers are idempotent (dedup on event id), per-aggregate ordered, with a DLQ for poison messages. Derived stores (search index, meters, notifications) are rebuildable by replay (R7/I6). The change journal is written in the commit transaction (source of truth, ADR-0008) — it is not a bus projection.
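The consumer contract (dedup on event id, DLQ for poison messages) can be sketched as below. This is an in-memory illustration under stated assumptions: the `Event` shape, `IdempotentConsumer`, the retry limit, and `make_index_handler` are all hypothetical, and a real consumer would persist the seen-ids set alongside its side effect rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    id: str
    subject: str
    payload: dict


@dataclass
class IdempotentConsumer:
    handler: Callable[[Event], None]
    max_attempts: int = 3
    seen: set = field(default_factory=set)
    attempts: dict = field(default_factory=dict)
    dead_letters: list = field(default_factory=list)

    def deliver(self, event):
        # At-least-once delivery means redeliveries happen; skip known ids.
        if event.id in self.seen:
            return "duplicate"
        try:
            self.handler(event)
        except Exception:
            count = self.attempts.get(event.id, 0) + 1
            self.attempts[event.id] = count
            if count >= self.max_attempts:
                # Poison message: park it so it stops blocking the stream.
                self.dead_letters.append(event)
                self.seen.add(event.id)
                return "dead-lettered"
            return "retry"
        self.seen.add(event.id)
        return "processed"


def make_index_handler(index):
    """Example handler: upsert into a dict standing in for the search index."""
    def handle(event):
        if "node_id" not in event.payload:
            raise ValueError("malformed event")
        index[event.payload["node_id"]] = event.payload["version"]
    return handle
```

Because the handler is an upsert keyed by node id, replaying the full event stream into an empty `index` rebuilds the same derived state, which is the property the replay guarantee relies on.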
8. Deployment topology (Kubernetes)
flowchart TB
classDef ext fill:#e5e7eb,stroke:#6b7280,color:#111827;
classDef k8s fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef state fill:#bbf7d0,stroke:#15803d,color:#111827;
internet(("Internet")):::ext
cdn["CDN (optional)"]:::ext
subgraph cluster["Kubernetes Cluster"]
ing["Ingress / LB<br/>(TLS termination)"]:::k8s
subgraph ns["Namespace: bitvault"]
gwd["Deployment: gateway<br/>(HPA)"]:::k8s
apid["Deployment: bitvaultd<br/>(control-plane, HPA)"]:::k8s
wkr["Deployment: workers<br/>(indexer/notifier/gc, HPA)"]:::k8s
end
subgraph data["Stateful (operators / managed)"]
pg[("PostgreSQL<br/>primary + replica")]:::state
rd[("Redis")]:::state
nats[("NATS JetStream")]:::state
osd[("OpenSearch<br/>(optional)")]:::state
end
end
obj[("Object Storage<br/>MinIO in-cluster or<br/>S3/R2/GCS/Azure")]:::ext
internet --> ing --> gwd --> apid
apid --> pg & rd & nats
wkr --> nats & osd & pg & obj
apid --> obj
internet -. presigned .-> obj
internet --> cdn -. cache .-> obj
- Stateless (gateway, bitvaultd, workers) → Deployment + HPA + PodDisruptionBudget.
- Stateful → operators (CloudNativePG, Redis, NATS, OpenSearch) or managed cloud equivalents; the chart supports either via values.
- Profiles (ADR-0012): lite (bitvaultd + Postgres + object store), standard (+Redis +NATS), full (+OpenSearch +separate workers). Self-host picks lite/standard via Compose; SaaS runs full on K8s.
- NetworkPolicies isolate the namespace; only the gateway is internet-exposed.
- Bytes path stays out of the cluster (direct/CDN to object store).
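The profile list reads as cumulative: each profile adds components to the one before it. A small sketch of that reading, where the component identifiers and the `components_for` helper are hypothetical and not taken from the chart:

```python
# Profiles in order; each entry lists only what that profile ADDS.
PROFILE_ORDER = ("lite", "standard", "full")
PROFILE_ADDS = {
    "lite": ("bitvaultd", "postgres", "object-store"),
    "standard": ("redis", "nats"),
    "full": ("opensearch", "workers"),
}


def components_for(profile):
    """Return every component a profile needs, including inherited ones."""
    if profile not in PROFILE_ADDS:
        raise ValueError(f"unknown profile: {profile}")
    components = []
    for name in PROFILE_ORDER:
        components.extend(PROFILE_ADDS[name])
        if name == profile:
            break
    return components
```

Under this reading, `components_for("standard")` yields the three lite components plus Redis and NATS.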
9. Cross-cutting architecture
- Observability (ADR-0013): OTel SDK in every component; trace ID minted at the gateway, propagated through gRPC metadata and NATS message headers, so one upload is one trace spanning REST→gRPC→bus→indexer (NFR-7).
- Config: 12-factor, env/secret-driven, profile-aware; no host coupling.
- Security (ADR-0007/0011/0014): TLS edge, mTLS between services post-split, RLS tenant isolation, KMS envelope encryption, scoped presigned URLs.
- Resilience: timeouts + retries with jitter on every gRPC call, circuit breakers to derived stores, graceful shutdown with connection draining, outbox for guaranteed eventual delivery.
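As an illustration of "retries with jitter", here is a minimal sketch using exponential backoff with full jitter (each delay is drawn uniformly between zero and a capped exponential bound). The function name, the `TransientError` type, and the default values are assumptions chosen for the example, not BitVault settings; per-call timeouts and circuit breakers are out of scope here.

```python
import random
import time


class TransientError(Exception):
    """A failure worth retrying (e.g. an unavailable downstream)."""


def call_with_retries(fn, attempts=4, base=0.1, cap=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on TransientError wait a jittered delay and try again."""
    last_error = None
    for n in range(attempts):
        try:
            return fn()
        except TransientError as exc:
            last_error = exc
            if n == attempts - 1:
                break  # out of attempts: do not sleep, just re-raise
            # Full jitter: random point in [0, min(cap, base * 2**n)).
            sleep(rng() * min(cap, base * (2 ** n)))
    raise last_error
```

Randomizing the delay spreads retries from many callers over time instead of having them hit a recovering service in synchronized waves. `sleep` and `rng` are injectable so the behaviour can be tested without waiting.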
See 08-data-model for the data shapes referenced here and 09-evolution-roadmap for how this topology is reached in phases.