05 — Service Boundaries & Data Ownership
Covers task 7. Maps bounded contexts (04) to modules (v1, inside the bitvaultd modular monolith) that are designed to be extracted into services later (09).
Read the column “v1 form” carefully: almost everything starts as an in-process module. The service decomposition is the target, reached by extraction with a forcing function, never big-bang (ADR-0001).
Architecture Freeze V1 (2026-06-12): ownership updated so the change journal is owned and written by File & Metadata in the commit transaction (source of truth); Sync reads it and owns only cursors + conflicts (review §3.3, ADR-0008). Search is Postgres-FTS in V1 (OpenSearch deferred, ADR-0009); the event bus is in-process in V1 (NATS at P3, ADR-0006).
1. The golden rules of ownership
- One owner per piece of data. A module/service that owns a table is the only one that writes it. Others read via its API or via events — never by reaching into its tables. This is the rule that makes extraction possible later.
- Postgres is the source of truth; derived stores are disposable. Search, notifications, usage counters, and previews are rebuildable from events.
- Cross-boundary writes go through APIs (sync) or events (async). No shared mutable tables across owners. No distributed transactions (ADR-0006).
- The data plane (bytes) never flows through compute. Presigned URLs only (R5).
2. Services / Modules
| Logical service | Owns (authoritative data) | Internal API (gRPC) | Sync deps | Async (events) | v1 form |
|---|---|---|---|---|---|
| API Gateway / BFF | nothing (stateless) | — (calls others) | all modules | — | Module: HTTP server + router in bitvaultd |
| Identity | tenants, users, memberships, roles, api_tokens, sessions | Identity.* (authn, authz, token introspection) | Postgres | emits UserCreated, TenantSuspended | Module |
| File & Metadata | nodes, versions, node_metadata, tags, trash, change_journal, outbox | Files.* (create/commit/move/list/version/trash) | Identity, Storage, Postgres | writes change_journal(seq) + emits NodeChanged via outbox, in one commit tx | Module |
| Storage | blobs, multipart_uploads, provider config | Storage.* (presign, head, commit-blob, delete, GC) | object store | emits BlobCommitted, BlobOrphaned | Module + worker (GC/finalizer) |
| Sync | device_cursors, conflict_records | Sync.* (register device, pull deltas, push) | Files (reads journal), Storage | none (reads the journal; no event projection) | Module (first extraction candidate) |
| Sharing | shares, share_links, permissions | Sharing.* (grant, link, resolve-access) | Identity, Files | emits ShareCreated | Module |
| Search & Indexing | derived FTS index (Postgres-FTS in V1; OpenSearch deferred P3) | Search.* (query) | Postgres (PG-FTS) | consumes node/share events → index | Module + worker (early extraction candidate) |
| Notification & Events | subscriptions, webhook_endpoints, notifications, delivery state | Notify.* (subscribe, deliver) | Redis, SMTP | consumes domain events → fan-out | Module + worker |
| Billing & Metering | usage_meters, quotas, plans | Billing.* (check-quota, record-usage) | Postgres | consumes usage events | Module |
| Admin & Platform | feature_flags, config, audit_log (append-only) | Admin.* | Postgres | consumes all events → audit | Module |
Workers are the async halves (GC, indexing, notification delivery, preview generation). In v1 they run as goroutine pools inside bitvaultd, driven by the in-process event bus / outbox; they are the first things to become standalone deployments because their scaling profile (bursty, CPU-bound, retry-heavy) differs sharply from request-serving.
3. The three planes
A useful lens orthogonal to the service list:
- Control plane (strongly consistent, synchronous): Gateway, Identity, File & Metadata, Sharing, Storage-presign, and Sync delta-serving (reads the authoritative journal). Handles “what is true” — namespace mutations, authz, issuing transfer URLs, serving cursor deltas. Scales with read replicas + stateless replicas.
- Data plane (high-throughput, bypasses compute): client ⇄ object store via presigned URLs. Scales with the object store, independently of the control plane (R5). BitVault touches only metadata about transfers, never the bytes.
- Async / derivation plane (eventually consistent): Search indexing, Notifications, Billing meters, Audit, Previews, GC — all driven by the event backbone (outbox → in-process bus; NATS at P3). Scales per-worker; failures here never block the control plane. (The Sync journal is not here — it is written transactionally at commit, ADR-0008.)
flowchart LR
subgraph CP["Control plane (strong consistency)"]
GW[API Gateway]
ID[Identity]
FM[File & Metadata]
SH[Sharing]
ST[Storage: presign/commit]
end
subgraph DP["Data plane (bypasses compute)"]
OBJ[(Object Store)]
end
subgraph AP["Async / derivation plane (eventual)"]
SE[Search indexer]
NO[Notifier]
BI[Meter]
AU[Audit]
GC[GC / finalizer]
end
JRNL[(change journal<br/>source of truth)]
SY[Sync: delta serve]
Client -->|REST| GW
GW --> ID & FM & SH & ST & SY
Client -. presigned PUT/GET .-> OBJ
FM -->|writes journal + outbox in one commit tx| JRNL
SY -->|reads seq gt cursor| JRNL
FM -->|outbox| BUS{{in-proc bus / NATS at P3}}
ST -->|outbox| BUS
BUS --> SE & NO & BI & AU & GC
GC --> OBJ
4. Service boundary detail
API Gateway / BFF
- Why a boundary: the single external edge — terminates REST, authenticates, rate-limits per tenant, translates REST↔gRPC, aggregates for the web/mobile BFF.
- Stateless; the easiest thing to scale; the place cross-cutting concerns live.
- Does not own domain logic — it orchestrates calls to modules.
Identity & Access
- Why a boundary: security kernel. Small, auditable, reused by all. Token introspection and authz decisions must be cheap and centralized.
- Authz model: tenant-scoped RBAC + resource grants resolved with Sharing.
- Hot path → cache decisions in Redis with short TTL.
File & Metadata (the spine)
- Why a boundary: the source of truth for the namespace. Owns the commit protocol that defeats dual-write (R2). The outbox and the change journal live here: a node change, its CHANGE(seq) journal row, and its OUTBOX event are one transaction (ADR-0008). File owns and writes the journal; Sync reads it.
- Highest-consistency requirements; the thing other contexts build on.
Storage
- Why a boundary: isolates the multi-cloud abstraction (R3) and the byte lifecycle. Issues presigned URLs (R5), runs the finalizer/GC worker, manages multipart and ref-counted blobs. Swappable adapters behind one interface (ADR-0005).
Sync (first extraction target)
- Why a boundary: distinct consistency model (causal/cursor), distinct scaling (long-poll/stream connections), and the clearest portfolio narrative. Reads the per-tenant change journal (written at commit by File, ADR-0008) to serve cursor deltas; owns device_cursors + conflict_records.
Sharing
- Why a boundary: access resolution is security-sensitive and crosses Identity + File; isolating it keeps the authz story coherent and testable.
Search & Indexing (early extraction target)
- Why a boundary: derived, disposable, CPU/IO-bursty indexing; an anti-corruption layer over the index. V1 uses Postgres-FTS (name/metadata); OpenSearch (content search) is the Deferred P3 escalation behind the same query API (ADR-0009).
Notification & Events
- Why a boundary: fan-out, retries, external delivery (webhooks/email) with their own failure/retry semantics that must not contaminate the core.
Billing & Metering
- Why a boundary: generic, swappable, consumes events; quota checks are sync (gate uploads), usage accrual is async.
Admin & Platform
- Why a boundary: cross-cutting config/flags + the append-only audit sink.
5. Extraction forcing-functions (when a module becomes a service)
Per ADR-0001, extraction is evidence-driven. A module graduates to a service only when one of these is demonstrated:
| Trigger | Likely first service |
|---|---|
| Async workload starves request latency (GC’s CPU, indexing bursts) | Storage worker, Search indexer |
| A component needs an independent scaling profile (many long-lived sync connections) | Sync |
| A component needs a different datastore lifecycle / can be optional | Search |
| Independent deploy cadence / blast-radius isolation needed | the noisiest module |
| A team takes ownership | that team’s context |
What is deliberately not a trigger: “microservices look good.” The whole point (see 01 §3.1) is that disciplined, justified extraction is the portfolio story.
6. Anti-patterns this boundary design forbids
- ❌ Search reading nodes directly from Postgres (must consume events).
- ❌ Two modules writing the same table (one owner only).
- ❌ Synchronous chains across 3+ modules in a request path (latency + coupling).
- ❌ Distributed transactions / 2PC across modules (use outbox + sagas).
- ❌ Bytes flowing through Gateway/File/Storage compute (presigned only).
- ❌ A module importing another module’s internal packages instead of its API.