Service Boundaries
In v1 the “services” described here are modules inside the bitvaultd monolith. The boundaries are seams designed so that extracting a module into a standalone service is a deployment change, not a rewrite (ADR-0001).
Golden Rules of Ownership
- One owner per piece of data. The module that owns a table is the only one that writes it. Others read via its API or via events — never by reaching into its tables. This rule is what makes extraction possible.
- Postgres is the source of truth; derived stores are disposable. Search indexes, notification state, usage counters, and previews are projections. They are rebuildable from events.
- Cross-boundary writes go through APIs (sync) or events (async). No shared mutable tables across owners. No distributed transactions (ADR-0006).
- The data plane (bytes) never flows through compute. Presigned URLs only (ADR-0011).
Services Overview
| Logical service | Owns (authoritative data) | Internal API (gRPC) | Async events | v1 form |
|---|---|---|---|---|
| API Gateway / BFF | Nothing (stateless) | — (calls others) | — | Module: HTTP server + router in bitvaultd |
| Identity | tenants, users, memberships, roles, api_tokens, sessions | Identity.* (authn, authz, token introspection) | Emits UserCreated, TenantSuspended | Module |
| File & Metadata | nodes, versions, node_metadata, tags, trash | Files.* (create/commit/move/list/version/trash) | Emits NodeChanged via outbox | Module |
| Storage | blobs, multipart_uploads, provider config | Storage.* (presign, head, commit-blob, delete, GC) | Emits BlobCommitted, BlobOrphaned | Module + worker (GC/finalizer) |
| Sync | change_journal, device_cursors, conflicts | Sync.* (register device, pull deltas, push) | Consumes NodeChanged → journal | Module (first extraction candidate) |
| Sharing | shares, share_links, permissions | Sharing.* (grant, link, resolve-access) | Emits ShareCreated | Module |
| Search & Indexing | OpenSearch / Postgres-FTS indexes (derived) | Search.* (query) | Consumes node/share events → index | Module + worker (early extraction candidate) |
| Notification & Events | subscriptions, webhook_endpoints, notifications, delivery state | Notify.* (subscribe, deliver) | Consumes domain events → fan-out | Module + worker |
| Billing & Metering | usage_meters, quotas, plans | Billing.* (check-quota, record-usage) | Consumes usage events | Module |
| Admin & Platform | feature_flags, config, audit_log (append-only) | Admin.* | Consumes all events → audit | Module |
Workers are the async halves (GC, indexing, notification delivery). In v1 they run as goroutine pools inside bitvaultd driven by the in-process event bus; they are the first things to become standalone deployments because their scaling profile (bursty, CPU-bound, retry-heavy) differs from request-serving.
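The worker arrangement described above can be sketched as a tiny in-process bus feeding a goroutine pool. This is only an illustration under stated assumptions: `Event`, `Bus`, `RunPool`, and `CountHandled` are hypothetical names, not bitvaultd's actual types, and the real bus presumably handles backpressure and retries that are omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical domain event envelope.
type Event struct {
	Type    string
	Payload string
}

// Bus is a minimal in-process fan-out bus: every subscriber sees every event.
type Bus struct {
	subs []chan Event
}

// Subscribe registers a buffered channel and returns its receive side.
func (b *Bus) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.subs = append(b.subs, ch)
	return ch
}

// Publish delivers the event to every subscriber (blocks if a buffer is full).
func (b *Bus) Publish(e Event) {
	for _, ch := range b.subs {
		ch <- e
	}
}

// Close signals that no more events will be published.
func (b *Bus) Close() {
	for _, ch := range b.subs {
		close(ch)
	}
}

// RunPool starts n goroutines that drain events and call handle for each one.
// The returned WaitGroup completes once the channel is closed and drained.
func RunPool(n int, events <-chan Event, handle func(Event)) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range events {
				handle(e)
			}
		}()
	}
	return &wg
}

// CountHandled pushes events through a fresh bus with one worker pool and
// returns how many events the pool handled.
func CountHandled(workers int, events []Event) int {
	bus := &Bus{}
	ch := bus.Subscribe(len(events))
	var mu sync.Mutex
	handled := 0
	wg := RunPool(workers, ch, func(Event) {
		mu.Lock()
		handled++
		mu.Unlock()
	})
	for _, e := range events {
		bus.Publish(e)
	}
	bus.Close()
	wg.Wait()
	return handled
}

func main() {
	n := CountHandled(4, []Event{{"NodeChanged", "n1"}, {"BlobCommitted", "b1"}, {"NodeChanged", "n2"}})
	fmt.Println("handled:", n)
}
```

Because the pool only depends on a receive channel, swapping the in-process bus for a network subscription would, in this sketch, leave the worker code unchanged, which is the property the seam design relies on.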
The Three Planes
flowchart LR
subgraph CP["Control plane (strong consistency)"]
GW[API Gateway]
ID[Identity]
FM[File & Metadata]
SH[Sharing]
ST[Storage: presign/commit]
end
subgraph DP["Data plane (bypasses compute)"]
OBJ[(Object Store)]
end
subgraph AP["Async / derivation plane (eventual)"]
SY[Sync projector]
SE[Search indexer]
NO[Notifier]
BI[Meter]
AU[Audit]
GC[GC / finalizer]
end
Client -->|REST| GW
GW --> ID & FM & SH & ST
Client -. presigned PUT/GET .-> OBJ
FM -->|outbox| BUS{{NATS / in-proc bus}}
ST -->|outbox| BUS
BUS --> SY & SE & NO & BI & AU & GC
GC --> OBJ
| Plane | Consistency | Scales with |
|---|---|---|
| Control | Strong, synchronous | Read replicas + stateless replicas |
| Data | High-throughput, compute-bypass | Object store, independent of control |
| Async / Derivation | Eventual | Per-worker; failures never block control |
See System Overview → The Three Planes for the full table.
Service Detail
API Gateway / BFF
The single external edge. Terminates REST, authenticates every request, enforces per-tenant rate limits, translates REST↔gRPC, and aggregates responses for web and mobile BFF use cases.
- Owns: nothing (stateless).
- Why a boundary: cross-cutting concerns (authn, rate-limiting, protocol translation) must not live inside domain modules. Stateless; easiest unit to scale.
- Does not own domain logic — it orchestrates calls to modules.
- Session tokens verified against Identity; decisions cached in Redis with short TTL.
Identity
Security kernel. All token issuance, introspection, and tenant-scoped RBAC live here.
- Owns: tenants, users, memberships, roles, api_tokens, sessions.
- Why a boundary: small, auditable, reused by every other module. Token introspection and authz must be cheap and centralized.
- Authz model: tenant-scoped RBAC + resource grants resolved with Sharing.
- Hot path: authz decisions cached in Redis with short TTL.
- Emits UserCreated, TenantSuspended for downstream consumers.
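The short-TTL decision cache on the authz hot path might look roughly like the following. It is a sketch only: an in-memory map stands in for Redis, and `DecisionCache`, `Check`, the key format, and the 5-second TTL are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// decision is one cached authz answer plus its expiry time.
type decision struct {
	allowed bool
	expires time.Time
}

// DecisionCache is an in-memory stand-in for the Redis cache (assumption:
// the real cache is shared across replicas; this one is per process).
type DecisionCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time // injectable clock so expiry is testable
	entries map[string]decision
}

func NewDecisionCache(ttl time.Duration, now func() time.Time) *DecisionCache {
	return &DecisionCache{ttl: ttl, now: now, entries: map[string]decision{}}
}

// Check returns the cached decision while it is fresh; otherwise it calls
// resolve (standing in for a call to Identity) and caches the answer.
func (c *DecisionCache) Check(key string, resolve func() bool) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if d, ok := c.entries[key]; ok && c.now().Before(d.expires) {
		return d.allowed
	}
	allowed := resolve()
	c.entries[key] = decision{allowed: allowed, expires: c.now().Add(c.ttl)}
	return allowed
}

// ResolveCalls runs three checks against a fake clock and reports how many
// reached the resolver before and after the TTL elapsed.
func ResolveCalls() (int, int) {
	clock := time.Unix(0, 0)
	calls := 0
	cache := NewDecisionCache(5*time.Second, func() time.Time { return clock })
	resolve := func() bool { calls++; return true }

	cache.Check("user-1:node-9:read", resolve)
	cache.Check("user-1:node-9:read", resolve) // served from the cache
	before := calls

	clock = clock.Add(6 * time.Second) // move past the TTL
	cache.Check("user-1:node-9:read", resolve)
	return before, calls
}

func main() {
	before, after := ResolveCalls()
	fmt.Println("resolver calls before expiry:", before, "after expiry:", after)
}
```

The TTL bounds how long a revoked grant can keep being honored from the cache, so a shorter TTL trades extra Identity calls for faster revocation.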
File & Metadata
The namespace spine. Source of truth for every file, folder, version, and trash entry.
- Owns: nodes, versions, node_metadata, tags, trash.
- Why a boundary: highest-consistency requirements. Owns the commit protocol that defeats dual-write. The outbox lives here: a node change and its event are one transaction.
- The VERSION insert, BLOB.refcount++, and OUTBOX NodeChanged row are a single Postgres transaction.
- All other contexts build on File & Metadata via its gRPC API or NodeChanged events.
See Data Model → Key Invariants I1 for the transactional guarantee.
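The three writes named in the list above can be sketched against Go's `database/sql`. The SQL text, table names, and column names are illustrative assumptions, as are the helpers `CommitVersion`, `InTx`, and the `recorder` fake; only the shape (three statements, one transaction) comes from this document.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// execer is the subset of *sql.Tx this sketch needs; *sql.Tx satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// CommitVersion issues the three writes that must share one transaction.
// Table and column names are assumptions, not the real schema.
func CommitVersion(ctx context.Context, tx execer, nodeID, blobHash string) error {
	steps := []struct {
		query string
		args  []any
	}{
		{"INSERT INTO versions (node_id, blob_hash) VALUES ($1, $2)", []any{nodeID, blobHash}},
		{"UPDATE blobs SET refcount = refcount + 1 WHERE hash = $1", []any{blobHash}},
		{"INSERT INTO outbox (event_type, node_id) VALUES ('NodeChanged', $1)", []any{nodeID}},
	}
	for _, s := range steps {
		if _, err := tx.ExecContext(ctx, s.query, s.args...); err != nil {
			return fmt.Errorf("commit version: %w", err)
		}
	}
	return nil
}

// InTx shows how a caller with a real *sql.DB would wrap CommitVersion:
// any error rolls back all three writes, so no event exists without its change.
func InTx(ctx context.Context, db *sql.DB, fn func(execer) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if err := fn(tx); err != nil {
		_ = tx.Rollback()
		return err
	}
	return tx.Commit()
}

// recorder is a fake execer that records statements instead of running them.
type recorder struct{ queries []string }

func (r *recorder) ExecContext(_ context.Context, q string, _ ...any) (sql.Result, error) {
	r.queries = append(r.queries, q)
	return driver.RowsAffected(1), nil
}

// RecordedStatements runs CommitVersion against the fake and returns the SQL issued.
func RecordedStatements() []string {
	rec := &recorder{}
	if err := CommitVersion(context.Background(), rec, "node-1", "blake3:abc"); err != nil {
		panic(err)
	}
	return rec.queries
}

func main() {
	for i, q := range RecordedStatements() {
		fmt.Printf("%d: %s\n", i+1, q)
	}
}
```

With a live database the call would be along the lines of `InTx(ctx, db, func(tx execer) error { return CommitVersion(ctx, tx, id, hash) })`; a separate relay would then read the outbox table and publish to the bus.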
Storage
Isolates the multi-cloud byte lifecycle. Issues presigned URLs, manages multipart uploads, runs the GC/finalizer worker.
- Owns: blobs, multipart_uploads, provider config.
- Why a boundary: isolates the provider abstraction (ADR-0005) and the byte lifecycle (ADR-0019). Swappable adapters (MinIO / S3 / R2 / GCS / Azure) behind one interface.
- Issues presigned PUT/GET URLs; bytes never traverse this module’s network path (ADR-0011).
- GC worker: refcount = 0 → eligible → reclaim object storage blob (ADR-0019).
- Content hashed with BLAKE3 (ADR-0016); chunked for large files (ADR-0017).
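The GC rule in the list above (refcount of zero makes a blob eligible, then its object is reclaimed) can be sketched as a single sweep. `Blob`, `ObjectStore`, `Sweep`, and the in-memory `memStore` are hypothetical; any grace period or batching that ADR-0019 may specify is not modelled.

```go
package main

import (
	"fmt"
	"sort"
)

// Blob is a hypothetical row from the blobs table.
type Blob struct {
	Hash     string
	Refcount int
}

// ObjectStore is the one provider operation the sweep needs.
type ObjectStore interface {
	Delete(key string) error
}

// Sweep reclaims every blob whose refcount is zero. The object is deleted
// first and the row second, so a failed delete leaves the row for a retry.
func Sweep(blobs map[string]*Blob, store ObjectStore) ([]string, error) {
	var reclaimed []string
	for hash, b := range blobs {
		if b.Refcount != 0 {
			continue
		}
		if err := store.Delete(hash); err != nil {
			return reclaimed, fmt.Errorf("reclaim %s: %w", hash, err)
		}
		delete(blobs, hash)
		reclaimed = append(reclaimed, hash)
	}
	sort.Strings(reclaimed) // map order is random; sort for stable output
	return reclaimed, nil
}

// memStore is an in-memory stand-in for a provider adapter.
type memStore map[string]bool

func (m memStore) Delete(key string) error {
	delete(m, key)
	return nil
}

// DemoSweep runs one sweep over three blobs and reports what was reclaimed
// and how many objects remain in the store.
func DemoSweep() ([]string, int) {
	blobs := map[string]*Blob{
		"b1": {Hash: "b1", Refcount: 2},
		"b2": {Hash: "b2", Refcount: 0},
		"b3": {Hash: "b3", Refcount: 1},
	}
	store := memStore{"b1": true, "b2": true, "b3": true}
	reclaimed, err := Sweep(blobs, store)
	if err != nil {
		panic(err)
	}
	return reclaimed, len(store)
}

func main() {
	reclaimed, left := DemoSweep()
	fmt.Println("reclaimed:", reclaimed, "objects left:", left)
}
```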
Sync
Projects NodeChanged events into a per-tenant monotonic change journal. Serves delta pulls and handles conflict detection.
- Owns: change_journal, device_cursors, conflicts.
- Why a boundary: distinct consistency model (causal/cursor), distinct scaling profile (many long-lived connections), and the clearest extraction narrative.
- First extraction candidate: long-poll and streaming connections have a sharply different scaling profile from request-serving.
- Conflict resolution: stale base version → conflicted copy; both histories kept (ADR-0008, ADR-0026).
- Cursor model: monotonic CHANGE.seq per tenant (ADR-0024).
See System Overview → Sync Flow for the sequence diagram.
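A per-tenant monotonic journal with cursor-based pulls, plus the stale-base conflict rule, can be sketched as below. The types and functions (`Journal`, `Append`, `Pull`, `ResolvePush`) and the returned strings are assumptions for illustration; the real journal lives in Postgres rather than in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// Change is one journal entry; Seq is monotonic within a tenant.
type Change struct {
	Seq    uint64
	NodeID string
}

// Journal keeps an ordered change list per tenant.
type Journal struct {
	mu      sync.Mutex
	changes map[string][]Change // tenant -> changes ordered by Seq
}

func NewJournal() *Journal { return &Journal{changes: map[string][]Change{}} }

// Append assigns the tenant's next sequence number (1, 2, 3, ...).
func (j *Journal) Append(tenant, nodeID string) uint64 {
	j.mu.Lock()
	defer j.mu.Unlock()
	seq := uint64(len(j.changes[tenant])) + 1
	j.changes[tenant] = append(j.changes[tenant], Change{Seq: seq, NodeID: nodeID})
	return seq
}

// Pull returns up to limit changes with Seq > cursor, plus the cursor the
// device should store for its next pull.
func (j *Journal) Pull(tenant string, cursor uint64, limit int) ([]Change, uint64) {
	j.mu.Lock()
	defer j.mu.Unlock()
	all := j.changes[tenant]
	if cursor >= uint64(len(all)) || limit <= 0 {
		return nil, cursor
	}
	start := int(cursor) // Seq starts at 1, so Seq > cursor begins at index cursor
	if remaining := len(all) - start; limit > remaining {
		limit = remaining
	}
	page := append([]Change(nil), all[start:start+limit]...)
	return page, page[len(page)-1].Seq
}

// ResolvePush applies the stale-base rule: a push built on a version older
// than the current head becomes a conflicted copy instead of overwriting.
func ResolvePush(baseVersion, headVersion uint64) string {
	if baseVersion < headVersion {
		return "conflicted-copy"
	}
	return "applied"
}

func main() {
	j := NewJournal()
	j.Append("t1", "a")
	j.Append("t1", "b")
	j.Append("t2", "x")
	j.Append("t1", "c")

	page, cursor := j.Pull("t1", 0, 2)
	fmt.Println(page, cursor)
	page, cursor = j.Pull("t1", cursor, 10)
	fmt.Println(page, cursor)
	fmt.Println(ResolvePush(3, 5), ResolvePush(5, 5))
}
```

Because sequence numbers are scoped to a tenant, a device cursor is a single integer and a pull never scans another tenant's changes.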
Sharing
Access resolution is security-sensitive and crosses Identity + File. Isolating it keeps the authz story coherent and independently testable.
- Owns: shares, share_links, permissions.
- Why a boundary: access decisions must not be scattered across modules. Public share links require independent security analysis (ADR-0037).
- Emits ShareCreated for Notification consumers.
- Sharing checks gate every download; authz precedes URL issuance.
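The ordering constraint "authz precedes URL issuance" can be made concrete with a small gate function. Everything here is hypothetical: `Grants`, `ResolveAccess`, `DownloadURL`, and the placeholder URL are illustrative stand-ins, and a real presigner would call the Storage module.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrForbidden is returned when the caller has no grant on the node.
var ErrForbidden = errors.New("access denied")

// grantKey identifies one (user, node) grant.
type grantKey struct{ userID, nodeID string }

// Grants is an in-memory stand-in for the Sharing module's permission data.
type Grants map[grantKey]bool

// ResolveAccess reports whether the user may read the node.
func (g Grants) ResolveAccess(userID, nodeID string) bool {
	return g[grantKey{userID, nodeID}]
}

// DownloadURL checks access first and only then asks presign for a URL,
// so a denied request never causes a URL to be minted.
func DownloadURL(g Grants, presign func(nodeID string) string, userID, nodeID string) (string, error) {
	if !g.ResolveAccess(userID, nodeID) {
		return "", ErrForbidden
	}
	return presign(nodeID), nil
}

// DemoGate issues one allowed and one denied request and returns how many
// times the presigner ran, plus the error from the denied request.
func DemoGate() (int, error) {
	grants := Grants{grantKey{"alice", "node-1"}: true}
	calls := 0
	presign := func(nodeID string) string {
		calls++
		// Placeholder URL, not a real presigned signature.
		return "https://objects.example.invalid/" + nodeID + "?sig=placeholder"
	}
	if _, err := DownloadURL(grants, presign, "alice", "node-1"); err != nil {
		panic(err)
	}
	_, err := DownloadURL(grants, presign, "mallory", "node-1")
	return calls, err
}

func main() {
	calls, err := DemoGate()
	fmt.Println("presign calls:", calls, "denied error:", err)
}
```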
Search & Indexing
Derived, disposable, CPU/IO-bursty. Anti-corruption layer over the search index.
- Owns: OpenSearch / Postgres-FTS derived indexes (not authoritative).
- Why a boundary: different datastore (OpenSearch), bursty indexing CPU, and can be disabled entirely with Postgres-FTS fallback (ADR-0009).
- Early extraction candidate — indexing bursts must not affect request latency.
- Consumes node.* and share.* events from NATS; index is rebuildable from the journal (Invariant I6).
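"Rebuildable from the journal" means that replaying events into an empty index must arrive at the same state as incremental indexing. A minimal sketch follows, assuming a hypothetical `NodeEvent` payload (name plus a deleted flag) and a toy substring index in place of OpenSearch or Postgres-FTS.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// NodeEvent is a hypothetical payload for a node change event.
type NodeEvent struct {
	Seq     uint64
	NodeID  string
	Name    string
	Deleted bool
}

// Index is a toy projection: node ID -> indexed file name.
type Index map[string]string

// Apply folds one event into the index; this is the incremental path.
func (ix Index) Apply(e NodeEvent) {
	if e.Deleted {
		delete(ix, e.NodeID)
		return
	}
	ix[e.NodeID] = e.Name
}

// Rebuild discards nothing authoritative: it replays the journal from the
// start into an empty index using the same Apply as the incremental path.
func Rebuild(journal []NodeEvent) Index {
	ix := Index{}
	for _, e := range journal {
		ix.Apply(e)
	}
	return ix
}

// Query returns the sorted IDs of nodes whose name contains substr.
func (ix Index) Query(substr string) []string {
	var ids []string
	for id, name := range ix {
		if strings.Contains(name, substr) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// DemoJournal returns a small event history: a rename and a delete.
func DemoJournal() []NodeEvent {
	return []NodeEvent{
		{Seq: 1, NodeID: "n1", Name: "report.pdf"},
		{Seq: 2, NodeID: "n2", Name: "notes.txt"},
		{Seq: 3, NodeID: "n1", Name: "report-final.pdf"},
		{Seq: 4, NodeID: "n2", Deleted: true},
	}
}

func main() {
	ix := Rebuild(DemoJournal())
	fmt.Println(ix.Query("report"), ix.Query("notes"))
}
```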
Notification & Events
Fan-out, retries, external delivery with failure semantics that must not contaminate the control plane.
- Owns: subscriptions, webhook_endpoints, notifications, delivery state.
- Why a boundary: external delivery (webhooks/email/push) has its own retry/backoff semantics; failures are acceptable in a way that control-plane failures are not.
- Consumes domain events; delivers via signed webhook payloads and SMTP.
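Signed webhook delivery with retry backoff could be sketched as follows. The document does not specify the signature scheme or the backoff policy, so HMAC-SHA256 with a hex-encoded digest and capped exponential backoff are both assumptions, as are the function names `Sign`, `Verify`, and `Backoff`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// Sign returns the hex-encoded HMAC-SHA256 of body under the endpoint secret.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify is what a receiver would run; hmac.Equal compares in constant time.
func Verify(secret, body []byte, signature string) bool {
	want, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), want)
}

// Backoff doubles the base delay once per failed attempt, capped at limit.
func Backoff(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	secret := []byte("endpoint-secret") // hypothetical per-endpoint secret
	body := []byte(`{"event":"ShareCreated"}`)
	sig := Sign(secret, body)
	fmt.Println("valid:", Verify(secret, body, sig))
	fmt.Println("tampered:", Verify(secret, []byte(`{"event":"x"}`), sig))
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println("retry", attempt, "after", Backoff(attempt, time.Second, 10*time.Second))
	}
}
```

Production retry loops usually add jitter to the delay so that many failing endpoints do not retry in lockstep; it is left out here to keep the arithmetic visible.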
Billing & Metering
Generic, swappable, event-driven usage accounting.
- Owns: usage_meters, quotas, plans.
- Why a boundary: quota checks are sync (gate uploads), usage accrual is async (consume events). Different consistency requirement per operation.
- Quota check: synchronous gRPC call from Storage before issuing presigned URL.
- Usage accrual: async consumer of BlobCommitted, NodeChanged events.
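The split between a synchronous quota check and asynchronous usage accrual can be sketched as two methods on one meter. `Meter`, `CheckQuota`, `RecordUsage`, and `PresignUpload` are hypothetical names, and byte counts are the only unit modelled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrQuotaExceeded is returned when an upload would exceed the tenant's quota.
var ErrQuotaExceeded = errors.New("quota exceeded")

// Meter holds per-tenant quota and accrued usage, both in bytes.
type Meter struct {
	mu    sync.Mutex
	quota map[string]int64
	used  map[string]int64
}

func NewMeter(quota map[string]int64) *Meter {
	return &Meter{quota: quota, used: map[string]int64{}}
}

// CheckQuota is the synchronous gate: it reads usage but does not change it.
func (m *Meter) CheckQuota(tenant string, size int64) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.used[tenant]+size > m.quota[tenant] {
		return ErrQuotaExceeded
	}
	return nil
}

// RecordUsage is the asynchronous half, called when a BlobCommitted event
// is consumed rather than at presign time.
func (m *Meter) RecordUsage(tenant string, size int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.used[tenant] += size
}

// PresignUpload shows the call order on the Storage side: quota first,
// URL issuance second.
func PresignUpload(m *Meter, tenant string, size int64, presign func() string) (string, error) {
	if err := m.CheckQuota(tenant, size); err != nil {
		return "", err
	}
	return presign(), nil
}

func main() {
	m := NewMeter(map[string]int64{"t1": 100})
	presign := func() string { return "https://objects.example.invalid/upload?sig=placeholder" }

	_, err := PresignUpload(m, "t1", 60, presign)
	fmt.Println("first upload error:", err)

	m.RecordUsage("t1", 60) // would be driven by a BlobCommitted event

	_, err = PresignUpload(m, "t1", 60, presign)
	fmt.Println("second upload error:", err)
}
```

In this sketch usage only rises once the commit event is consumed, so two uploads presigned before either commits can together overshoot the quota; whether the real design reserves quota at presign time is not stated in this document.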
Admin & Platform
Cross-cutting config, feature flags, and the append-only audit sink.
- Owns: feature_flags, config, audit_log (append-only).
- Why a boundary: cross-cutting config and the audit trail must not be entangled with domain modules. Audit sink consumes all events.
- Feature flags gate behaviour at runtime without deploys.
- Audit log is append-only; never updated in place.
Extraction Forcing Functions
Per ADR-0001, extraction is evidence-driven. A module graduates to a service only when one of these is demonstrated:
| Trigger | Likely first service |
|---|---|
| Async workload starves request latency (GC’s CPU, indexing bursts) | Storage worker, Search indexer |
| A component needs an independent scaling profile (many long-lived sync connections) | Sync |
| A component needs a different datastore lifecycle or can be optional | Search |
| Independent deploy cadence / blast-radius isolation is needed | The noisiest module at that time |
| A team takes ownership of a context | That team’s context |
“Microservices look better” is not a trigger. Disciplined, justified extraction — each with a forcing function and an ADR — is the architecture story.
Anti-patterns
The following are explicitly forbidden by this boundary design:
- ❌ Search reading nodes directly from Postgres. Must consume NodeChanged events; the search index is a projection, not a replica.
- ❌ Two modules writing the same table. One owner only; the owning module’s API is the only write path.
- ❌ Synchronous chains across 3+ modules in a single request path. Latency compounds and coupling defeats the seam design.
- ❌ Distributed transactions / 2PC across modules. Use the transactional outbox pattern and idempotent consumers (ADR-0006).
- ❌ Bytes flowing through Gateway, File & Metadata, or Storage compute. Presigned URLs only; the data plane bypasses compute by design.
- ❌ A module importing another module’s internal packages. All cross-boundary calls go through the gRPC API or the event bus.
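The idempotent consumers that the outbox pattern depends on can be sketched as a handler that deduplicates on event ID. The `Event` shape and `IdempotentConsumer` are hypothetical, and the in-memory seen-set is an assumption; a durable consumer would persist processed IDs together with the side effect.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical envelope; ID must be stable across redeliveries.
type Event struct {
	ID   string
	Type string
}

// IdempotentConsumer applies each event at most once per ID, which makes
// at-least-once delivery from the outbox safe.
type IdempotentConsumer struct {
	mu    sync.Mutex
	seen  map[string]struct{}
	apply func(Event)
}

func NewIdempotentConsumer(apply func(Event)) *IdempotentConsumer {
	return &IdempotentConsumer{seen: map[string]struct{}{}, apply: apply}
}

// Handle runs apply for a new event ID and reports whether it did so;
// a redelivered event is acknowledged without being applied again.
func (c *IdempotentConsumer) Handle(e Event) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, dup := c.seen[e.ID]; dup {
		return false
	}
	c.apply(e)
	c.seen[e.ID] = struct{}{}
	return true
}

func main() {
	applied := 0
	c := NewIdempotentConsumer(func(Event) { applied++ })

	fmt.Println(c.Handle(Event{ID: "e1", Type: "NodeChanged"}))
	fmt.Println(c.Handle(Event{ID: "e1", Type: "NodeChanged"})) // redelivery
	fmt.Println(c.Handle(Event{ID: "e2", Type: "BlobCommitted"}))
	fmt.Println("applied:", applied)
}
```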
Back: Data Model · System Overview