08 — Data Model (High-Level)
Supporting reference for 05 and 06. This is the conceptual model — enough to validate the boundaries and the correctness invariants, not a migration. No DDL here (no implementation code per the brief); column lists are indicative.
Architecture Freeze V1 (2026-06-12). This model is reconciled with the ADRs so the corpus is internally consistent before coding (review §3). The V1 storage model is whole-object (one content-addressed
BLOBper file content — no chunks/packs/placement; ADR-0017/0020 are Deferred). The change journal is the source of truth, written transactionally at commit (ADR-0008) — not an event projection. See §6.
The model encodes these decisions:
- Node vs Blob — namespace truth vs bytes (the R2/dual-write firewall).
- Content-addressed (BLAKE3), ref-counted, whole-object blobs — per-tenant dedup (ADR-0018) + safe grace/CAS GC (ADR-0019).
tenant_idon every owned row — the RLS isolation boundary, set per transaction viaSET LOCAL(R4 / ADR-0007, ADR-0038).- Per-tenant envelope encryption — a wrapped DEK per tenant (ADR-0014).
1. Entity-Relationship Diagram
erDiagram
TENANT ||--o{ USER : "has"
TENANT ||--|| TENANT_KEY : "sealed-by"
TENANT ||--o{ NODE : "scopes"
TENANT ||--o{ BLOB : "scopes"
TENANT ||--o{ CHANGE : "scopes"
TENANT ||--|| QUOTA : "has"
USER ||--o{ MEMBERSHIP : "has"
USER ||--o{ API_TOKEN : "owns"
USER ||--o{ DEVICE : "registers"
USER ||--o{ NODE : "owns"
NODE ||--o{ NODE : "parent-of"
NODE ||--o{ VERSION : "has"
NODE ||--o{ NODE_METADATA : "tagged-by"
NODE ||--o{ SHARE : "shared-via"
NODE ||--o{ CHANGE : "journaled-in"
VERSION }o--|| BLOB : "references (tenant_id, content_hash)"
BLOB ||--o{ MULTIPART_UPLOAD : "assembled-by"
SHARE }o--|| USER : "granted-to"
DEVICE ||--o{ CHANGE : "consumes-via-cursor"
TENANT {
uuid id PK
string name
string plan
string status
timestamptz created_at
}
TENANT_KEY {
uuid tenant_id PK "FK tenants"
bytea wrapped_dek "DEK sealed by KMS KEK (ADR-0014)"
int dek_version
string kek_ref "KMS key id"
timestamptz rotated_at
}
USER {
uuid id PK
uuid tenant_id FK
string email
string password_hash "argon2id (fallback/self-host)"
string status
}
MEMBERSHIP {
uuid id PK
uuid tenant_id FK
uuid user_id FK
string role
}
API_TOKEN {
uuid id PK
uuid tenant_id FK
uuid user_id FK
string token_hash "stored hashed, never plaintext"
timestamptz expires_at
timestamptz revoked_at
}
NODE {
uuid id PK "stable identity (renames/moves keep id)"
uuid tenant_id FK
uuid parent_id FK "structural truth"
string type "file|folder"
string name
string path "denormalized; rewritten on subtree move"
uuid owner_id FK
uuid current_version_id FK
timestamptz deleted_at "soft-delete"
}
VERSION {
uuid id PK
uuid tenant_id FK
uuid node_id FK
string content_hash "FK (tenant_id, content_hash) -> BLOB"
bigint size
string mime
uuid created_by FK
timestamptz created_at
}
BLOB {
uuid tenant_id PK "composite PK part 1"
string content_hash PK "composite PK part 2 (BLAKE3)"
string provider "minio|s3 (single provider in V1)"
string bucket
string object_key "{tenant_id}/{content_hash}"
bigint size
int refcount "HINT only — not the GC authority (ADR-0019)"
string state "staging|committed|orphaned|deleting"
timestamptz orphaned_at "grace-period clock"
string enc_algo "AES-256-GCM"
int dek_version "which TENANT_KEY version sealed these bytes"
}
MULTIPART_UPLOAD {
uuid id PK
uuid tenant_id FK
string upload_id "provider id"
string staging_key
bigint declared_size
timestamptz expires_at
}
NODE_METADATA {
uuid id PK
uuid tenant_id FK
uuid node_id FK
string key
string value
}
SHARE {
uuid id PK
uuid tenant_id FK
uuid node_id FK
string kind "grant|link"
uuid grantee_id FK
string permission "view|edit|manage"
string link_token_hash "hash of bearer token; never plaintext (ADR-0037)"
string password_hash
timestamptz expires_at
int max_downloads
}
DEVICE {
uuid id PK
uuid tenant_id FK
uuid user_id FK
string name
bigint sync_cursor
}
CHANGE {
bigint seq PK "monotonic per tenant; assigned IN the commit tx"
uuid tenant_id FK
uuid node_id FK
string op "create|update|move|delete"
uuid version_id FK
uuid actor_id FK
timestamptz at
}
QUOTA {
uuid tenant_id PK
bigint bytes_limit
bigint bytes_used "soft counter; see I-quota note"
}
OUTBOX {
uuid id PK
uuid tenant_id FK
string aggregate
string event_type
jsonb payload
timestamptz published_at "null until published"
}
CHANGE,VERSION, andOUTBOXare written in one transaction at commit (the commit protocol, 06 §4).CHANGEis the authoritative, gap-free, per-tenant ordered journal read by Sync;OUTBOXis the at-least-once publish queue drained to the event bus for derived consumers (search, notifications, meters, audit, GC). They are deliberately distinct: the journal is long-retained and queried by cursor; the outbox is publish-and-clear.
2. Key invariants (these are correctness, test them)
I1 — No committed metadata without verified bytes (defeats R2)
A VERSION may reference a BLOB only after the blob’s bytes are HEAD-verified
(size/etag/BLAKE3 hash). The VERSION insert, the BLOB reference (refcount++ /
state→committed), the CHANGE(seq) append, and the OUTBOX NodeChanged row are
one transaction. If it fails, no version exists; the staging blob is later GC’d.
I2 — Blobs are content-addressed and collected by grace + re-confirmation (defeats data loss)
(tenant_id, content_hash) is the blob identity. Uploading identical bytes within a
tenant increments refcount (a hint). Deletion is not authorized by refcount = 0
alone — it requires the blob to be orphaned, the grace period elapsed, and
zero references re-confirmed atomically in the transaction that flips it to
deleting (ADR-0019). A dedup hit during the grace
window flips the blob back to committed. This closes the dedup-vs-delete race; refcount
may drift without ever deleting live data (a periodic per-tenant backstop reconciles
drift). (Pre-freeze this invariant read “delete when refcount = 0” — the unsafe rule
ADR-0019 rejects; corrected here.)
I3 — Tenant isolation is a database invariant (defeats R4)
Every owned row carries tenant_id. Postgres Row-Level Security filters by a
transaction-local tenant context set via SET LOCAL app.tenant_id (pooling-safe,
ADR-0038); a query with no context set returns
no rows (default-deny). A forgotten WHERE tenant_id = ? in app code cannot leak.
App-layer scoping is defense-in-depth, not the boundary (ADR-0007).
I4 — The change journal is the source of truth, ordered at commit (enables correct sync)
CHANGE.seq is a per-tenant monotonic counter assigned inside the commit transaction
(with the VERSION and OUTBOX rows). It is therefore gap-free and totally ordered
per tenant by construction — not derived from event delivery. Devices store a
sync_cursor; delta pull is WHERE tenant_id = ? AND seq > cursor ORDER BY seq. This is
the spine of correct sync (FR-C / ADR-0008,
ADR-0024). A device that commits sees its own change
on the next pull (read-your-writes; no projection lag). (Pre-freeze this read “projection
of NodeChanged events” — irreconcilable with an at-least-once bus; corrected here.)
I5 — Namespace ops are byte-free (but not free)
Move/rename/copy mutate NODE rows, never BLOB/objects. Copy creates a new
NODE/VERSION referencing the same BLOB (refcount++). Node identity is stable
(NODE.id), so the sync planner treats a move as O(1) (ADR-0022).
On the server, the denormalized path of every descendant is rewritten on a subtree move,
so a server move is O(descendants) in path updates — cheap in bytes, not O(1) in
rows. (Pre-freeze claimed “cheap by construction / O(1)”; stated honestly here.)
I6 — Derived data is reconstructible; the journal and namespace are not derived
The search index, notification state, and usage meters are projections of the
change journal / OUTBOX events and are rebuildable by replay — they are never the
source of truth (R7). The CHANGE journal itself and the NODE/VERSION namespace
are source of truth (written transactionally at commit), not projections.
I-quota (note, not a hard invariant):
QUOTA.bytes_usedis a soft counter accrued asynchronously by the Meter consumer, whileBilling.CheckQuotais a synchronous upload gate. The gate may therefore read a slightly stale value — V1 accepts bounded over/under-counting during accrual lag; reserve-on-commit tightening is a documented future option.
3. Indexing & hot paths (indicative)
NODE(tenant_id, parent_id)— folder listing (the most common read).NODE(tenant_id, path)— path lookup / subtree move.CHANGE(tenant_id, seq)— delta pull (covering).BLOB(tenant_id, content_hash)PK — dedup lookup (per-tenant).BLOB(tenant_id, state, orphaned_at)— GC candidate sweep.SHARE(link_token_hash)— public link resolution by token hash.OUTBOX(published_at) WHERE published_at IS NULL— partial index for the drainer (drained withFOR UPDATE SKIP LOCKED, partitioned by aggregate for ordering).
4. What is deliberately not in Postgres
- Bytes → object storage (BLOB points to provider/bucket/key), encrypted under the tenant DEK.
- Full-text content → Postgres-FTS in V1; OpenSearch is the Deferred P3 escalation (derived; ADR-0009).
- Hot ephemeral state (sessions, rate-limit counters, locks, authz cache) → Redis (not authoritative; reconstructible).
- Event transport in flight → in-process bus in V1; NATS JetStream at P3. The durable
Postgres source is the
OUTBOX+CHANGEjournal.
5. Multi-tenancy storage strategy (per ADR-0007 / ADR-0038)
- v1: shared database, shared schema,
tenant_id+ RLS, tenant context set per transaction viaSET LOCAL(pooling-safe). - Escalation path (documented, not built): noisy/enterprise tenants → schema-
per-tenant → database-per-tenant. The
tenant_id-everywhere model makes this a routing change, not a rewrite. - Object keys are tenant-prefixed (
{tenant_id}/{content_hash}) so storage-side isolation and per-tenant lifecycle/quotas are possible regardless of DB strategy.
6. V1 freeze reconciliations
| Pre-freeze contradiction (review §3) | V1 resolution |
|---|---|
| Two storage models (whole-blob in 08 vs chunk/pack in ADRs 0017/0019/0020) | Whole-object blob; chunking/packing/placement Deferred |
| I2 said “delete at refcount = 0” (unsafe) vs ADR-0019 grace+CAS | I2 rewritten to grace + atomic re-confirm; refcount is a hint |
| Journal both “source of truth” and “event projection” | Source of truth, seq assigned in the commit tx (I4) |
content_hash "sha-256" vs ADR-0016 BLAKE3 |
BLAKE3 everywhere |
BLOB.content_hash sole PK vs per-tenant dedup (ADR-0018) |
Composite PK (tenant_id, content_hash) |
| Encryption (ADR-0014) absent from schema | TENANT_KEY + BLOB.enc_algo/dek_version added |
SHARE.link_token plaintext vs hashed API tokens |
link_token_hash (store hash only) |
| “O(1) / cheap” moves vs materialized path | Byte-free, but O(descendants) path rewrite stated honestly |