08 — Data Model (High-Level)

Supporting reference for 05 and 06. This is the conceptual model — enough to validate the boundaries and the correctness invariants, not a migration. No DDL here (no implementation code per the brief); column lists are indicative.

Architecture Freeze V1 (2026-06-12). This model is reconciled with the ADRs so the corpus is internally consistent before coding (review §3). The V1 storage model is whole-object (one content-addressed BLOB per file content — no chunks/packs/placement; ADR-0017/0020 are Deferred). The change journal is the source of truth, written transactionally at commit (ADR-0008) — not an event projection. See §6.

The model encodes these decisions:

  1. Node vs Blob — namespace truth vs bytes (the R2/dual-write firewall).
  2. Content-addressed (BLAKE3), ref-counted, whole-object blobs — per-tenant dedup (ADR-0018) + safe grace/CAS GC (ADR-0019).
  3. tenant_id on every owned row — the RLS isolation boundary, set per transaction via SET LOCAL (R4 / ADR-0007, ADR-0038).
  4. Per-tenant envelope encryption — a wrapped DEK per tenant (ADR-0014).

1. Entity-Relationship Diagram

erDiagram
    TENANT ||--o{ USER : "has"
    TENANT ||--|| TENANT_KEY : "sealed-by"
    TENANT ||--o{ NODE : "scopes"
    TENANT ||--o{ BLOB : "scopes"
    TENANT ||--o{ CHANGE : "scopes"
    TENANT ||--|| QUOTA : "has"

    USER ||--o{ MEMBERSHIP : "has"
    USER ||--o{ API_TOKEN : "owns"
    USER ||--o{ DEVICE : "registers"
    USER ||--o{ NODE : "owns"

    NODE ||--o{ NODE : "parent-of"
    NODE ||--o{ VERSION : "has"
    NODE ||--o{ NODE_METADATA : "tagged-by"
    NODE ||--o{ SHARE : "shared-via"
    NODE ||--o{ CHANGE : "journaled-in"

    VERSION }o--|| BLOB : "references (tenant_id, content_hash)"
    BLOB ||--o{ MULTIPART_UPLOAD : "assembled-by"

    SHARE }o--|| USER : "granted-to"
    DEVICE ||--o{ CHANGE : "consumes-via-cursor"

    TENANT {
        uuid id PK
        string name
        string plan
        string status
        timestamptz created_at
    }
    TENANT_KEY {
        uuid tenant_id PK "FK tenants"
        bytea wrapped_dek "DEK sealed by KMS KEK (ADR-0014)"
        int dek_version
        string kek_ref "KMS key id"
        timestamptz rotated_at
    }
    USER {
        uuid id PK
        uuid tenant_id FK
        string email
        string password_hash "argon2id (fallback/self-host)"
        string status
    }
    MEMBERSHIP {
        uuid id PK
        uuid tenant_id FK
        uuid user_id FK
        string role
    }
    API_TOKEN {
        uuid id PK
        uuid tenant_id FK
        uuid user_id FK
        string token_hash "stored hashed, never plaintext"
        timestamptz expires_at
        timestamptz revoked_at
    }
    NODE {
        uuid id PK "stable identity (renames/moves keep id)"
        uuid tenant_id FK
        uuid parent_id FK "structural truth"
        string type "file|folder"
        string name
        string path "denormalized; rewritten on subtree move"
        uuid owner_id FK
        uuid current_version_id FK
        timestamptz deleted_at "soft-delete"
    }
    VERSION {
        uuid id PK
        uuid tenant_id FK
        uuid node_id FK
        string content_hash "FK (tenant_id, content_hash) -> BLOB"
        bigint size
        string mime
        uuid created_by FK
        timestamptz created_at
    }
    BLOB {
        uuid tenant_id PK "composite PK part 1"
        string content_hash PK "composite PK part 2 (BLAKE3)"
        string provider "minio|s3 (single provider in V1)"
        string bucket
        string object_key "{tenant_id}/{content_hash}"
        bigint size
        int refcount "HINT only — not the GC authority (ADR-0019)"
        string state "staging|committed|orphaned|deleting"
        timestamptz orphaned_at "grace-period clock"
        string enc_algo "AES-256-GCM"
        int dek_version "which TENANT_KEY version sealed these bytes"
    }
    MULTIPART_UPLOAD {
        uuid id PK
        uuid tenant_id FK
        string upload_id "provider id"
        string staging_key
        bigint declared_size
        timestamptz expires_at
    }
    NODE_METADATA {
        uuid id PK
        uuid tenant_id FK
        uuid node_id FK
        string key
        string value
    }
    SHARE {
        uuid id PK
        uuid tenant_id FK
        uuid node_id FK
        string kind "grant|link"
        uuid grantee_id FK
        string permission "view|edit|manage"
        string link_token_hash "hash of bearer token; never plaintext (ADR-0037)"
        string password_hash
        timestamptz expires_at
        int max_downloads
    }
    DEVICE {
        uuid id PK
        uuid tenant_id FK
        uuid user_id FK
        string name
        bigint sync_cursor
    }
    CHANGE {
        bigint seq PK "monotonic per tenant; assigned IN the commit tx"
        uuid tenant_id FK
        uuid node_id FK
        string op "create|update|move|delete"
        uuid version_id FK
        uuid actor_id FK
        timestamptz at
    }
    QUOTA {
        uuid tenant_id PK
        bigint bytes_limit
        bigint bytes_used "soft counter; see I-quota note"
    }
    OUTBOX {
        uuid id PK
        uuid tenant_id FK
        string aggregate
        string event_type
        jsonb payload
        timestamptz published_at "null until published"
    }

CHANGE, VERSION, and OUTBOX are written in one transaction at commit (the commit protocol, 06 §4). CHANGE is the authoritative, gap-free, per-tenant ordered journal read by Sync; OUTBOX is the at-least-once publish queue drained to the event bus for derived consumers (search, notifications, meters, audit, GC). They are deliberately distinct: the journal is long-retained and queried by cursor; the outbox is publish-and-clear.


2. Key invariants (these are correctness, test them)

I1 — No committed metadata without verified bytes (defeats R2)

A VERSION may reference a BLOB only after the blob’s bytes are HEAD-verified (size/etag/BLAKE3 hash). The VERSION insert, the BLOB reference (refcount++ / state→committed), the CHANGE(seq) append, and the OUTBOX NodeChanged row are one transaction. If it fails, no version exists; the staging blob is later GC’d.

I2 — Blobs are content-addressed and collected by grace + re-confirmation (defeats data loss)

(tenant_id, content_hash) is the blob identity. Uploading identical bytes within a tenant increments refcount (a hint). Deletion is not authorized by refcount = 0 alone — it requires the blob to be orphaned, the grace period elapsed, and zero references re-confirmed atomically in the transaction that flips it to deleting (ADR-0019). A dedup hit during the grace window flips the blob back to committed. This closes the dedup-vs-delete race; refcount may drift without ever deleting live data (a periodic per-tenant backstop reconciles drift). (Pre-freeze this invariant read “delete when refcount = 0” — the unsafe rule ADR-0019 rejects; corrected here.)

I3 — Tenant isolation is a database invariant (defeats R4)

Every owned row carries tenant_id. Postgres Row-Level Security filters by a transaction-local tenant context set via SET LOCAL app.tenant_id (pooling-safe, ADR-0038); a query with no context set returns no rows (default-deny). A forgotten WHERE tenant_id = ? in app code cannot leak. App-layer scoping is defense-in-depth, not the boundary (ADR-0007).

I4 — The change journal is the source of truth, ordered at commit (enables correct sync)

CHANGE.seq is a per-tenant monotonic counter assigned inside the commit transaction (with the VERSION and OUTBOX rows). It is therefore gap-free and totally ordered per tenant by construction — not derived from event delivery. Devices store a sync_cursor; delta pull is WHERE tenant_id = ? AND seq > cursor ORDER BY seq. This is the spine of correct sync (FR-C / ADR-0008, ADR-0024). A device that commits sees its own change on the next pull (read-your-writes; no projection lag). (Pre-freeze this read “projection of NodeChanged events” — irreconcilable with an at-least-once bus; corrected here.)

I5 — Namespace ops are byte-free (but not free)

Move/rename/copy mutate NODE rows, never BLOB/objects. Copy creates a new NODE/VERSION referencing the same BLOB (refcount++). Node identity is stable (NODE.id), so the sync planner treats a move as O(1) (ADR-0022). On the server, the denormalized path of every descendant is rewritten on a subtree move, so a server move is O(descendants) in path updates — cheap in bytes, not O(1) in rows. (Pre-freeze claimed “cheap by construction / O(1)”; stated honestly here.)

I6 — Derived data is reconstructible; the journal and namespace are not derived

The search index, notification state, and usage meters are projections of the change journal / OUTBOX events and are rebuildable by replay — they are never the source of truth (R7). The CHANGE journal itself and the NODE/VERSION namespace are source of truth (written transactionally at commit), not projections.

I-quota (note, not a hard invariant): QUOTA.bytes_used is a soft counter accrued asynchronously by the Meter consumer, while Billing.CheckQuota is a synchronous upload gate. The gate may therefore read a slightly stale value — V1 accepts bounded over/under-counting during accrual lag; reserve-on-commit tightening is a documented future option.


3. Indexing & hot paths (indicative)


4. What is deliberately not in Postgres


5. Multi-tenancy storage strategy (per ADR-0007 / ADR-0038)


6. V1 freeze reconciliations

Pre-freeze contradiction (review §3) V1 resolution
Two storage models (whole-blob in 08 vs chunk/pack in ADRs 0017/0019/0020) Whole-object blob; chunking/packing/placement Deferred
I2 said “delete at refcount = 0” (unsafe) vs ADR-0019 grace+CAS I2 rewritten to grace + atomic re-confirm; refcount is a hint
Journal both “source of truth” and “event projection” Source of truth, seq assigned in the commit tx (I4)
content_hash "sha-256" vs ADR-0016 BLAKE3 BLAKE3 everywhere
BLOB.content_hash sole PK vs per-tenant dedup (ADR-0018) Composite PK (tenant_id, content_hash)
Encryption (ADR-0014) absent from schema TENANT_KEY + BLOB.enc_algo/dek_version added
SHARE.link_token plaintext vs hashed API tokens link_token_hash (store hash only)
“O(1) / cheap” moves vs materialized path Byte-free, but O(descendants) path rewrite stated honestly