Tenant Isolation

Tenant isolation is the #1 security property of a multi-tenant platform. This page implements ADR-0007. Authentication and authorization establish who a user is and what they may do, but they do NOT provide isolation: a user can be fully authenticated and authorized and still read another tenant’s data unless isolation is deliberately engineered at every layer.

:::danger CI gate
The cross-tenant isolation test must run in CI on every commit. This test forges a mismatched tenant_id in the application layer and asserts that Postgres RLS still blocks the read/write, even when the application layer makes an error. A failed test is a deployment blocker.
:::

1. Multi-Tenancy Model

| Model | Isolation mechanism | Best suited for |
| --- | --- | --- |
| Pool (default, v1) | Shared database + shared schema + Postgres RLS | High-scale, cost-efficient deployments |
| Bridge | Schema-per-tenant | Noisy or larger tenants |
| Silo | Dedicated database or cluster per tenant | Regulated / enterprise dedicated |

The tenant_id-everywhere design makes escalating a tenant from pool → bridge → silo a routing change, not a rewrite. High-security tenants additionally get their own KMS key so their data is cryptographically isolated even within the pool.
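
To make the "routing change, not a rewrite" claim concrete, here is a minimal sketch. It assumes a hypothetical in-memory placement table; `Placement`, `PLACEMENTS`, `resolve_placement`, the tenant names, and the DSNs are all illustrative and not taken from the platform.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    POOL = "pool"      # shared database + shared schema + RLS
    BRIDGE = "bridge"  # schema per tenant
    SILO = "silo"      # dedicated database or cluster


@dataclass(frozen=True)
class Placement:
    model: Model
    dsn: str     # which database to connect to (hypothetical values below)
    schema: str  # which schema to use inside that database


# Hypothetical default; a real deployment would load this from configuration.
POOL_DEFAULT = Placement(Model.POOL, "postgresql://pool-db/app", "public")

# Only tenants that have left the pool need an explicit entry.
PLACEMENTS = {
    "tenant-b": Placement(Model.BRIDGE, "postgresql://pool-db/app", "tenant_b"),
    "tenant-c": Placement(Model.SILO, "postgresql://tenant-c-db/app", "public"),
}


def resolve_placement(tenant_id: str) -> Placement:
    """Return where a tenant's data lives; tenants without an entry stay in the pool."""
    return PLACEMENTS.get(tenant_id, POOL_DEFAULT)
```

Under these assumptions, escalating a tenant amounts to adding or changing one entry in the placement table (after its data has been copied); query code that already carries tenant_id does not change.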

2. Defense-in-Depth Layers

flowchart TB
    classDef l fill:#fde68a,stroke:#b45309,color:#111827;
    classDef d fill:#bbf7d0,stroke:#15803d,color:#111827;
    r["request"]:::l --> L1["① Token: tenant_id from VERIFIED token, resolved BEFORE business logic, re-checked every decision"]:::l
    L1 --> L2["② App: request-scoped context; tenant_id in every query / cache key / storage path"]:::l
    L2 --> L3["③ DB: Postgres RLS — the real boundary (session var per transaction)"]:::d
    L3 --> L4["④ Cache: tenant-prefixed Redis keys"]:::d
    L4 --> L5["⑤ Storage: tenant-prefixed object keys + per-tenant KMS keys (crypto isolation)"]:::d
    L5 --> L6["⑥ Derived: tenant-scoped search index; per-tenant dedup scope (ADR-0018)"]:::d

| Layer | Isolation mechanism | Attack it stops |
| --- | --- | --- |
| ① Token | `tenant_id` derived from the verified token before any business logic; re-validated on every authorization decision (never from client input) | Forged tenant claims; auth-amplification cross-tenant access |
| ② Application | Request-scoped tenant context (never a global or singleton); `tenant_id` in all queries, cache keys, and storage paths | Connection-pool identity-swap under `await`; cache-key collision |
| ③ Database RLS | Postgres Row-Level Security policies filter every row by the session tenant variable; PgBouncer session pooling with mandatory `server_reset_query` | Forgotten `WHERE tenant_id =` in app code; pooled-connection context bleed |
| ④ Cache | Tenant-prefixed Redis keys (`{tenant}:prefs:{user}`) by construction, not by discipline | `prefs:{user}` collisions serving tenant A’s data to tenant B |
| ⑤ Object storage | Tenant-prefixed object keys (`/{tenant_id}/{hash}`) + per-tenant envelope encryption keys | Direct object or backup read; rogue DBA (data is ciphertext per tenant) |
| ⑥ Derived stores | Tenant-scoped OpenSearch index + filter; per-tenant deduplication scope | Cross-tenant search hits; dedup existence side-channel |
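
The token layer (①) can be illustrated with a small sketch. It assumes a toy token format (base64url JSON claims plus a hex HMAC-SHA256 signature) chosen only so the example runs on the standard library; the platform's real token format is not described here, and `sign_token`, `tenant_from_token`, and `resolve_tenant` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json


def sign_token(claims: dict, secret: bytes) -> str:
    """Issue a toy token: base64url(JSON claims) + "." + hex HMAC-SHA256."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def tenant_from_token(token: str, secret: bytes) -> str:
    """Verify the signature first, and only then read tenant_id from the claims."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise PermissionError("invalid token")
    return json.loads(base64.urlsafe_b64decode(payload))["tenant_id"]


def resolve_tenant(token: str, secret: bytes, client_params: dict) -> str:
    """client_params is accepted only to show that it is deliberately ignored."""
    return tenant_from_token(token, secret)
```

The point of the sketch is the shape of `resolve_tenant`: a `tenant_id` supplied in the query string, body, or headers never reaches the decision, so a client cannot select a tenant simply by asking for it.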

3. Postgres Row-Level Security

RLS is the keystone of tenant isolation. Policies are defined at the Postgres level and enforce the tenant boundary in the database itself:

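-- Example policy on the nodes table
-- A policy has no effect until row-level security is enabled on the table.
ALTER TABLE nodes ENABLE ROW LEVEL SECURITY;
ALTER TABLE nodes FORCE ROW LEVEL SECURITY;  -- apply policies to the table owner too
CREATE POLICY nodes_tenant_isolation ON nodes
    USING (tenant_id = current_setting('app.current_tenant')::uuid);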

A forgotten WHERE tenant_id = ? clause in application code cannot produce a cross-tenant leak — the database refuses to return the row. Application-layer scoping is defense-in-depth, not the boundary.
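
For the policy to see a tenant, the application has to set `app.current_tenant` before it queries. The following is a sketch of one way to do that per transaction, assuming a driver that uses `%s` placeholders (psycopg-style); `tenant_scoped_statements` is a hypothetical helper that only builds the statements, so it runs without a database.

```python
import uuid


def tenant_scoped_statements(tenant_id: str, query: str, params: tuple = ()):
    """Return the (sql, params) pairs to run inside ONE transaction.

    set_config(name, value, true) behaves like SET LOCAL: the value is
    discarded at COMMIT or ROLLBACK, so it cannot outlive the transaction.
    """
    # Reject anything that is not a UUID before it reaches the database;
    # the policy casts the setting with ::uuid.
    tenant = str(uuid.UUID(tenant_id))
    return [
        ("BEGIN", ()),
        ("SELECT set_config('app.current_tenant', %s, true)", (tenant,)),
        (query, params),
        ("COMMIT", ()),
    ]
```

Note that the query passed in carries no tenant filter of its own; in this sketch the policy, not the query text, decides which rows are visible.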

CI enforces that every tenant-scoped table has an RLS policy. A new table without a policy is flagged as a CI failure.
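
A check of this kind could look roughly like the sketch below. It assumes that tenant-scoped tables are exactly those with a `tenant_id` column and that they live in the `public` schema; the two catalog queries and the function names are illustrative, not the platform's actual CI job.

```python
# Catalog queries a CI job could run against a migrated test database.
# Assumption: "tenant-scoped" means "has a tenant_id column" in schema public.
TENANT_TABLES_SQL = """
    SELECT table_name FROM information_schema.columns
    WHERE table_schema = 'public' AND column_name = 'tenant_id'
"""
POLICY_TABLES_SQL = """
    SELECT DISTINCT tablename FROM pg_policies WHERE schemaname = 'public'
"""


def tables_missing_policy(tenant_tables, policy_tables):
    """Return tenant-scoped tables with no RLS policy, sorted for stable output."""
    return sorted(set(tenant_tables) - set(policy_tables))


def check(tenant_tables, policy_tables) -> int:
    """Return a process exit code: 0 when every table is covered, 1 otherwise."""
    missing = tables_missing_policy(tenant_tables, policy_tables)
    for table in missing:
        print(f"missing RLS policy: {table}")
    return 1 if missing else 0
```

A stricter variant would also confirm that row-level security is actually enabled on each table (`pg_class.relrowsecurity`), since a policy on a table without RLS enabled filters nothing.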

4. Cache Key Isolation

All Redis keys are tenant-prefixed by a shared cache helper — not by per-call discipline. The pattern is {tenant_id}:{category}:{key}. A bare user_id is never used as a cache key because user IDs are not globally unique across tenants. Constructing a cache key without the helper is a lint error.
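
A helper of this shape might look like the following sketch. The function name `cache_key` and the rule that `:` is reserved as the separator are assumptions made for the example.

```python
def cache_key(tenant_id: str, category: str, key: str) -> str:
    """Build a {tenant_id}:{category}:{key} Redis key."""
    if not tenant_id or not category or not key:
        raise ValueError("tenant_id, category, and key are all required")
    # Assumption: ":" is reserved as the separator, so tenant_id and category
    # may not contain it. Otherwise ("a:b", "c") and ("a", "b:c") would collide.
    if ":" in tenant_id or ":" in category:
        raise ValueError("tenant_id and category must not contain ':'")
    return f"{tenant_id}:{category}:{key}"
```

Because the tenant is a required positional argument, a caller cannot build a key without naming one; the same user ID under two tenants produces two distinct keys.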

5. Connection Pool Isolation

PgBouncer is configured in session pooling mode with a mandatory server_reset_query that clears the Postgres session variable on connection return. This prevents a prior tenant’s context from bleeding into the next request that picks up the same connection.

Within the application, tenant context lives in a request-scoped context object. It is never written to a package-level variable or a singleton, and never mutated after being set at the gateway boundary.
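
The request-scoped context can be sketched with Python's `contextvars`, shown here as one possible realization (the platform's actual language and context mechanism are not stated). `set_tenant`, `current_tenant`, and `handle_request` are hypothetical names.

```python
import asyncio
import contextvars

# Each asyncio task runs in its own copy of the context, so a value set in
# one request's task is invisible to every other task.
_current_tenant = contextvars.ContextVar("current_tenant")


def set_tenant(tenant_id: str) -> None:
    """Set once at the gateway boundary; refuses to overwrite an existing value."""
    if _current_tenant.get(None) is not None:
        raise RuntimeError("tenant context is already set for this request")
    _current_tenant.set(tenant_id)


def current_tenant() -> str:
    """Raises LookupError if no tenant was set, rather than guessing a default."""
    return _current_tenant.get()


async def handle_request(tenant_id: str) -> str:
    set_tenant(tenant_id)
    await asyncio.sleep(0)  # yield so the other request runs in between
    return current_tenant()


async def main():
    # asyncio.gather wraps each coroutine in its own task (and context copy).
    return await asyncio.gather(
        handle_request("tenant-a"), handle_request("tenant-b")
    )
```

Running `asyncio.run(main())` returns each request's own tenant even though the two requests interleave at the `await`. With a module-level global in place of the context variable, the first request would resume and read the second request's tenant.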

6. Object Storage Isolation

Object keys follow the pattern /{tenant_id}/{content_hash}. This means:

- Tenant data is addressable independently at the storage layer.
- An ACL or bucket policy can restrict a storage path to a single tenant.
- Escalating an enterprise tenant to a dedicated bucket is a routing change, not a data migration.
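
The key pattern above can be sketched as a pair of helpers. The choice of SHA-256 for the content hash, the requirement that tenant IDs are UUIDs, and the names `object_key` and `belongs_to` are assumptions made for the example.

```python
import hashlib
import uuid


def object_key(tenant_id: str, content: bytes) -> str:
    """Build /{tenant_id}/{content_hash} for a blob."""
    tenant = str(uuid.UUID(tenant_id))  # rejects "../" and other path tricks
    content_hash = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
    return f"/{tenant}/{content_hash}"


def belongs_to(key: str, tenant_id: str) -> bool:
    """Prefix check a storage gateway could apply before serving a key."""
    return key.startswith(f"/{uuid.UUID(tenant_id)}/")
```

In this sketch, two tenants uploading identical bytes get the same content hash but different keys, so neither can detect the other's upload by probing for an existing object.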

7. Encryption Isolation

Per-tenant Data Encryption Keys (DEKs) wrapped by per-tenant Key Encryption Keys (KEKs) mean that different tenants’ blobs are encrypted under different keys. Compromise of one tenant’s DEK does not expose any other tenant’s data. See Encryption for the full key hierarchy.

Crypto-shredding on offboarding: destroying a tenant’s KEK makes all their ciphertext — live, replicated, and in backups — permanently unrecoverable. This is how GDPR right-to-erasure is implemented without hunting every replica.

8. High-Severity Pitfalls

- Connection-pool contamination: a tenant_id stored in a global variable can be overwritten by a concurrent request while the first request is suspended at an await, so the first request resumes under the wrong tenant. Fix: request-scoped context only; mandatory server_reset_query.
- Cache-key collision: un-prefixed keys serve tenant A’s data to tenant B. Fix: tenant-prefix by construction, enforced by a helper, not by convention.
- Sensitive values in URL query strings: they leak into access logs and Referer headers (CWE-532, CWE-598). Fix: put resource identifiers and tokens in the request body or headers, never the query string.
- Error/log leakage: a stack trace or error message containing “object 123 belongs to tenant X” leaks cross-tenant information. Fix: generic errors to clients; tenant context in internal structured logs only.
- SQL injection overriding tenant_id: injected input rewrites or removes the tenant filter in a query. Fix: parameterized queries at every call site, with RLS as the backstop, since a tampered WHERE clause still only sees rows the policy allows for the session tenant.
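
The SQL injection pitfall can be demonstrated end to end with a small sketch. It uses SQLite from the Python standard library purely as a stand-in so the example runs anywhere; SQLite has no RLS, so only the parameterization half of the fix is shown. The table contents and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, tenant_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(1, "tenant-a", "alpha"), (2, "tenant-b", "bravo")],
)


def find_unsafe(tenant_id: str, name: str):
    # BAD: string interpolation lets the input rewrite the WHERE clause.
    sql = f"SELECT id FROM nodes WHERE tenant_id = '{tenant_id}' AND name = '{name}'"
    return conn.execute(sql).fetchall()


def find_safe(tenant_id: str, name: str):
    # GOOD: values are bound as data and are never parsed as SQL.
    sql = "SELECT id FROM nodes WHERE tenant_id = ? AND name = ?"
    return conn.execute(sql, (tenant_id, name)).fetchall()


# Classic payload: closes the string literal and appends an always-true OR.
payload = "x' OR '1'='1"
```

With this payload, `find_unsafe("tenant-a", payload)` returns rows from both tenants because the appended `OR` defeats the tenant filter, while `find_safe("tenant-a", payload)` returns nothing because the payload is compared as a literal name.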

9. Availability Isolation

Confidentiality is not the only isolation property. Per-tenant rate limits and quotas prevent one tenant from exhausting shared CPU, storage, egress, or IOPS and impacting others. See Rate Limiting.
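
As an illustration of a per-tenant limit, here is a minimal in-memory token-bucket sketch. The algorithm choice, the class name `TenantRateLimiter`, and its parameters are assumptions; the platform's actual mechanism is described in Rate Limiting, and a multi-instance deployment would need shared state rather than a local dictionary.

```python
import time


class TenantRateLimiter:
    """One token bucket per tenant: `rate` tokens per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so tests can control time
        self._buckets = {}  # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (float(self.burst), now))
        # Refill for the time elapsed since this tenant's last request.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Because each tenant has its own bucket, a tenant that exhausts its burst is throttled without consuming any other tenant's allowance.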