Tenant Isolation
The #1 security property of a multi-tenant platform. Implements ADR-0007. Authentication and authorization establish who a user is and what they may do, but they do NOT provide isolation: a user can be fully authenticated and authorized and still read another tenant’s data unless isolation is deliberately engineered at every layer.
:::danger CI gate
The cross-tenant isolation test must run in CI on every commit. This test forges a mismatched tenant_id in the application layer and asserts that Postgres RLS still blocks the read/write — even when the application layer makes an error. A failed test is a deployment blocker.
:::
1. Multi-Tenancy Model
| Model | How | For |
|---|---|---|
| Pool (default, v1) | Shared database + shared schema + Postgres RLS | High-scale, cost-efficient deployments |
| Bridge | Schema-per-tenant | Noisy or larger tenants |
| Silo | Dedicated database or cluster per tenant | Regulated / enterprise dedicated |
The tenant_id-everywhere design makes escalating a tenant from pool → bridge → silo a routing change, not a rewrite. High-security tenants additionally get their own KMS key so their data is cryptographically isolated even within the pool.
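The "routing change, not a rewrite" idea can be sketched as a small lookup from tenant_id to a placement. This is a minimal illustration, not the platform's actual router: `TenantRouter`, `Placement`, the DSN strings, and the schema names are all hypothetical, and the sketch assumes the tenant's rows have already been copied to the target before `escalate` is called.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    POOL = "pool"      # shared database + shared schema + RLS
    BRIDGE = "bridge"  # schema-per-tenant
    SILO = "silo"      # dedicated database or cluster


@dataclass(frozen=True)
class Placement:
    model: Model
    dsn: str     # which database or cluster to connect to (hypothetical value)
    schema: str  # which schema to use inside that database


# Hypothetical default: every tenant starts in the shared pool.
POOL_PLACEMENT = Placement(Model.POOL, "postgres://pool-db/app", "public")


class TenantRouter:
    """Maps tenant_id to a placement; tenants without an override use the pool."""

    def __init__(self) -> None:
        self._overrides: dict[str, Placement] = {}

    def resolve(self, tenant_id: str) -> Placement:
        return self._overrides.get(tenant_id, POOL_PLACEMENT)

    def escalate(self, tenant_id: str, placement: Placement) -> None:
        # Only the routing entry changes; queries keep filtering on tenant_id.
        self._overrides[tenant_id] = placement
```

Because every query already carries tenant_id, moving a tenant only changes what `resolve` returns; call sites stay the same.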
2. Defense-in-Depth Layers
flowchart TB
classDef l fill:#fde68a,stroke:#b45309,color:#111827;
classDef d fill:#bbf7d0,stroke:#15803d,color:#111827;
r["request"]:::l --> L1["① Token: tenant_id from VERIFIED token, resolved BEFORE business logic, re-checked every decision"]:::l
L1 --> L2["② App: request-scoped context; tenant_id in every query / cache key / storage path"]:::l
L2 --> L3["③ DB: Postgres RLS — the real boundary (session var per transaction)"]:::d
L3 --> L4["④ Cache: tenant-prefixed Redis keys"]:::d
L4 --> L5["⑤ Storage: tenant-prefixed object keys + per-tenant KMS keys (crypto isolation)"]:::d
L5 --> L6["⑥ Derived: tenant-scoped search index; per-tenant dedup scope (ADR-0018)"]:::d
| Layer | Isolation mechanism | Attack it stops |
|---|---|---|
| ① Token | tenant_id derived from the verified token before any business logic; re-validated on every authorization decision (never from client input) | Forged tenant claims; auth-amplification cross-tenant access |
| ② Application | Request-scoped tenant context (never a global or singleton); tenant_id in all queries, cache keys, and storage paths | Connection-pool identity-swap under await; cache-key collision |
| ③ Database RLS | Postgres Row-Level Security policies filter every row by the session tenant variable; PgBouncer session pooling with mandatory server_reset_query | Forgotten WHERE tenant_id = in app code; pooled-connection context bleed |
| ④ Cache | Tenant-prefixed Redis keys ({tenant}:prefs:{user}) by construction, not by discipline | prefs:{user} collisions serving tenant A’s data to tenant B |
| ⑤ Object storage | Tenant-prefixed object keys (/{tenant_id}/{hash}) + per-tenant envelope encryption keys | Direct object or backup read; rogue DBA (data is ciphertext per tenant) |
| ⑥ Derived stores | Tenant-scoped OpenSearch index + filter; per-tenant deduplication scope | Cross-tenant search hits; dedup existence side-channel |
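
Layer ① can be illustrated with a small sketch in which tenant_id is only ever read from a token whose signature has been verified. The token format here (base64url JSON payload plus an HMAC-SHA256 tag), the `SECRET` value, and the function names are assumptions made for the example; the document does not specify the real token scheme.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-signing-key"  # hypothetical; real deployments use managed keys


def sign(claims: dict) -> str:
    """Issue a demo token: base64url(JSON payload) + "." + hex HMAC-SHA256."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    mac = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + mac


def tenant_from_token(token: str) -> str:
    """Return tenant_id only if the signature verifies; raise otherwise."""
    payload, _, mac = token.partition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks.
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        raise PermissionError("invalid token")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["tenant_id"]


def resolve_tenant(token: str, client_supplied_tenant: Optional[str] = None) -> str:
    """The client-supplied value is never trusted; a mismatch is rejected."""
    tenant_id = tenant_from_token(token)
    if client_supplied_tenant is not None and client_supplied_tenant != tenant_id:
        raise PermissionError("tenant mismatch")
    return tenant_id
```

A request that names a different tenant in a header or body than the one in its verified token is refused rather than silently corrected, which makes probing attempts visible.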
3. Postgres Row-Level Security
RLS is the keystone of tenant isolation. Policies are defined at the Postgres level and enforce the tenant boundary in the database itself:
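-- Example policy on the nodes table
ALTER TABLE nodes ENABLE ROW LEVEL SECURITY;  -- policies have no effect until RLS is enabled
ALTER TABLE nodes FORCE ROW LEVEL SECURITY;   -- apply policies to the table owner as well
CREATE POLICY nodes_tenant_isolation ON nodes
  USING (tenant_id = current_setting('app.current_tenant')::uuid);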
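A forgotten WHERE tenant_id = ? clause in application code cannot produce a cross-tenant leak: the database refuses to return the row. This holds only when the application connects as a role that is subject to RLS, meaning it is not a superuser, does not have BYPASSRLS, and does not own the table without FORCE ROW LEVEL SECURITY. Application-layer scoping is defense-in-depth, not the boundary.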
CI enforces that every tenant-scoped table has an RLS policy. A new table without a policy is flagged as a CI failure.
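The coverage check can be reduced to a pure function over catalog data. The sketch below is an assumption about how such a check might look, not the actual CI job: the function name and input shapes are hypothetical. In a real run the inputs could come from information_schema.columns (tables with a tenant_id column), pg_class.relrowsecurity (RLS enabled), and the pg_policies view (existing policies).

```python
def tables_missing_rls(tenant_tables, rls_enabled, policies):
    """Return tenant-scoped tables that lack enforced RLS, sorted by name.

    tenant_tables: names of tables that have a tenant_id column
    rls_enabled:   names of tables with row-level security enabled
    policies:      iterable of (table_name, policy_name) pairs
    """
    with_policy = {table for table, _ in policies}
    return sorted(
        t for t in tenant_tables
        if t not in rls_enabled or t not in with_policy
    )


if __name__ == "__main__":
    missing = tables_missing_rls(
        tenant_tables={"nodes", "edges"},
        rls_enabled={"nodes"},
        policies=[("nodes", "nodes_tenant_isolation")],
    )
    # A non-empty result would fail the build.
    print(missing)
```

A table counts as covered only when RLS is enabled and at least one policy exists. Enabling RLS with no policy is treated as a failure here: Postgres then denies all rows, which is safe but almost certainly a mistake.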
4. Cache Key Isolation
All Redis keys are tenant-prefixed by a shared cache helper — not by per-call discipline. The pattern is {tenant_id}:{category}:{key}. A bare user_id is never used as a cache key because user IDs are not globally unique across tenants. Constructing a cache key without the helper is a lint error.
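A helper of this kind might look like the following sketch. The function name `cache_key` and the allowed character set are assumptions. Rejecting ":" inside a segment is a design choice made here so that a crafted tenant_id such as "a:prefs" cannot collide with another tenant's namespace.

```python
import re

# Segments may not contain ":" so the three parts can never be confused.
_SEGMENT = re.compile(r"[A-Za-z0-9_\-]+")


def cache_key(tenant_id: str, category: str, key: str) -> str:
    """Build a {tenant_id}:{category}:{key} cache key or raise ValueError."""
    for name, value in (("tenant_id", tenant_id), ("category", category), ("key", key)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return f"{tenant_id}:{category}:{key}"


if __name__ == "__main__":
    # The same user ID under two tenants maps to two distinct keys.
    print(cache_key("tenant-a", "prefs", "user42"))
    print(cache_key("tenant-b", "prefs", "user42"))
```

Because tenant_id is a required positional argument, a call site cannot produce a key without naming the tenant.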
5. Connection Pool Isolation
PgBouncer is configured in session pooling mode with a mandatory server_reset_query (for example DISCARD ALL, PgBouncer's default) that clears the Postgres session variable on connection return. This prevents a prior tenant’s context from bleeding into the next request that picks up the same connection.
Within the application, tenant context lives in a request-scoped context object. It is never written to a package-level variable or a singleton, and never mutated after being set at the gateway boundary.
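In Python, one way to get a request-scoped context is the standard-library contextvars module. The sketch below is illustrative: the helper names are hypothetical, and "set once, never mutated" is enforced by refusing a second `set_tenant_once` call. The `ContextVar` object itself lives at module level, but its value is stored per task context, so concurrent requests do not see each other's tenant.

```python
import asyncio
import contextvars

_current_tenant: contextvars.ContextVar[str] = contextvars.ContextVar("current_tenant")


def set_tenant_once(tenant_id: str) -> None:
    """Bind the tenant for this request; a second call is refused."""
    if _current_tenant.get(None) is not None:
        raise RuntimeError("tenant context is already set")
    _current_tenant.set(tenant_id)


def current_tenant() -> str:
    """Fail closed: raises LookupError if no tenant was bound."""
    return _current_tenant.get()


async def handle_request(tenant_id: str, delay: float) -> str:
    set_tenant_once(tenant_id)   # done once, at the gateway boundary
    await asyncio.sleep(delay)   # other requests run during this await
    return current_tenant()      # still this request's tenant


async def main() -> list[str]:
    # gather wraps each coroutine in a task with its own copy of the context.
    return await asyncio.gather(
        handle_request("tenant-a", 0.02),
        handle_request("tenant-b", 0.01),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a plain module-level variable in place of the `ContextVar`, the second request would overwrite the first one's tenant during the await, which is the identity-swap the layer table describes.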
6. Object Storage Isolation
Object keys follow the pattern /{tenant_id}/{content_hash}. This means:
- Tenant data is addressable independently at the storage layer.
- An ACL or bucket policy can restrict a storage path to a single tenant.
- Escalating an enterprise tenant to a dedicated bucket is a routing change, not a data migration.
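The key layout above can be sketched as a single function that returns a bucket and an object key. The bucket names, the `DEDICATED_BUCKETS` table, and the choice of SHA-256 for the content hash are assumptions for illustration; the document specifies only the /{tenant_id}/{content_hash} pattern.

```python
import hashlib

DEFAULT_BUCKET = "shared-objects"        # hypothetical shared bucket name
DEDICATED_BUCKETS: dict[str, str] = {}   # tenant_id -> dedicated bucket (hypothetical)


def object_location(tenant_id: str, blob: bytes) -> tuple[str, str]:
    """Return (bucket, key) for a blob, with the key under the tenant's prefix."""
    if not tenant_id or "/" in tenant_id:
        # A "/" in tenant_id would allow escaping into another tenant's prefix.
        raise ValueError("invalid tenant_id")
    content_hash = hashlib.sha256(blob).hexdigest()
    bucket = DEDICATED_BUCKETS.get(tenant_id, DEFAULT_BUCKET)
    return bucket, f"/{tenant_id}/{content_hash}"
```

Registering a tenant in `DEDICATED_BUCKETS` changes only the bucket half of the result; the key format is identical, which is what keeps escalation a routing decision in this sketch.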
7. Encryption Isolation
Per-tenant Data Encryption Keys (DEKs) wrapped by per-tenant Key Encryption Keys (KEKs) mean that different tenants’ blobs are encrypted under different keys. Compromise of one tenant’s DEK does not expose any other tenant’s data. See Encryption for the full key hierarchy.
Crypto-shredding on offboarding: destroying a tenant’s KEK makes all their ciphertext — live, replicated, and in backups — permanently unrecoverable. This is how GDPR right-to-erasure is implemented without hunting every replica.
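The key hierarchy and crypto-shredding can be modeled in a few lines. This is a toy model under loud assumptions: `_toy_cipher` is an unauthenticated XOR keystream built from HMAC-SHA256 so the example stays standard-library only, and it is NOT suitable for production. A real system would use an AEAD such as AES-GCM with KEKs held in a KMS. The class name, the blob layout, and the choice of a fresh DEK per blob are also hypothetical.

```python
import hashlib
import hmac
import os


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream. Illustration only, not production crypto."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class TenantKeys:
    """Holds one KEK per tenant; a stand-in for a KMS."""

    def __init__(self) -> None:
        self._keks: dict[str, bytes] = {}

    def create_tenant(self, tenant_id: str) -> None:
        self._keks[tenant_id] = os.urandom(32)

    def shred(self, tenant_id: str) -> None:
        # Crypto-shredding: once the KEK is gone, no wrapped DEK can be opened.
        del self._keks[tenant_id]

    def encrypt(self, tenant_id: str, plaintext: bytes) -> dict:
        kek = self._keks[tenant_id]
        dek, dek_nonce, nonce = os.urandom(32), os.urandom(16), os.urandom(16)
        return {
            "wrapped_dek": _toy_cipher(kek, dek_nonce, dek),  # DEK stored only in wrapped form
            "dek_nonce": dek_nonce,
            "nonce": nonce,
            "ciphertext": _toy_cipher(dek, nonce, plaintext),
        }

    def decrypt(self, tenant_id: str, blob: dict) -> bytes:
        kek = self._keks[tenant_id]  # raises KeyError after shred()
        dek = _toy_cipher(kek, blob["dek_nonce"], blob["wrapped_dek"])
        return _toy_cipher(dek, blob["nonce"], blob["ciphertext"])
```

Nothing in `shred` touches the stored blobs. Every copy of the ciphertext, including copies in backups, becomes unreadable because the only key that can unwrap its DEK no longer exists.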
8. High-Severity Pitfalls
- Connection-pool contamination: storing tenant_id in a global variable and writing it during an await causes a concurrent request to adopt it. Fix: request-scoped context only; mandatory server_reset_query.
- Cache-key collision: un-prefixed keys serve tenant A’s data to tenant B. Fix: tenant-prefix by construction, enforced by a helper, not by convention.
- Sensitive values in URL query strings: leak into access logs and Referer headers (CWE-532). Fix: resource identifiers and tokens in request body or headers, never the query string.
- Error/log leakage: a stack trace or error message containing “object 123 belongs to tenant X” leaks cross-tenant information. Fix: generic errors to clients; tenant context in internal structured logs only.
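- SQLi overriding tenant_id: an injected predicate replaces or widens the tenant filter. Fix: parameterized queries at every call site, with RLS as the backstop for injected predicates. RLS does not help if the injected SQL can rewrite the session tenant variable itself (for example via set_config), so parameterization stays mandatory.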
9. Availability Isolation
Confidentiality is not the only isolation property. Per-tenant rate limits and quotas prevent one tenant from exhausting shared CPU, storage, egress, or IOPS and impacting others. See Rate Limiting.
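Per-tenant limits can be sketched with one token bucket per tenant. Token bucket is a technique chosen for this example; the document does not say which algorithm the platform uses, and the class name, parameters, and injectable clock are hypothetical.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """One token bucket per tenant: `rate` tokens per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last_seen)

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self._burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Because each tenant has its own bucket, a tenant that exhausts its tokens is throttled without reducing the budget available to any other tenant.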