05 — Tenant Isolation

Tenant isolation is the #1 security property of a multi-tenant platform, and the most misunderstood. This document builds on ADR-0007 (RLS) and closes the cross-tenant branch of the threat model.


1. The lesson that defines this document

Authentication and authorization provide security, but NOT isolation. A user can be fully authenticated and authorized and still read another tenant’s data unless isolation is deliberately engineered. A tenant_id column is logical separation, not isolation — one app bug, one leaked credential, or one rogue DBA exposes every tenant. (AWS SaaS guidance; OWASP Multi-Tenant Cheat Sheet.)

So isolation is a separate, explicit, defense-in-depth property — enforced at every layer, because any single layer will eventually have a bug.


2. Defense-in-depth layers

flowchart TB
    classDef l fill:#fde68a,stroke:#b45309,color:#111827;
    classDef d fill:#bbf7d0,stroke:#15803d,color:#111827;
    r["request"]:::l --> L1["① Token: tenant_id from VERIFIED token, resolved BEFORE business logic, re-checked every decision"]:::l
    L1 --> L2["② App: request-scoped context; tenant_id in every query/cache key/storage path"]:::l
    L2 --> L3["③ DB: Postgres RLS = the boundary (session var per txn)"]:::d
    L3 --> L4["④ Cache: tenant-prefixed Redis keys"]:::d
    L4 --> L5["⑤ Storage: tenant-prefixed object keys + per-tenant KMS keys (crypto isolation)"]:::d
    L5 --> L6["⑥ Derived: tenant-scoped search index; per-tenant dedup (ADR-0018)"]:::d

| Layer | Control | The attack it stops |
|---|---|---|
| ① Token | tenant context from the verified token, resolved before any business logic, re-validated on every authz decision (not just login) | forged/auth-amplification cross-tenant |
| ② App | request-scoped tenant context (never a global/singleton); tenant_id in all queries, cache keys, storage paths | connection-pool identity-swap under await; cache-key collision |
| ③ DB | Postgres RLS filters every row by the session tenant (the real boundary); PgBouncer session pooling + mandatory server_reset_query | a forgotten WHERE tenant_id in app code; pooled-connection context bleed |
| ④ Cache | tenant-prefixed keys ({tenant}:prefs:{user}) | prefs:{user} collisions across tenants |
| ⑤ Storage | tenant-prefixed object keys + per-tenant envelope keys (10) | direct object/backup read; rogue DBA (data is encrypted per tenant) |
| ⑥ Derived | tenant-scoped OpenSearch index; per-tenant dedup (ADR-0018) | cross-tenant search hit; dedup existence side-channel |
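
The app, cache, and storage rows can be illustrated with a minimal Python sketch. It is not the platform's actual code: the names `tenant_scope`, `current_tenant`, `prefs_cache_key`, and `object_key`, and the tenant-id pattern, are assumptions made for illustration. The sketch uses `contextvars` so that the tenant is bound per request (and per asyncio task) rather than held in a global, and it fails closed when no tenant is bound.

```python
import re
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical names throughout; an illustration, not the platform's code.
_current_tenant: ContextVar[str] = ContextVar("current_tenant")

# Assumed id format: no ':' or '/' so a tenant id cannot spoof a key prefix.
_TENANT_ID_RE = re.compile(r"[a-z0-9-]{1,64}")


class MissingTenantContext(RuntimeError):
    """Raised when tenant data is touched without a resolved tenant."""


@contextmanager
def tenant_scope(tenant_id: str):
    """Bind the tenant taken from the verified token for one request."""
    if not _TENANT_ID_RE.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    token = _current_tenant.set(tenant_id)
    try:
        yield
    finally:
        # Restore the previous state so the binding never outlives the request.
        _current_tenant.reset(token)


def current_tenant() -> str:
    """Fail closed: there is no default tenant and no global fallback."""
    try:
        return _current_tenant.get()
    except LookupError:
        raise MissingTenantContext("no tenant bound to this request") from None


def prefs_cache_key(user_id: str) -> str:
    # Cache layer key shape from the table: {tenant}:prefs:{user}
    return f"{current_tenant()}:prefs:{user_id}"


def object_key(path: str) -> str:
    # Storage layer: every object lives under the tenant's own prefix.
    return f"{current_tenant()}/{path.lstrip('/')}"
```

Inside `with tenant_scope("acme"):`, `prefs_cache_key("u1")` yields `acme:prefs:u1`; outside any scope, the key builders raise instead of producing an unprefixed key.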

RLS is the keystone: even if an app-layer authz check is forgotten, the database itself refuses to return another tenant’s rows. App-layer scoping is defense in depth, not the boundary.
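
As a hedged sketch of what the RLS boundary can look like, the Python below holds an example policy and a helper that binds the tenant for one transaction. The table name `documents`, the setting name `app.tenant_id`, and the helper `bind_tenant` are assumptions, not names taken from ADR-0007. The cursor is assumed to be a DB-API cursor that accepts `%s` placeholders (as psycopg does). `set_config(name, value, is_local)` and `current_setting(name)` are standard PostgreSQL functions.

```python
# Assumed schema: a table `documents` with a text column `tenant_id`.
# The custom setting name `app.tenant_id` is an illustrative choice.
RLS_DDL = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.tenant_id'))
    WITH CHECK (tenant_id = current_setting('app.tenant_id'));
"""

# is_local = true scopes the value to the current transaction only.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


def bind_tenant(cursor, tenant_id: str) -> None:
    """Bind the tenant for the current transaction.

    Must be called inside an open transaction, before any tenant query.
    The value is passed as a bound parameter, never interpolated into SQL.
    """
    if not tenant_id:
        raise ValueError("tenant_id must be non-empty")
    cursor.execute(SET_TENANT_SQL, (tenant_id,))
```

Two caveats apply to any design of this shape. PostgreSQL superusers and roles with `BYPASSRLS` are not subject to policies, so the application role should have neither; `FORCE ROW LEVEL SECURITY` extends the policy to the table owner. And if `app.tenant_id` was never set on a connection, `current_setting` raises an error, so the query fails rather than returning rows.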


3. The subtle, high-severity pitfalls (call them out explicitly)


4. Isolation models & escalation (match isolation to risk)

| Model | How | For |
|---|---|---|
| Pool (default) | shared DB + RLS | high-scale, cost-efficient (ADR-0007) |
| Bridge | schema-per-tenant | noisy/larger tenants |
| Silo | DB / cluster per tenant | regulated / enterprise dedicated |

The tenant_id-everywhere design makes escalating a tenant from pool→bridge→silo a routing change, not a rewrite. High-security tenants additionally get their own KMS key (10) so their data is cryptographically isolated even in the pool.
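
To make the "routing change" concrete, here is a minimal sketch of a placement lookup. `TenantRouter`, `Placement`, and the DSN strings are hypothetical; a real system would likely keep placements in a control-plane store and would still need to migrate the tenant's data before flipping the route.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    POOL = "pool"      # shared DB + RLS
    BRIDGE = "bridge"  # schema per tenant
    SILO = "silo"      # DB / cluster per tenant


@dataclass(frozen=True)
class Placement:
    model: Model
    dsn: str     # where to connect (hypothetical values below)
    schema: str  # which schema to use on that connection


class TenantRouter:
    """Maps a tenant to its placement; unknown tenants land in the pool."""

    def __init__(self, shared_dsn: str) -> None:
        self._default = Placement(Model.POOL, shared_dsn, "public")
        self._overrides: dict[str, Placement] = {}

    def escalate(self, tenant_id: str, placement: Placement) -> None:
        # Escalation only changes the lookup; callers keep passing tenant_id.
        self._overrides[tenant_id] = placement

    def resolve(self, tenant_id: str) -> Placement:
        return self._overrides.get(tenant_id, self._default)


router = TenantRouter("postgresql://shared-db.internal/app")
router.escalate(
    "bigcorp",
    Placement(Model.SILO, "postgresql://bigcorp-db.internal/app", "public"),
)
```

Application code calls `router.resolve(tenant_id)` in both cases, which is why the same queries work unchanged after a tenant moves.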


5. Availability isolation (noisy neighbor)

Confidentiality is not the only isolation property: per-tenant rate limits and quotas stop one tenant from exhausting shared resources (09).
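
One common way to implement a per-tenant rate limit is a token bucket keyed by tenant. The sketch below is an assumption about shape, not the design in (09): `PerTenantRateLimiter` is a hypothetical name, state is held in process memory, and the class is not thread-safe. A multi-instance deployment would need shared state.

```python
import time
from typing import Callable


class PerTenantRateLimiter:
    """Token bucket per tenant: `rate` tokens per second, up to `burst`."""

    def __init__(
        self,
        rate: float,
        burst: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        # tenant_id -> (tokens remaining, time of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self._burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Because each tenant has its own bucket, a tenant that exhausts its tokens is throttled without affecting the budget of any other tenant.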


6. Verification & monitoring (isolation must be tested, continuously)


7. Tenant offboarding (isolation includes deletion)

Offboarding requires complete data deletion: delete the tenant’s rows (RLS-scoped), delete its object-key prefix, and, decisively, destroy its KMS key so that any residual ciphertext, including backups, becomes unrecoverable (crypto-shredding, 10). This is also how the platform addresses GDPR erasure (11).
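
The order of these steps can be sketched as a small orchestration function. The `TenantStores` interface and its method names are hypothetical stand-ins for the database, object store, and KMS clients; the sketch only fixes the sequence, with key destruction last so that anything the earlier steps missed is still rendered unreadable.

```python
from typing import Protocol


class TenantStores(Protocol):
    """Hypothetical interface over the DB, object store, and KMS."""

    def delete_rows(self, tenant_id: str) -> int: ...
    def delete_object_prefix(self, prefix: str) -> int: ...
    def destroy_kms_key(self, tenant_id: str) -> None: ...


def offboard_tenant(stores: TenantStores, tenant_id: str) -> dict[str, int]:
    """Run the three deletion steps in order and report what was removed."""
    if not tenant_id:
        # An empty id would turn the prefix below into "/", so refuse it.
        raise ValueError("tenant_id must be non-empty")
    rows = stores.delete_rows(tenant_id)
    objects = stores.delete_object_prefix(f"{tenant_id}/")
    # Crypto-shredding: residual ciphertext and backups lose their key.
    stores.destroy_kms_key(tenant_id)
    return {"rows_deleted": rows, "objects_deleted": objects}
```

Some managed KMS services schedule key deletion after a waiting period rather than destroying the key immediately, so the point at which data becomes unrecoverable may lag the call.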

References