05 — Tenant Isolation

The #1 security property of a multi-tenant platform — and the most-misunderstood. Builds on ADR-0007 (RLS). Closes the cross-tenant branch of the threat model.

1. The lesson that defines this document

Authentication and authorization provide security, but NOT isolation. A user can be fully authenticated and authorized and still read another tenant’s data unless isolation is deliberately engineered. A tenant_id column is logical separation, not isolation — one app bug, one leaked credential, or one rogue DBA exposes every tenant. (AWS SaaS guidance; OWASP Multi-Tenant Cheat Sheet.)

So isolation is a separate, explicit, defense-in-depth property — enforced at every layer, because any single layer will eventually have a bug.

2. Defense-in-depth layers

flowchart TB
    classDef l fill:#fde68a,stroke:#b45309,color:#111827;
    classDef d fill:#bbf7d0,stroke:#15803d,color:#111827;
    r["request"]:::l --> L1["① Token: tenant_id from VERIFIED token, resolved BEFORE business logic, re-checked every decision"]:::l
    L1 --> L2["② App: request-scoped context; tenant_id in every query/cache key/storage path"]:::l
    L2 --> L3["③ DB: Postgres RLS = the boundary (session var per txn)"]:::d
    L3 --> L4["④ Cache: tenant-prefixed Redis keys"]:::d
    L4 --> L5["⑤ Storage: tenant-prefixed object keys + per-tenant KMS keys (crypto isolation)"]:::d
    L5 --> L6["⑥ Derived: tenant-scoped search index; per-tenant dedup (ADR-0018)"]:::d

| Layer | Control | The attack it stops |
|---|---|---|
| ① Token | tenant context from the verified token, resolved before any business logic, re-validated on every authz decision (not just login) | forged-token or auth-amplification cross-tenant access |
| ② App | request-scoped tenant context (never a global/singleton); `tenant_id` in all queries, cache keys, storage paths | connection-pool identity-swap under `await`; cache-key collision |
| ③ DB | Postgres RLS filters every row by the session tenant (the real boundary); PgBouncer session pooling + mandatory `server_reset_query` | a forgotten `WHERE tenant_id` in app code; pooled-connection context bleed |
| ④ Cache | tenant-prefixed keys (`{tenant}:prefs:{user}`) | `prefs:{user}` collisions across tenants |
| ⑤ Storage | tenant-prefixed object keys + per-tenant envelope keys (10) | direct object/backup read; rogue DBA (data is encrypted per tenant) |
| ⑥ Derived | tenant-scoped OpenSearch index; per-tenant dedup (ADR-0018) | cross-tenant search hit; dedup existence side-channel |

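RLS is the keystone: even if an app-layer authz check is forgotten, the database itself refuses to return another tenant’s rows. This holds only when the application connects as a role that is subject to the policies: superusers and roles with BYPASSRLS skip RLS, and so does the table owner unless the table is set to FORCE ROW LEVEL SECURITY. App-layer scoping is defense in depth, not the boundary.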
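
As a rough sketch of layer ③, the helpers below generate the RLS statements for one table and bind the tenant to the current transaction with `set_config(..., true)`, which scopes the value to that transaction. The setting name `app.tenant_id`, the policy name `tenant_isolation`, the `uuid` column type, and the psycopg-style `%s` placeholders are assumptions for illustration, not details taken from ADR-0007.

```python
import re

TENANT_SETTING = "app.tenant_id"  # assumed custom setting name
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def rls_ddl(table: str) -> list[str]:
    """Return the statements that put `table` behind a tenant policy."""
    if not _IDENT.match(table):
        # Table names are interpolated into DDL, so accept plain identifiers only.
        raise ValueError(f"unsafe table name: {table!r}")
    predicate = f"tenant_id = current_setting('{TENANT_SETTING}')::uuid"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate})",
    ]


def bind_tenant(cursor, tenant_id: str) -> None:
    """Run as the first statement of a transaction.

    The third argument (is_local = true) limits the setting to the current
    transaction, so it does not outlive the transaction on a pooled connection.
    `cursor` is any DB-API cursor that accepts %s placeholders (assumption).
    """
    cursor.execute("SELECT set_config(%s, %s, true)", (TENANT_SETTING, tenant_id))
```

Because the policy calls `current_setting` without a fallback, a transaction that never called `bind_tenant` is expected to fail with an error instead of silently matching rows.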

3. The subtle, high-severity pitfalls (call them out explicitly)

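Connection-pool and context contamination: if tenant_id lives in a global/shared variable, a second request can overwrite it while the first is suspended at an await, and the first resumes under the wrong tenant → identity swap. A pooled connection that still carries the previous request’s session variable has the same effect. Fix: request-scoped context only; pool hygiene (server_reset_query).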
Cache-key collision: un-prefixed keys serve tenant A’s data to tenant B. Fix: tenant-prefix every cache key, by construction (a helper, not discipline).
Sensitive values in URLs/GET parameters leak into access logs and Referer headers (CWE-532). Fix: identifiers in body/headers, never the query string.
Error/log leakage: a stack trace or “object 123 belongs to tenant X” message leaks cross-tenant info. Fix: generic errors to clients; tenant context in logs only.
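SQL injection overriding tenant_id: an injected predicate can rewrite or bypass the tenant filter in app-built SQL. Fix: parameterized queries, with RLS as the backstop.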
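
The first two pitfalls can be illustrated together. The sketch below keeps the tenant in a `contextvars.ContextVar`, which asyncio copies per task, and builds cache keys through a helper that reads that context, so a key cannot be created without a tenant prefix. The names `set_tenant`, `current_tenant`, `cache_key`, and `handle_request` are hypothetical, and the `asyncio.sleep(0)` merely stands in for any real I/O wait.

```python
import asyncio
import contextvars

# One value per request context, never a module-level global string.
_current_tenant: contextvars.ContextVar[str] = contextvars.ContextVar("current_tenant")


def set_tenant(tenant_id: str) -> None:
    _current_tenant.set(tenant_id)


def current_tenant() -> str:
    try:
        return _current_tenant.get()
    except LookupError:
        # Fail closed: no tenant bound means no key and no query.
        raise RuntimeError("no tenant bound to this request") from None


def cache_key(*parts: str) -> str:
    """Build '{tenant}:part:part'; the tenant prefix comes from context, not the caller."""
    return ":".join((current_tenant(), *parts))


async def handle_request(tenant_id: str, user: str) -> str:
    set_tenant(tenant_id)
    await asyncio.sleep(0)  # yield so the other request runs before we resume
    return cache_key("prefs", user)


async def main() -> list[str]:
    # gather() runs each coroutine as its own task with its own context copy.
    return await asyncio.gather(
        handle_request("tenant-a", "u1"),
        handle_request("tenant-b", "u1"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a plain module-level variable in place of the `ContextVar`, the second request would overwrite the value during the first request's `await`, and both calls would produce a `tenant-b` key.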

4. Isolation models & escalation (match isolation to risk)

| Model | How | For |
|---|---|---|
| Pool (default) | shared DB + RLS | high-scale, cost-efficient (ADR-0007) |
| Bridge | schema-per-tenant | noisy/larger tenants |
| Silo | DB / cluster per tenant | regulated / enterprise dedicated |

The tenant_id-everywhere design makes escalating a tenant from pool→bridge→silo a routing change, not a rewrite. High-security tenants additionally get their own KMS key (10) so their data is cryptographically isolated even in the pool.
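
One way to picture "a routing change, not a rewrite" is a small placement table consulted before a connection is opened. The sketch below is illustrative only: `Placement`, `TenantRouter`, and the DSN strings are hypothetical, and migrating the tenant's data to its new home is out of scope here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Model(Enum):
    POOL = "pool"      # shared DB + RLS
    BRIDGE = "bridge"  # schema per tenant
    SILO = "silo"      # DB / cluster per tenant


@dataclass(frozen=True)
class Placement:
    model: Model
    dsn: str
    schema: str = "public"


class TenantRouter:
    """Maps a tenant to where its data lives; unlisted tenants stay in the pool."""

    def __init__(self, default: Placement,
                 overrides: Optional[Dict[str, Placement]] = None) -> None:
        self._default = default
        self._overrides = dict(overrides or {})

    def resolve(self, tenant_id: str) -> Placement:
        return self._overrides.get(tenant_id, self._default)

    def escalate(self, tenant_id: str, placement: Placement) -> None:
        # Application queries keep filtering by tenant_id; only the target changes.
        self._overrides[tenant_id] = placement


# Hypothetical connection strings, for illustration only.
POOL = Placement(Model.POOL, "postgresql://shared-cluster/app")
router = TenantRouter(default=POOL)
router.escalate("tenant-big", Placement(Model.BRIDGE, POOL.dsn, schema="t_tenant_big"))
router.escalate("tenant-bank", Placement(Model.SILO, "postgresql://bank-cluster/app"))
```

Here `router.resolve("tenant-small")` still returns the pool placement, while the two escalated tenants resolve to their own schema and cluster.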

5. Availability isolation (noisy neighbor)

Confidentiality isn’t the only isolation property: per-tenant rate limits and quotas stop one tenant from exhausting shared resources (09).
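
A per-tenant limit can be sketched as one token bucket per tenant, so a burst from one tenant drains only its own bucket. This is an in-process illustration under assumed names (`TenantRateLimiter`, `allow`); a shared deployment would more likely keep the buckets in a common store, and the rate and burst values are placeholders.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Token bucket per tenant: `rate_per_sec` refill, capped at `burst` tokens."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._burst = float(burst)
        self._clock = clock
        # tenant_id -> (tokens available, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(tenant_id, (self._burst, now))
        # Refill for the elapsed time, never above the burst cap.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

The clock is injectable so that the behavior can be tested deterministically without sleeping.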

6. Verification & monitoring (isolation must be tested, continuously)

The cross-tenant test (CI gate): forge a mismatched tenant_id in the app layer → assert RLS still blocks the read/write (the I3 invariant, storage/08).
CI check: every tenant-scoped table has an RLS policy (a missing policy on a new table is a hole).
BOLA (broken object-level authorization) fuzzing across tenant boundaries; cache-key and connection-context tests.
Monitor & alert on cross-tenant access attempts; log tenant context on every op (07).
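
The "every tenant-scoped table has an RLS policy" check could look roughly like the sketch below: a catalog query lists tables that carry a `tenant_id` column, and a pure function turns the rows into CI failures. Treating "has a `tenant_id` column in the `public` schema" as the definition of tenant-scoped is an assumption, as are the names `RLS_AUDIT_SQL`, `TableRls`, and `rls_violations`.

```python
from typing import Iterable, List, NamedTuple

# Lists each table with a tenant_id column, whether RLS is enabled on it,
# and how many policies it has. Run it with any Postgres client.
RLS_AUDIT_SQL = """
SELECT c.relname        AS table_name,
       c.relrowsecurity AS rls_enabled,
       (SELECT count(*)
          FROM pg_policies p
         WHERE p.schemaname = n.nspname
           AND p.tablename = c.relname) AS policy_count
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_attribute a ON a.attrelid = c.oid
 WHERE c.relkind IN ('r', 'p')
   AND n.nspname = 'public'
   AND a.attname = 'tenant_id'
   AND NOT a.attisdropped
 ORDER BY c.relname
"""


class TableRls(NamedTuple):
    table: str
    rls_enabled: bool
    policy_count: int


def rls_violations(rows: Iterable[TableRls]) -> List[str]:
    """Return one message per tenant-scoped table that is not protected."""
    problems = []
    for row in rows:
        if not row.rls_enabled:
            problems.append(f"{row.table}: row-level security is not enabled")
        elif row.policy_count == 0:
            problems.append(f"{row.table}: RLS enabled but no policy defined")
    return problems
```

A CI job would execute `RLS_AUDIT_SQL` against a freshly migrated database, wrap each result row in `TableRls`, and fail the build if `rls_violations` returns anything.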

7. Tenant offboarding (isolation includes deletion)

Complete data deletion on offboarding: drop the tenant’s rows (RLS-scoped), delete its object-key prefix, and — decisively — destroy its KMS key so any residual ciphertext/backups become unrecoverable (crypto-shredding, 10). This also satisfies GDPR erasure (11).

References

OWASP Multi-Tenant Security Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Multi_Tenant_Security_Cheat_Sheet.html
AWS SaaS tenant isolation: https://docs.aws.amazon.com/whitepapers/latest/saas-architecture-fundamentals/tenant-isolation.html
PostgreSQL Row-Level Security: https://www.postgresql.org/docs/current/ddl-rowsecurity.html