05 — Tenant Isolation
The #1 security property of a multi-tenant platform — and the most-misunderstood. Builds on ADR-0007 (RLS). Closes the cross-tenant branch of the threat model.
1. The lesson that defines this document
Authentication and authorization provide security, but NOT isolation. A user can be fully authenticated and authorized and still read another tenant’s data unless isolation is deliberately engineered. A tenant_id column is logical separation, not isolation: one app bug, one leaked credential, or one rogue DBA exposes every tenant. (AWS SaaS guidance; OWASP Multi-Tenant Cheat Sheet.)
So isolation is a separate, explicit, defense-in-depth property — enforced at every layer, because any single layer will eventually have a bug.
2. Defense-in-depth layers
flowchart TB
classDef l fill:#fde68a,stroke:#b45309,color:#111827;
classDef d fill:#bbf7d0,stroke:#15803d,color:#111827;
r["request"]:::l --> L1["① Token: tenant_id from VERIFIED token, resolved BEFORE business logic, re-checked every decision"]:::l
L1 --> L2["② App: request-scoped context; tenant_id in every query/cache key/storage path"]:::l
L2 --> L3["③ DB: Postgres RLS = the boundary (session var per txn)"]:::d
L3 --> L4["④ Cache: tenant-prefixed Redis keys"]:::d
L4 --> L5["⑤ Storage: tenant-prefixed object keys + per-tenant KMS keys (crypto isolation)"]:::d
L5 --> L6["⑥ Derived: tenant-scoped search index; per-tenant dedup (ADR-0018)"]:::d
| Layer | Control | The attack it stops |
|---|---|---|
| ① Token | tenant context from the verified token, resolved before any business logic, re-validated on every authz decision (not just login) | forged/auth-amplification cross-tenant |
| ② App | request-scoped tenant context (never a global/singleton); tenant_id in all queries, cache keys, storage paths | connection-pool identity-swap under await; cache-key collision |
| ③ DB | Postgres RLS filters every row by the session tenant (the real boundary); PgBouncer session pooling + mandatory server_reset_query | a forgotten WHERE tenant_id in app code; pooled-connection context bleed |
| ④ Cache | tenant-prefixed keys ({tenant}:prefs:{user}) | prefs:{user} collisions across tenants |
| ⑤ Storage | tenant-prefixed object keys + per-tenant envelope keys (10) | direct object/backup read; rogue DBA (data is encrypted per tenant) |
| ⑥ Derived | tenant-scoped OpenSearch index; per-tenant dedup (ADR-0018) | cross-tenant search hit; dedup existence side-channel |
RLS is the keystone: even if an app-layer authz check is forgotten, the database itself refuses to return another tenant’s rows. App-layer scoping is defense in depth, not the boundary.
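A minimal sketch of what layer ③ might look like in practice, written as a small Python helper that emits the Postgres statements. The setting name `app.tenant_id`, the policy name `tenant_isolation`, the `uuid` column type, and the `%s` placeholder style are assumptions made for illustration; none of them come from ADR-0007.

```python
# Sketch only: the "app.tenant_id" setting name, the policy name, and the
# uuid type of the tenant_id column are assumptions, not taken from ADR-0007.

def rls_ddl(table: str) -> list[str]:
    """Return the DDL that turns on row-level security for one table."""
    if not table.isidentifier():
        # Refuse anything that is not a plain identifier before interpolating.
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # With only a USING clause, Postgres applies the same expression
        # to rows being written (WITH CHECK) as to rows being read.
        f"CREATE POLICY tenant_isolation ON {table} "
        "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
    ]


# Intended to run as the first statement of every transaction. The third
# argument (true) makes the setting transaction-local, so it is discarded
# at COMMIT or ROLLBACK. The %s placeholder assumes a psycopg-style driver.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"

if __name__ == "__main__":
    for statement in rls_ddl("orders"):
        print(statement + ";")
```

Note that Postgres superusers and roles with the BYPASSRLS attribute are not subject to policies, so this only acts as a boundary if the application connects as a role with neither.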
3. The subtle, high-severity pitfalls (call them out explicitly)
- Connection-pool contamination: storing tenant_id in a global/shared variable lets a concurrent request overwrite it while another request is suspended at an await, so the suspended request resumes with the wrong tenant → identity swap. Fix: request-scoped context only; pool hygiene (server_reset_query).
- Cache-key collision: un-prefixed keys serve tenant A’s data to tenant B. Fix: tenant-prefix every cache key, by construction (a helper, not discipline).
- Sensitive values in URLs/GET → leak into access logs/referer (CWE-532). Fix: identifiers in body/headers, never the query string.
- Error/log leakage: a stack trace or “object 123 belongs to tenant X” message leaks cross-tenant info. Fix: generic errors to clients; tenant context in logs only.
- SQLi overriding tenant_id → parameterized queries + RLS backstop.
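The first two pitfalls above can be illustrated with a short sketch. It assumes an asyncio-based service and uses the standard-library `contextvars` module for the request-scoped context; the names `_tenant`, `current_tenant`, `cache_key`, and `handle_request` are hypothetical, chosen only for this example.

```python
import asyncio
import contextvars

# Hypothetical names. A ContextVar is per-task, unlike a module-level global.
_tenant = contextvars.ContextVar("tenant_id")


def current_tenant() -> str:
    """Return the tenant bound to this request; fail closed if none is bound."""
    try:
        return _tenant.get()
    except LookupError:
        raise RuntimeError("no tenant bound to this request") from None


def cache_key(*parts: str) -> str:
    """Build a cache key that always starts with the current tenant."""
    return ":".join((current_tenant(), *parts))


async def handle_request(tenant_id: str, user: str) -> str:
    _tenant.set(tenant_id)   # visible only inside this task's context
    await asyncio.sleep(0)   # yield, so the other request runs in between
    return cache_key("prefs", user)


async def main() -> list[str]:
    # gather() wraps each coroutine in its own Task, and every Task runs in
    # a copy of the context, so the two set() calls cannot see each other.
    return await asyncio.gather(
        handle_request("tenant-a", "u1"),
        handle_request("tenant-b", "u1"),
    )


keys = asyncio.run(main())
print(keys)
```

Here `keys` comes back as `["tenant-a:prefs:u1", "tenant-b:prefs:u1"]` even though both requests interleave at the `await`. Because callers cannot build a key without going through `cache_key`, and `cache_key` raises when no tenant is bound, an un-prefixed key is not something a forgetful caller can produce by accident.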
4. Isolation models & escalation (match isolation to risk)
| Model | How | For |
|---|---|---|
| Pool (default) | shared DB + RLS | high-scale, cost-efficient (ADR-0007) |
| Bridge | schema-per-tenant | noisy/larger tenants |
| Silo | DB / cluster per tenant | regulated / enterprise dedicated |
The tenant_id-everywhere design makes escalating a tenant from pool→bridge→silo a routing change, not a rewrite. High-security tenants additionally get their own KMS key (10) so their data is cryptographically isolated even in the pool.
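To make "a routing change, not a rewrite" concrete, here is one possible shape for the lookup that decides where a tenant's queries go. The `Placement` type, the DSNs, the schema names, and the tenant identifiers are all invented for illustration; a real system would presumably load placements from a control-plane store rather than a literal dict.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    POOL = "pool"      # shared DB + RLS
    BRIDGE = "bridge"  # schema per tenant
    SILO = "silo"      # DB / cluster per tenant


@dataclass(frozen=True)
class Placement:
    model: Model
    dsn: str     # which database or cluster to connect to
    schema: str  # which schema the tenant's tables live in


# Hypothetical values for illustration only.
SHARED_DSN = "postgresql://shared-db/app"
DEFAULT = Placement(Model.POOL, SHARED_DSN, "public")

_PLACEMENTS = {
    "tenant-big": Placement(Model.BRIDGE, SHARED_DSN, "t_tenant_big"),
    "tenant-bank": Placement(Model.SILO, "postgresql://bank-db/app", "public"),
}


def placement_for(tenant_id: str) -> Placement:
    """Tenants without an explicit entry stay in the shared pool."""
    return _PLACEMENTS.get(tenant_id, DEFAULT)
```

Under this sketch, escalating a tenant means migrating its data and then adding or changing one entry in the placement map; the application code, which already carries tenant_id on every query, does not change.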
5. Availability isolation (noisy neighbor)
Confidentiality isn’t the only isolation: per-tenant rate limits + quotas stop one tenant exhausting shared resources (09).
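As an illustration of a per-tenant rate limit, the sketch below keeps one token bucket per tenant in process memory. The token-bucket algorithm, the class name `TenantRateLimiter`, and its parameters are assumptions for this example; the source does not say which algorithm or store (09) actually uses, and a multi-instance deployment would need shared state rather than a local dict.

```python
import time
from typing import Optional


class TenantRateLimiter:
    """One token bucket per tenant, so tenants cannot drain each other."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self._buckets = {}        # tenant_id -> (tokens, last_seen)

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        """Consume one token for this tenant if available."""
        now = time.monotonic() if now is None else now
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self.capacity, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        self._buckets[tenant_id] = (tokens - 1.0 if allowed else tokens, now)
        return allowed


limiter = TenantRateLimiter(rate=1.0, capacity=2.0)
burst = [limiter.allow("tenant-a", now=0.0) for _ in range(3)]
other = limiter.allow("tenant-b", now=0.0)
```

With a burst size of 2, `burst` is `[True, True, False]`, while `other` is `True`: tenant-a exhausting its own bucket has no effect on tenant-b.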
6. Verification & monitoring (isolation must be tested, continuously)
- The cross-tenant test (CI gate): forge a mismatched tenant_id in the app layer → assert RLS still blocks the read/write (the I3 invariant, storage/08).
- CI check: every tenant-scoped table has an RLS policy (a missing policy on a new table is a hole).
- BOLA fuzzing across tenant boundaries; cache-key + connection-context tests.
- Monitor & alert on cross-tenant access attempts; log tenant context on every op (07).
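The "every tenant-scoped table has an RLS policy" check from the list above could be sketched roughly as follows. It assumes that "tenant-scoped" means "has a tenant_id column" and that all such tables live in the `public` schema; both are guesses for illustration. The two queries use the standard `information_schema.columns` view and the `pg_class`, `pg_namespace`, and `pg_policies` catalogs; running them against a database is left to whatever driver the CI job already uses.

```python
# Assumption: a table is tenant-scoped if it has a tenant_id column.
TENANT_TABLES_SQL = """
SELECT table_name
FROM information_schema.columns
WHERE table_schema = 'public' AND column_name = 'tenant_id'
"""

# Tables that have RLS enabled AND at least one policy defined.
PROTECTED_TABLES_SQL = """
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relrowsecurity
  AND EXISTS (
    SELECT 1 FROM pg_policies p
    WHERE p.schemaname = n.nspname AND p.tablename = c.relname
  )
"""


def unprotected_tables(tenant_tables, protected_tables):
    """Return tenant-scoped tables that lack RLS, sorted for stable output."""
    return sorted(set(tenant_tables) - set(protected_tables))


def check(tenant_tables, protected_tables) -> None:
    """Fail the CI job if any tenant-scoped table is unprotected."""
    missing = unprotected_tables(tenant_tables, protected_tables)
    if missing:
        raise SystemExit("tables without an RLS policy: " + ", ".join(missing))
```

A CI job would feed the result rows of the two queries into `check`; a newly added table with a tenant_id column but no policy then fails the build instead of shipping as a hole.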
7. Tenant offboarding (isolation includes deletion)
Complete data deletion on offboarding: drop the tenant’s rows (RLS-scoped), delete its object-key prefix, and — decisively — destroy its KMS key so any residual ciphertext/backups become unrecoverable (crypto-shredding, 10). This also satisfies GDPR erasure (11).
References
- OWASP Multi-Tenant Security Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Multi_Tenant_Security_Cheat_Sheet.html
- AWS SaaS tenant isolation: https://docs.aws.amazon.com/whitepapers/latest/saas-architecture-fundamentals/tenant-isolation.html
- PostgreSQL Row-Level Security: https://www.postgresql.org/docs/current/ddl-rowsecurity.html