ADR-0038 — RLS tenant context is transaction-local (pooling-safe)
- Status: Accepted
- Date: 2026-06-12 (Architecture Freeze V1)
- Related: [review §5.1], ADR-0007, ADR-0004, 08 §2 I3
V1 Freeze (2026-06-12): Accepted. Blocker-4 resolution. New ADR created to close the highest-confidence security trap in the pre-freeze review: Row-Level Security tenant context interacting unsafely with connection pooling.
Context
ADR-0007 makes Postgres Row-Level Security the tenant-isolation boundary: each request sets a tenant context and RLS policies filter by it. The pre-freeze review (review §5.1) flagged a concrete, catastrophic footgun that ADR-0007 left unresolved: how the tenant context interacts with connection pooling.
- Setting context with a session-level SET app.tenant_id = … is correct only under session-mode pooling (one server connection per client for its lifetime). Session-mode pooling severely limits throughput.
- Under transaction-mode / statement-mode pooling (PgBouncer’s high-throughput modes), a server connection is handed to a different client between transactions. A session-level SET then leaks tenant A’s context to tenant B’s query: a cross-tenant data breach below the RLS policy that was supposed to prevent it (the R4 outcome, reintroduced at the infrastructure layer).
This must be decided before any multi-tenant code is written, because it dictates the
shape of every database access in platform/db.
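The bleed described above can be reproduced in a toy model with no database at all. The sketch below is an illustration only: the types serverConn, pool and tx are invented for this example, and the model assumes a single backend shared by every transaction, which is the worst case of transaction-mode pooling. It does not model every Postgres rule (for example, how an aborted transaction treats a plain SET).

```go
package main

import "fmt"

// serverConn models one Postgres backend. Session-level settings live on the
// connection itself and survive across transactions.
type serverConn struct {
	session map[string]string
}

// pool models transaction-mode pooling with a single backend: every
// transaction, from any client, runs on the same server connection.
type pool struct {
	conn *serverConn
}

// tx is one transaction on a borrowed connection. Transaction-local settings
// live here and are dropped when the transaction ends.
type tx struct {
	conn  *serverConn
	local map[string]string
}

func (p *pool) begin() *tx {
	return &tx{conn: p.conn, local: map[string]string{}}
}

// setSession mimics SET: the value outlives the transaction.
func (t *tx) setSession(k, v string) { t.conn.session[k] = v }

// setLocal mimics SET LOCAL: the value is visible only inside this transaction.
func (t *tx) setLocal(k, v string) { t.local[k] = v }

// current mimics current_setting(k, true): local value first, then session.
func (t *tx) current(k string) string {
	if v, ok := t.local[k]; ok {
		return v
	}
	return t.conn.session[k]
}

// leakedContext reports the tenant context that tenant B's query finds on a
// reused backend after tenant A set its own context there.
func leakedContext(useLocal bool) string {
	p := &pool{conn: &serverConn{session: map[string]string{}}}

	a := p.begin() // tenant A's request
	if useLocal {
		a.setLocal("app.tenant_id", "tenant-a")
	} else {
		a.setSession("app.tenant_id", "tenant-a")
	}
	// Transaction A ends here and the backend returns to the pool.

	// Tenant B's query lands on the same backend. Under transaction pooling
	// its own session-level SET may have been routed to a different backend.
	b := p.begin()
	return b.current("app.tenant_id")
}

func main() {
	fmt.Printf("session SET: tenant B sees %q\n", leakedContext(false))
	fmt.Printf("SET LOCAL:   tenant B sees %q\n", leakedContext(true))
}
```

With the session-level variant tenant B inherits "tenant-a"; with the transaction-local variant it finds an empty context.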
Decision
- Tenant context is transaction-local. Every tenant-scoped database access runs inside a transaction that first executes SET LOCAL app.tenant_id = '<uuid>' (and SET LOCAL app.user_id, app.role). SET LOCAL is scoped to the current transaction and is discarded on commit/rollback, so a pooled connection carries no tenant context between transactions. This is safe under PgBouncer transaction-mode pooling, the high-throughput default we adopt.
- RLS policies read the local GUC: policies filter on current_setting('app.tenant_id', true)::uuid. A query that runs without the GUC set (i.e. outside the wrapper) sees no rows (default-deny), not all rows.
- Centralized and unforgettable. Setting the context is owned by a single helper in platform/db (e.g. db.WithTenant(ctx, tx, …)); application code cannot open a tenant-scoped query without it. A CI lint forbids raw pool access that bypasses the wrapper, and forbids session-level SET of app.*.
- Pooling mode is pinned. PgBouncer (or equivalent) runs in transaction mode; session mode is prohibited for the application pool. Prepared-statement handling is configured for transaction-mode compatibility.
- Connection role. The application connects as a non-superuser, non-BYPASSRLS role so RLS cannot be silently bypassed; migrations use a separate privileged role.
Consequences
Positive
- The pooled-connection context-bleed breach class is closed by construction; isolation holds at full pooling throughput.
- Default-deny on a missing GUC means a forgotten wrapper fails closed (no rows), not open (all tenants).
- One code path for tenant context → auditable, testable, hard to bypass.
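The fail-closed property depends on how the policy expression treats a missing setting. The sketch below generates policy DDL for one table; the tenant_id column, the tenant_isolation policy name and the "invoices" table are assumptions, not names taken from ADR-0007. It wraps the setting in NULLIF because a custom setting that was earlier set with SET LOCAL on the same backend can, on the Postgres versions we are aware of, read back as an empty string rather than NULL, and casting an empty string to uuid raises an error instead of matching zero rows. This is worth verifying on the target version.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRe accepts plain lower-case identifiers only, so a table name can be
// interpolated into DDL without quoting concerns.
var identRe = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// tenantExpr reads the transaction-local setting. missing_ok = true yields
// NULL when the setting is undefined, and NULLIF maps an empty string to NULL
// too. tenant_id = NULL is never true, so the policy then matches no rows.
const tenantExpr = `NULLIF(current_setting('app.tenant_id', true), '')::uuid`

// PolicyDDL returns the statements that put one table behind the tenant policy.
func PolicyDDL(table string) ([]string, error) {
	if !identRe.MatchString(table) {
		return nil, fmt.Errorf("invalid table name %q", table)
	}
	return []string{
		fmt.Sprintf(`ALTER TABLE %s ENABLE ROW LEVEL SECURITY`, table),
		// FORCE applies the policy to the table owner as well.
		fmt.Sprintf(`ALTER TABLE %s FORCE ROW LEVEL SECURITY`, table),
		fmt.Sprintf(`CREATE POLICY tenant_isolation ON %s USING (tenant_id = %s) WITH CHECK (tenant_id = %s)`,
			table, tenantExpr, tenantExpr),
	}, nil
}

func main() {
	stmts, err := PolicyDDL("invoices") // made-up table name
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range stmts {
		fmt.Println(s + ";")
	}
}
```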
Negative / costs
- Every tenant-scoped read/write must run in a transaction (even single statements): a minor uniform overhead, absorbed by the platform/db helper.
- Transaction-mode pooling constrains use of session-level features (advisory locks held across transactions, server-side prepared-statement caching). This is acceptable; the app does not rely on them on the tenant path.
Alternatives considered
- Session-level SET + session-mode pooling: correct but throttles throughput (one backend per client); rejected as the default.
- SET ROLE <tenant_role> per request with per-tenant DB roles: strong isolation but explodes role management at many tenants; rejected for shared-schema V1 (kept as an option on the schema/DB-per-tenant escalation path, ADR-0007).
- App-layer scoping only (no RLS): rejected by ADR-0007; a single missing WHERE is a breach.
Verification
Two CI tests guard I3:
- Forged-tenant_id test (from ADR-0007): app-layer code sets a mismatched tenant id → RLS still blocks cross-tenant reads/writes.
- Pooled-connection bleed test (new): drive two tenants’ requests across a transaction-mode pool small enough to force connection reuse → assert tenant B never observes tenant A’s rows, and a query issued without the wrapper returns zero rows.