ADR-0038 — RLS tenant context is transaction-local (pooling-safe)

Status: Accepted
Date: 2026-06-12 (Architecture Freeze V1)
Related: [review §5.1], ADR-0007, ADR-0004, 08 §2 I3

V1 Freeze (2026-06-12): Accepted. Blocker-4 resolution. New ADR created to close the highest-confidence security trap in the pre-freeze review: Row-Level Security tenant context interacting unsafely with connection pooling.

Context

ADR-0007 makes Postgres Row-Level Security the tenant-isolation boundary: each request sets a tenant context and RLS policies filter by it. The pre-freeze review (review §5.1) flagged a concrete, catastrophic footgun that ADR-0007 left unresolved: how the tenant context interacts with connection pooling.

Setting context with a session-level SET app.tenant_id = … is correct only under session-mode pooling (one server connection per client for its lifetime), and even then only if every request overwrites or resets the value, because application-side pools also reuse connections across requests. Session-mode pooling also caps throughput: each client pins a backend, so concurrency is bounded by the server’s connection limit.
Under transaction-mode / statement-mode pooling (PgBouncer’s high-throughput modes), a server connection can be handed to a different client between transactions. A session-level SET made by tenant A’s request then remains in force for tenant B’s query: a cross-tenant data breach beneath the RLS policy that was supposed to prevent it (the R4 outcome, reintroduced at the infrastructure layer).
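
The bleed can be illustrated with a toy model. This is only a sketch: it does not talk to Postgres or PgBouncer, and the names serverConn and tenantSeenByNextClient are invented for the illustration. One pooled backend is represented as two maps, one for session-level settings and one for transaction-local settings.

```go
package main

// serverConn is a toy stand-in for one pooled Postgres backend. It is not a
// driver; it only models which settings survive the end of a transaction.
type serverConn struct {
	session map[string]string // values from plain SET: live until the connection closes
	local   map[string]string // values from SET LOCAL: discarded at COMMIT/ROLLBACK
}

func newServerConn() *serverConn {
	return &serverConn{session: map[string]string{}, local: map[string]string{}}
}

// currentSetting returns the transaction-local value if present, otherwise the
// session value, otherwise the empty string.
func (c *serverConn) currentSetting(name string) string {
	if v, ok := c.local[name]; ok {
		return v
	}
	return c.session[name]
}

// endTx models COMMIT or ROLLBACK: only transaction-local values are dropped.
func (c *serverConn) endTx() {
	c.local = map[string]string{}
}

// tenantSeenByNextClient runs tenant A's transaction on the only connection in
// the pool, returns the connection to the pool, and reports which tenant
// context the next client would inherit before setting anything itself.
func tenantSeenByNextClient(useSetLocal bool) string {
	conn := newServerConn() // pool of size one, so reuse is guaranteed

	if useSetLocal {
		conn.local["app.tenant_id"] = "tenant-a" // models SET LOCAL app.tenant_id = ...
	} else {
		conn.session["app.tenant_id"] = "tenant-a" // models SET app.tenant_id = ...
	}
	conn.endTx() // a transaction-mode pooler may hand the backend to tenant B here

	return conn.currentSetting("app.tenant_id")
}
```

In this model tenantSeenByNextClient(false) returns "tenant-a", which is the leak, while tenantSeenByNextClient(true) returns the empty string.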

This must be decided before any multi-tenant code is written, because it dictates the shape of every database access in platform/db.

Decision

Tenant context is transaction-local. Every tenant-scoped database access runs inside a transaction that first executes SET LOCAL app.tenant_id = '<uuid>' (and SET LOCAL app.user_id, app.role). Because SET LOCAL does not accept bind parameters, the helper can issue the equivalent SELECT set_config('app.tenant_id', $1, true), which avoids interpolating the id into SQL text. SET LOCAL is scoped to the current transaction and is discarded on commit/rollback, so a pooled connection carries no tenant context between transactions. This is safe under PgBouncer transaction-mode pooling, the high-throughput default we adopt.
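RLS policies read the local GUC: policies filter on NULLIF(current_setting('app.tenant_id', true), '')::uuid. The NULLIF matters on pooled connections: once a connection has run SET LOCAL for a custom setting, current_setting returns an empty string rather than NULL after that transaction ends, and ''::uuid raises an error instead of filtering. With the guard in place, a query that runs without the GUC set (i.e. outside the wrapper) sees no rows (default-deny), not all rows.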
Centralized and unforgettable. Setting the context is owned by a single helper in platform/db (e.g. db.WithTenant(ctx, tx, …)); application code cannot open a tenant-scoped query without it. A CI lint forbids raw pool access that bypasses the wrapper, and forbids session-level SET of app.*.
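Pooling mode is pinned. PgBouncer (or equivalent) runs in transaction mode; session mode is prohibited for the application pool. Prepared-statement handling is configured for transaction-mode compatibility, for example by enabling PgBouncer’s max_prepared_statements (available from PgBouncer 1.21) or by disabling server-side prepared statements in the driver.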
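Connection role. The application connects as a non-superuser, non-BYPASSRLS role that does not own the tenant tables (or the tables use FORCE ROW LEVEL SECURITY), because table owners are otherwise exempt from RLS. This ensures RLS cannot be silently bypassed; migrations use a separate privileged role.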
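
A minimal sketch of what the platform/db helper might look like, assuming Go's database/sql with a Postgres driver that accepts $1-style placeholders. The ADR only names db.WithTenant(ctx, tx, …); the TenantContext struct, its Validate method, and the exact signature below are assumptions made for illustration, not the settled API.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// TenantContext carries the three values the ADR scopes to a transaction.
// The struct and its field names are assumptions made for this sketch.
type TenantContext struct {
	TenantID string
	UserID   string
	Role     string
}

// Validate rejects an empty tenant id before any SQL is sent, so a caller bug
// cannot produce a transaction that silently has no tenant context.
func (tc TenantContext) Validate() error {
	if tc.TenantID == "" {
		return errors.New("tenant context: empty tenant id")
	}
	return nil
}

// set_config(name, value, true) is the function form of SET LOCAL; unlike
// SET LOCAL it accepts bind parameters.
const setContextSQL = `SELECT set_config('app.tenant_id', $1, true),
       set_config('app.user_id', $2, true),
       set_config('app.role', $3, true)`

// WithTenant opens a transaction, sets the transaction-local context, runs fn,
// and commits. Any returned error rolls the transaction back. A panic inside
// fn is not handled in this sketch.
func WithTenant(ctx context.Context, db *sql.DB, tc TenantContext, fn func(tx *sql.Tx) error) (err error) {
	if err = tc.Validate(); err != nil {
		return err
	}
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin: %w", err)
	}
	defer func() {
		if err != nil {
			_ = tx.Rollback() // best effort; the original error is what the caller sees
		}
	}()
	if _, err = tx.ExecContext(ctx, setContextSQL, tc.TenantID, tc.UserID, tc.Role); err != nil {
		return fmt.Errorf("set tenant context: %w", err)
	}
	if err = fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}
```

Handing callers only the *sql.Tx that the helper created is one way to make the wrapper hard to skip: application code never sees the pool, so there is no connection on which to run an unscoped query.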

Consequences

Positive

The pooled-connection context-bleed breach class is closed by construction; isolation holds at full pooling throughput.
Default-deny on a missing GUC means a forgotten wrapper fails closed (no rows), not open (all tenants).
One code path for tenant context → auditable, testable, hard to bypass.
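
The fail-closed behaviour depends on how the policy predicate treats a missing setting. The sketch below generates the statements a migration might run for one table. The tenant_id column name, the tenant_isolation policy name, and the helpers quoteIdent and policyDDL are hypothetical; the NULLIF guard is an addition that maps an empty setting to NULL so the comparison matches no rows.

```go
package main

import "strings"

// quoteIdent double-quotes a SQL identifier, doubling any embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// tenantPredicate evaluates to NULL, and therefore matches no rows, when
// app.tenant_id is missing (current_setting returns NULL because of the
// second argument) or empty (NULLIF turns '' into NULL).
const tenantPredicate = `tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid`

// policyDDL returns the statements that put one table behind tenant RLS.
// FORCE makes the policy apply to the table owner as well.
func policyDDL(table string) []string {
	t := quoteIdent(table)
	return []string{
		"ALTER TABLE " + t + " ENABLE ROW LEVEL SECURITY",
		"ALTER TABLE " + t + " FORCE ROW LEVEL SECURITY",
		"CREATE POLICY tenant_isolation ON " + t +
			" USING (" + tenantPredicate + ")" +
			" WITH CHECK (" + tenantPredicate + ")",
	}
}
```

USING filters the rows a statement can see, and WITH CHECK rejects inserted or updated rows whose tenant_id differs from the transaction's context.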

Negative / costs

Every tenant-scoped read/write must run in a transaction (even single statements) — a minor uniform overhead, absorbed by the platform/db helper.
Transaction-mode pooling constrains use of session-level features (advisory locks held across transactions, server-side prepared-statement caching) — acceptable; the app does not rely on them on the tenant path.
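
Should a tenant-path operation ever need mutual exclusion, a transaction-scoped advisory lock stays compatible with transaction-mode pooling because Postgres releases it at commit or rollback. The following is a sketch under that assumption; lockKey, LockForTx, and the choice of FNV-1a for deriving the key are illustrative and not part of this ADR.

```go
package main

import (
	"context"
	"database/sql"
	"hash/fnv"
)

// lockKey maps a human-readable lock name to the 64-bit key that Postgres
// advisory locks take. FNV-1a is an arbitrary choice for this sketch.
func lockKey(name string) int64 {
	h := fnv.New64a()
	h.Write([]byte(name)) // hash.Hash.Write never returns an error
	return int64(h.Sum64())
}

// LockForTx takes an advisory lock that Postgres releases automatically when
// tx commits or rolls back, so nothing stays held on the pooled connection.
// The session-level pg_advisory_lock would outlive the transaction instead.
func LockForTx(ctx context.Context, tx *sql.Tx, name string) error {
	_, err := tx.ExecContext(ctx, `SELECT pg_advisory_xact_lock($1)`, lockKey(name))
	return err
}
```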

Alternatives considered

Session-level SET + session-mode pooling: correct but throttles throughput (one backend per client); rejected as the default.
SET ROLE <tenant_role> per request with per-tenant DB roles: strong isolation but explodes role management at many tenants; rejected for shared-schema V1 (kept as an option on the schema/DB-per-tenant escalation path, ADR-0007).
App-layer scoping only (no RLS): rejected by ADR-0007 — a single missing WHERE is a breach.

Verification

Two CI tests guard I3:

Forged-tenant_id test (from ADR-0007): app-layer code supplies a tenant id that does not match the transaction’s tenant context (for example in a WHERE clause or an inserted row) → RLS still blocks cross-tenant reads/writes.
Pooled-connection bleed test (new): drive two tenants’ requests across a transaction-mode pool small enough to force connection reuse → assert tenant B never observes tenant A’s rows, and a query issued without the wrapper returns zero rows.
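
The bleed test could be sketched roughly as follows, assuming Go's database/sql, a disposable test database, a hypothetical invoices table with a tenant_id uuid column under the tenant policy, and a policy that maps a missing or empty setting to NULL. bleedReport, inTenantTx, and probeBleed are invented names. SetMaxOpenConns(1) forces reuse in the application-side pool; exercising PgBouncer itself would additionally mean connecting through it with a small default_pool_size.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// bleedReport records what the two probes observed.
type bleedReport struct {
	BSeesA    int // tenant A rows visible inside tenant B's transaction
	Unwrapped int // rows visible to a query that set no tenant context
}

// Err returns nil only if both probes saw zero rows.
func (r bleedReport) Err() error {
	if r.BSeesA != 0 {
		return fmt.Errorf("bleed: tenant B saw %d of tenant A's rows", r.BSeesA)
	}
	if r.Unwrapped != 0 {
		return fmt.Errorf("fail-open: unwrapped query saw %d rows", r.Unwrapped)
	}
	return nil
}

// inTenantTx is a minimal stand-in for the platform/db wrapper.
func inTenantTx(ctx context.Context, db *sql.DB, tenant string, fn func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // returns ErrTxDone and changes nothing once Commit has succeeded
	if _, err := tx.ExecContext(ctx, `SELECT set_config('app.tenant_id', $1, true)`, tenant); err != nil {
		return err
	}
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

// probeBleed writes a row as tenant A, then checks what tenant B and an
// unwrapped query can see on the same reused connection.
func probeBleed(ctx context.Context, db *sql.DB, tenantA, tenantB string) (bleedReport, error) {
	var r bleedReport
	db.SetMaxOpenConns(1) // a single connection forces every transaction to reuse it

	err := inTenantTx(ctx, db, tenantA, func(tx *sql.Tx) error {
		_, err := tx.ExecContext(ctx, `INSERT INTO invoices (tenant_id) VALUES ($1)`, tenantA)
		return err
	})
	if err != nil {
		return r, err
	}
	err = inTenantTx(ctx, db, tenantB, func(tx *sql.Tx) error {
		return tx.QueryRowContext(ctx,
			`SELECT count(*) FROM invoices WHERE tenant_id = $1`, tenantA).Scan(&r.BSeesA)
	})
	if err != nil {
		return r, err
	}
	// No wrapper: the policy should make this count zero.
	err = db.QueryRowContext(ctx, `SELECT count(*) FROM invoices`).Scan(&r.Unwrapped)
	return r, err
}
```

A CI test would call probeBleed with two distinct tenant ids and fail if either the returned error or the report's Err() is non-nil.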