OpenTelemetry Setup

ADR-0013 mandates OTel from commit #1. The SDK is initialized in internal/platform/observability/ before any domain code runs. All three signals — traces, metrics, and logs — share the same initialization path so resource attributes (service name, version, environment, tenant) are consistent across signals.

Signals

| Signal | Instrument | Purpose |
|---|---|---|
| Traces | OTel SDK (Go go.opentelemetry.io/otel) | One trace per user action; spans through gRPC calls and NATS message headers |
| Metrics | Prometheus exporter via OTel Metrics SDK | RED per endpoint, USE for infra, domain gauges for eventual-consistency health |
| Logs | Structured JSON (zerolog/zap) correlated by trace ID | No PII in log bodies; trace ID injected into every log line |
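To illustrate the log row above, here is a minimal sketch of injecting the trace ID into every structured log line. It uses the standard library's log/slog as a stand-in for zerolog/zap, and the context key, `WithTraceID`, `traceHandler`, `emit`, and `decode` are hypothetical names for this example only. A real service would presumably read the trace ID from the active OTel span context rather than a custom context value.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceIDKey is a hypothetical context key used only in this sketch.
type traceIDKey struct{}

// WithTraceID returns a context carrying the given trace ID.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps any slog.Handler and adds a trace_id attribute to
// every record whose context carries a non-empty trace ID.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r = r.Clone() // avoid mutating state shared with other copies
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so the trace ID
// injection survives logger.With(...) and logger.WithGroup(...).
func (h traceHandler) WithAttrs(a []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(a)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// emit logs msg as one JSON line with traceID in the context and
// returns the emitted line.
func emit(traceID, msg string) string {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})
	logger.InfoContext(WithTraceID(context.Background(), traceID), msg)
	return buf.String()
}

// decode parses one JSON log line into a field map.
func decode(line string) map[string]any {
	var m map[string]any
	if err := json.Unmarshal([]byte(line), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	line := emit("4bf92f3577b34da6a3ce929d0e0e4736", "upload accepted")
	fmt.Print(line)
	fmt.Println("trace_id field:", decode(line)["trace_id"])
}
```

Doing the injection in the handler rather than at each call site means application code cannot forget the field, which is the property the table asks for.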

Context Propagation

The trace ID is minted at the API Gateway on the first inbound HTTP request using the W3C TraceContext standard (traceparent header). From that point it flows without loss:

  1. HTTP → Gateway: traceparent header extracted; root span created.
  2. Gateway → gRPC services: propagated via grpc-trace-bin / traceparent in gRPC metadata.
  3. gRPC service → NATS: injected into the NATS message header (traceparent) when publishing from the transactional outbox.
  4. NATS → async worker: extracted from the message header; worker creates a child span linked to the producer span.

One upload = one connected trace: Gateway → File & Metadata → Storage → NATS Bus → Search Indexer. The entire async derivation chain is visible in a single waterfall.
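The hops listed above can be sketched with the standard library alone. This is an illustration of the W3C traceparent mechanics, not the project's propagation code, which presumably relies on the OTel propagators. `TraceContext`, `ParseTraceparent`, `Child`, and `hop` are hypothetical names, plain string maps stand in for HTTP headers, gRPC metadata, and NATS headers, and only version 00 of the header is handled.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceContext holds the fields of a version-00 W3C traceparent header.
type TraceContext struct {
	TraceID string // 32 lowercase hex characters, constant for the whole trace
	SpanID  string // 16 lowercase hex characters, the sender's span
	Flags   string // 2 hex characters; "01" means sampled
}

// isHex reports whether s is exactly n lowercase hex characters.
func isHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// ParseTraceparent validates and splits "00-<trace-id>-<span-id>-<flags>".
func ParseTraceparent(h string) (TraceContext, error) {
	p := strings.Split(h, "-")
	if len(p) != 4 || p[0] != "00" ||
		!isHex(p[1], 32) || !isHex(p[2], 16) || !isHex(p[3], 2) {
		return TraceContext{}, errors.New("invalid traceparent")
	}
	// All-zero IDs are invalid per the W3C Trace Context specification.
	if p[1] == strings.Repeat("0", 32) || p[2] == strings.Repeat("0", 16) {
		return TraceContext{}, errors.New("invalid traceparent: zero ID")
	}
	return TraceContext{TraceID: p[1], SpanID: p[2], Flags: p[3]}, nil
}

// String renders the header value.
func (tc TraceContext) String() string {
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + tc.Flags
}

// Child keeps the trace ID and flags but mints a fresh span ID.
func (tc TraceContext) Child() TraceContext {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	tc.SpanID = hex.EncodeToString(b)
	return tc
}

// hop simulates one process boundary: extract from the incoming carrier,
// start a child span, and inject it into a new outgoing carrier.
func hop(in map[string]string) (map[string]string, error) {
	tc, err := ParseTraceparent(in["traceparent"])
	if err != nil {
		return nil, err
	}
	return map[string]string{"traceparent": tc.Child().String()}, nil
}

func main() {
	carrier := map[string]string{
		"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
	}
	for _, name := range []string{"gateway", "grpc service", "outbox publisher"} {
		next, err := hop(carrier)
		if err != nil {
			fmt.Println("propagation failed:", err)
			return
		}
		fmt.Printf("%-16s -> %s\n", name, next["traceparent"])
		carrier = next
	}
}
```

Each printed line carries the same trace ID with a different span ID, which is what lets the backend stitch the hops into one waterfall.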

flowchart LR
    GW["API Gateway\n(root OTel span)\nHTTP traceparent"]
    FM["File & Metadata\n(child span)\ngRPC metadata"]
    ST["Storage\n(child span)\ngRPC metadata"]
    BUS["NATS JetStream\nNATS header"]
    IX["Search Indexer\n(child span)\nconsumer"]
    COL["OTel Collector"]
    PR["Prometheus"]
    TP["Tempo / Jaeger"]
    LK["Loki"]

    GW -->|gRPC metadata| FM
    FM -->|gRPC metadata| ST
    ST -->|outbox → NATS header| BUS
    BUS -->|NATS header| IX

    GW & FM & ST & IX -->|OTLP| COL
    COL --> PR
    COL --> TP
    COL --> LK

OTel Collector

All signals are exported to an OTel Collector via OTLP (gRPC or HTTP). The Collector is responsible for batching, sampling decisions (tail-based), and routing to backends. Backends are deployment choices:

| Deployment tier | Common backend stack |
|---|---|
| Lite (self-host) | Structured logs only; basic Prometheus scrape |
| Standard | Prometheus + Jaeger (traces); stdout JSON logs |
| Full | Prometheus + Tempo (traces) + Loki (logs); full OTel pipeline |
| SaaS / managed | Any OTel-compatible vendor (Grafana Cloud, Honeycomb, Datadog) |

No backend endpoint is hard-coded in application code. All are supplied via environment variables passed to the Collector sidecar or configured in its pipeline.
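As a sketch of the same principle on the application side, the snippet below resolves the OTLP target from the environment instead of a literal in code. It assumes the service honors the standard OTel SDK variables OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL, which the text above does not state. `ExporterConfig` and `LoadExporterConfig` are hypothetical names, and defaulting the protocol to grpc is a choice made for this example. The localhost defaults match the conventional OTLP ports (4317 for gRPC, 4318 for HTTP), which fits a Collector running as a sidecar.

```go
package main

import (
	"fmt"
	"os"
)

// ExporterConfig is a hypothetical holder for the resolved OTLP target.
type ExporterConfig struct {
	Endpoint string
	Protocol string
}

// LoadExporterConfig resolves the OTLP endpoint and protocol through the
// supplied lookup function, so nothing is hard-coded beyond local defaults.
// Passing the lookup as a parameter keeps the function easy to test.
func LoadExporterConfig(getenv func(string) string) (ExporterConfig, error) {
	cfg := ExporterConfig{
		Endpoint: getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
		Protocol: getenv("OTEL_EXPORTER_OTLP_PROTOCOL"),
	}
	if cfg.Protocol == "" {
		cfg.Protocol = "grpc" // assumption for this sketch
	}
	switch cfg.Protocol {
	case "grpc":
		if cfg.Endpoint == "" {
			cfg.Endpoint = "http://localhost:4317"
		}
	case "http/protobuf":
		if cfg.Endpoint == "" {
			cfg.Endpoint = "http://localhost:4318"
		}
	default:
		return ExporterConfig{}, fmt.Errorf("unsupported OTLP protocol %q", cfg.Protocol)
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadExporterConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("exporting via %s to %s\n", cfg.Protocol, cfg.Endpoint)
}
```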

Health Endpoints

Every component — bitvaultd, bitvault-worker, and the web BFF — exposes:

| Endpoint | Semantics |
|---|---|
| GET /healthz | Liveness: process is alive. Returns 200 OK with {"status":"ok"}. |
| GET /readyz | Readiness: all dependencies reachable. Checks Postgres, Redis (if enabled), NATS (if enabled). Returns 200 when ready, 503 with a JSON body listing failing checks otherwise. |

Kubernetes liveness and readiness probes target these endpoints directly. When the readiness probe fails, Kubernetes removes the pod from Service endpoints without killing the process, so the pod can drain gracefully under load.
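A minimal sketch of the two endpoints, using only net/http, might look like the following. The `Check` type, `newMux`, `okCheck`, `failCheck`, and `probe` are hypothetical names, the 2-second timeout is an arbitrary choice, and the exact shape of the 503 body is an assumption, since the table only says it lists the failing checks. Real checks would ping Postgres, Redis, and NATS; stubs stand in for them here.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Check probes one dependency; a nil error means it is reachable.
type Check func(ctx context.Context) error

// okCheck and failCheck are stubs standing in for real dependency pings.
func okCheck(context.Context) error { return nil }

func failCheck(msg string) Check {
	return func(context.Context) error { return errors.New(msg) }
}

// healthz reports liveness only: if the process can answer, it is alive.
func healthz(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.Write([]byte(`{"status":"ok"}`))
}

// readyz runs every registered check and returns 503 with the failing
// check names and errors if any dependency is unreachable.
func readyz(checks map[string]Check) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		failing := map[string]string{}
		for name, check := range checks {
			if err := check(ctx); err != nil {
				failing[name] = err.Error()
			}
		}

		w.Header().Set("Content-Type", "application/json")
		if len(failing) > 0 {
			w.WriteHeader(http.StatusServiceUnavailable)
			json.NewEncoder(w).Encode(map[string]any{
				"status":  "unavailable",
				"failing": failing,
			})
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	}
}

// newMux wires both endpoints; only enabled dependencies are registered.
func newMux(checks map[string]Check) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", healthz)
	mux.HandleFunc("/readyz", readyz(checks))
	return mux
}

// probe issues an in-memory GET and returns the status code and body.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux(map[string]Check{
		"postgres": okCheck,
		"nats":     failCheck("connection refused"),
	})
	for _, path := range []string{"/healthz", "/readyz"} {
		code, body := probe(mux, path)
		fmt.Println(path, code, body)
	}
}
```

Keeping /healthz free of dependency checks matters: if liveness depended on Postgres, a database outage would cause Kubernetes to restart healthy processes instead of simply taking them out of rotation.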

:::tip Self-Host Lite
For the lite deployment tier, a minimal observability setup (structured JSON logs plus a basic Prometheus metrics scrape) is sufficient to operate the upload/download flow. The full OTel Collector pipeline (Tempo + Loki) is for the full tier and production deployments where async-plane tracing and log correlation are required.
:::