Search & Indexing

Purpose

Search is a derived, disposable anti-corruption layer over the search index. It owns no source-of-truth data: the index is entirely reconstructible by replaying the NodeChanged event stream. Its role is twofold: translate query requests into index queries while enforcing tenant isolation, and consume events to keep the index fresh.

Data owned

Search owns derived indexes only — not rows in the metadata database:

| Index | Backend | Rebuilt from |
| --- | --- | --- |
| Full-text + metadata index | OpenSearch | NodeChanged event stream |
| Postgres tsvector index | Postgres tsvector columns | Same NodeChanged stream |

The Postgres tsvector index is the fallback for lite deployments where OpenSearch is not available.

Internal API

Search.* gRPC methods:

| Method | Description |
| --- | --- |
| Search.Query | Full-text + filter query; always scoped to tenant_id |
| Search.Suggest | Autocomplete suggestions for file/folder names |

Event consumption

The search indexer is an async worker that subscribes to node.* events from NATS JetStream:

flowchart LR
    FM["File & Metadata<br/>(outbox)"]
    J{{"NATS JetStream<br/>node.*"}}
    IX["Search Indexer<br/>(worker)"]
    OS[("OpenSearch")]
    PG[("Postgres tsvector")]

    FM -->|NodeChanged| J
    J -->|at-least-once| IX
    IX -->|upsert doc| OS
    IX -->|update tsvector| PG
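
Because delivery is at-least-once, the indexer will occasionally see the same event twice. The sketch below shows one way a handler could stay safe under redelivery. It is illustrative only: the `NodeChanged` fields, the per-node `seq` check, and the in-memory index are assumptions, not the module's actual schema or client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeChanged:
    """Hypothetical event shape; field names are assumptions."""
    seq: int          # stream position, assumed to increase monotonically
    tenant_id: str
    node_id: str
    name: str
    deleted: bool = False


class InMemoryIndex:
    """Stand-in for OpenSearch / Postgres: documents keyed by node ID."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.last_seq: dict[str, int] = {}

    def apply(self, event: NodeChanged) -> bool:
        """Apply one event; return False if it is a duplicate or stale."""
        if self.last_seq.get(event.node_id, -1) >= event.seq:
            return False  # already applied this or a newer change
        self.last_seq[event.node_id] = event.seq
        if event.deleted:
            self.docs.pop(event.node_id, None)
        else:
            # Upsert: the node ID is the document key, so writes never duplicate.
            self.docs[event.node_id] = {
                "tenant_id": event.tenant_id,
                "name": event.name,
            }
        return True


index = InMemoryIndex()
e1 = NodeChanged(seq=1, tenant_id="t-1", node_id="n-1", name="draft.txt")
e2 = NodeChanged(seq=2, tenant_id="t-1", node_id="n-1", name="final.txt")
for event in [e1, e2, e1, e2]:  # e1 and e2 are redelivered
    index.apply(event)
```

In a real worker the message would be acknowledged only after the index write succeeds, so a crash between the two steps leads to a redelivery rather than a lost update.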

Tenant isolation

Every index document carries tenant_id as a field. Every query issued by the Search module adds a mandatory tenant_id filter at the index level — this is enforced inside the module, not left to callers. A misconfigured query that omits the filter is rejected before reaching the index.
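
A minimal sketch of how the mandatory filter could be enforced, assuming query bodies are plain dicts in the OpenSearch bool-query DSL. The names `scope_to_tenant`, `assert_tenant_scoped`, and `MissingTenantFilter` are hypothetical and do not come from the module itself.

```python
from typing import Any


class MissingTenantFilter(Exception):
    """Raised when a query would reach the index without a tenant filter."""


def scope_to_tenant(user_query: dict[str, Any], tenant_id: str) -> dict[str, Any]:
    """Wrap a caller-supplied clause in a bool query with a tenant_id filter."""
    if not tenant_id:
        raise MissingTenantFilter("tenant_id is required")
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }


def assert_tenant_scoped(body: dict[str, Any]) -> None:
    """Final check run just before a body is sent to the index."""
    filters = body.get("query", {}).get("bool", {}).get("filter", [])
    scoped = any(
        isinstance(f, dict) and "tenant_id" in f.get("term", {}) for f in filters
    )
    if not scoped:
        raise MissingTenantFilter("query has no tenant_id filter")


body = scope_to_tenant({"match": {"name": "report"}}, tenant_id="t-42")
assert_tenant_scoped(body)  # passes; an unscoped body would raise instead
```

Keeping the wrapper and the final check separate means a code path that builds a body by hand still cannot bypass the filter.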

When OpenSearch is not deployed (lite tier, development), the Search module routes queries to the Postgres tsvector index. The fallback covers name and basic metadata search; advanced ranking and aggregations require OpenSearch. Switching tiers is a configuration change — the Search.Query API surface is identical.
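
The fallback path might look roughly like the sketch below. The table name `node_search`, the column `search_vector`, the `'simple'` text search configuration, and the `search_backend` setting are all assumptions made for illustration; only the `@@` operator and `plainto_tsquery` are standard Postgres.

```python
from typing import Any

# Hypothetical table and column names; the design only specifies "tsvector columns".
POSTGRES_FTS_SQL = """
SELECT node_id
FROM node_search
WHERE tenant_id = %(tenant_id)s
  AND search_vector @@ plainto_tsquery('simple', %(text)s)
LIMIT %(limit)s
"""


def build_postgres_query(
    tenant_id: str, text: str, limit: int = 20
) -> tuple[str, dict[str, Any]]:
    """Return parameterized SQL for the fallback; tenant_id is always bound."""
    return POSTGRES_FTS_SQL, {"tenant_id": tenant_id, "text": text, "limit": limit}


def select_backend(config: dict[str, str]) -> str:
    """Choose the backend from a hypothetical 'search_backend' setting."""
    backend = config.get("search_backend", "postgres")
    if backend not in ("opensearch", "postgres"):
        raise ValueError(f"unknown search backend: {backend}")
    return backend


sql, params = build_postgres_query("t-42", "quarterly report")
backend = select_backend({"search_backend": "opensearch"})
```

The tenant filter appears in the SQL itself, so the isolation guarantee holds on the fallback path as well.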

Index rebuild

:::tip
The search index can be fully rebuilt by replaying the change journal. If the index is corrupted, the schema changes (e.g. after an OpenSearch mapping update), or the deployment is migrated to a new cluster, rebuild rather than patch: run the indexer replay job from seq = 0 against the new index, then cut over.
:::

Rebuild is idempotent: re-indexing does not create duplicates because documents are upserted by node ID.
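
The idempotency claim can be illustrated with a small replay sketch. The event shape and the dict-based index are assumptions for illustration, not the actual replay job.

```python
from typing import Iterable, TypedDict


class Event(TypedDict):
    """Hypothetical journal entry; field names are assumptions."""
    seq: int
    node_id: str
    name: str
    deleted: bool


def replay(
    events: Iterable[Event], index: dict[str, dict], from_seq: int = 0
) -> dict[str, dict]:
    """Upsert every event at or after from_seq into index, keyed by node ID."""
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["seq"] < from_seq:
            continue
        if event["deleted"]:
            index.pop(event["node_id"], None)
        else:
            index[event["node_id"]] = {"name": event["name"], "seq": event["seq"]}
    return index


journal: list[Event] = [
    {"seq": 0, "node_id": "n-1", "name": "a.txt", "deleted": False},
    {"seq": 1, "node_id": "n-2", "name": "b.txt", "deleted": False},
    {"seq": 2, "node_id": "n-1", "name": "a-renamed.txt", "deleted": False},
    {"seq": 3, "node_id": "n-2", "name": "b.txt", "deleted": True},
]

new_index = replay(journal, {})   # fresh index, replay from seq = 0
snapshot = dict(new_index)
replay(journal, new_index)        # second pass over the same index
assert new_index == snapshot      # same documents, no duplicates
```

Because each write is keyed by node ID, an interrupted rebuild can simply be restarted from seq = 0 without cleaning up the partially built index first.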

Extraction characteristics

Search is an early extraction candidate because: