Search & Indexing

Purpose

Search is a derived, disposable anti-corruption layer over the search index. It owns no source-of-truth data: the index is entirely reconstructible by replaying the `NodeChanged` event stream. It has two roles: translating query requests into index queries while enforcing tenant isolation, and consuming events to keep the index fresh.

Data owned

Search owns derived indexes only — not rows in the metadata database:

| Index | Backend | Rebuilt from |
| --- | --- | --- |
| Full-text + metadata index | OpenSearch | `NodeChanged` event stream |
| Postgres tsvector index | Postgres `tsvector` columns | Same `NodeChanged` stream |

The Postgres-FTS index is the fallback for lite deployments where OpenSearch is not available.

Internal API

Search.* gRPC methods:

| Method | Description |
| --- | --- |
| `Search.Query` | Full-text + filter query; always scoped to `tenant_id` |
| `Search.Suggest` | Autocomplete suggestions for file/folder names |

Event consumption

The search indexer is an async worker that subscribes to node.* events from NATS JetStream:

flowchart LR
    FM["File & Metadata<br/>(outbox)"]
    J{{"NATS JetStream<br/>node.*"}}
    IX["Search Indexer<br/>(worker)"]
    OS[("OpenSearch")]
    PG[("Postgres tsvector")]

    FM -->|NodeChanged| J
    J -->|at-least-once| IX
    IX -->|upsert doc| OS
    IX -->|update tsvector| PG

- Consumers are idempotent: re-indexing an already-indexed event is safe because documents are upserted by node ID + version.
- NATS JetStream provides at-least-once delivery; poison messages are routed to a dead-letter queue.
- Index lag is observable: the indexer lag metric is the difference between the latest `CHANGE.seq` and the last indexed seq.
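
The idempotency and lag properties above can be illustrated with a minimal Go sketch. It assumes that "upsert by node ID + version" means an event is skipped when the stored document already has an equal or newer version; that is one plausible reading, not a confirmed rule. The `NodeChanged` fields, the `Index` type, and the `Apply` and `Lag` names are hypothetical, and the index is an in-memory map standing in for OpenSearch or the tsvector columns.

```go
package main

import "fmt"

// NodeChanged is a hypothetical event shape; the real payload is not
// specified in this document.
type NodeChanged struct {
	Seq     uint64 // position in the change journal
	NodeID  string
	Version uint64
	Name    string
}

// Doc is the indexed representation of a node.
type Doc struct {
	Version uint64
	Name    string
}

// Index is an in-memory stand-in for a real index backend.
type Index struct {
	docs        map[string]Doc
	lastIndexed uint64 // highest seq applied so far
}

func NewIndex() *Index { return &Index{docs: make(map[string]Doc)} }

// Apply upserts by node ID and skips events whose version is not newer
// than the stored one, so a redelivered event leaves the index unchanged.
// It reports whether a document was written.
func (ix *Index) Apply(ev NodeChanged) bool {
	if ev.Seq > ix.lastIndexed {
		ix.lastIndexed = ev.Seq
	}
	if cur, ok := ix.docs[ev.NodeID]; ok && cur.Version >= ev.Version {
		return false // duplicate or stale delivery
	}
	ix.docs[ev.NodeID] = Doc{Version: ev.Version, Name: ev.Name}
	return true
}

// Lag is the latest journal seq minus the last indexed seq.
func (ix *Index) Lag(latestSeq uint64) uint64 {
	if latestSeq < ix.lastIndexed {
		return 0
	}
	return latestSeq - ix.lastIndexed
}

func main() {
	ix := NewIndex()
	ev := NodeChanged{Seq: 1, NodeID: "n1", Version: 1, Name: "report.pdf"}
	fmt.Println(ix.Apply(ev)) // first delivery writes the document
	fmt.Println(ix.Apply(ev)) // redelivery is a no-op
	fmt.Println(ix.Lag(5))    // journal is at seq 5, indexer at seq 1
}
```

Because a redelivered event is skipped rather than rejected, at-least-once delivery needs no deduplication state beyond the version already stored with each document.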

Tenant isolation

Every index document carries tenant_id as a field. Every query issued by the Search module adds a mandatory tenant_id filter at the index level — this is enforced inside the module, not left to callers. A misconfigured query that omits the filter is rejected before reaching the index.
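
As a rough illustration of enforcing the filter inside the module, the Go sketch below builds every index query through a single function that always attaches `tenant_id` and rejects requests without one. The `QueryRequest`, `IndexQuery`, `BuildIndexQuery`, and `ErrMissingTenant` names are assumptions made for this example, not the module's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMissingTenant is returned when a request carries no tenant scope.
var ErrMissingTenant = errors.New("search: query has no tenant_id")

// QueryRequest is a hypothetical caller-facing request.
type QueryRequest struct {
	TenantID string
	Text     string
}

// IndexQuery is a hypothetical backend-neutral query: free text plus
// exact-match filters applied at the index level.
type IndexQuery struct {
	Text    string
	Filters map[string]string
}

// BuildIndexQuery is the only path from a request to an index query.
// It always adds the tenant_id filter and rejects requests without a
// tenant before anything reaches the index.
func BuildIndexQuery(req QueryRequest) (IndexQuery, error) {
	if req.TenantID == "" {
		return IndexQuery{}, ErrMissingTenant
	}
	return IndexQuery{
		Text:    req.Text,
		Filters: map[string]string{"tenant_id": req.TenantID},
	}, nil
}

func main() {
	q, err := BuildIndexQuery(QueryRequest{TenantID: "t-42", Text: "invoice"})
	fmt.Println(q.Filters["tenant_id"], err)

	_, err = BuildIndexQuery(QueryRequest{Text: "invoice"})
	fmt.Println(err)
}
```

Keeping query construction in one function means a caller cannot issue an unscoped query by accident; there is no code path that produces an `IndexQuery` without the filter.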

Fallback: Postgres full-text search

When OpenSearch is not deployed (lite tier, development), the Search module routes queries to the Postgres tsvector index. The fallback covers name and basic metadata search; advanced ranking and aggregations require OpenSearch. Switching tiers is a configuration change — the Search.Query API surface is identical.
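
One way to picture the configuration-driven switch is a small backend interface chosen at startup. This Go sketch is only an assumption about how that could look: the `Backend` interface, the `Config.OpenSearchURL` field, the example URL, and the backend names are all hypothetical.

```go
package main

import "fmt"

// Backend is a hypothetical interface that both index backends satisfy,
// so the query API surface stays the same across tiers.
type Backend interface {
	Name() string
	// SupportsAggregations reports whether advanced ranking and
	// aggregations are available on this backend.
	SupportsAggregations() bool
}

type openSearchBackend struct{}

func (openSearchBackend) Name() string               { return "opensearch" }
func (openSearchBackend) SupportsAggregations() bool { return true }

type postgresFTSBackend struct{}

func (postgresFTSBackend) Name() string               { return "postgres-fts" }
func (postgresFTSBackend) SupportsAggregations() bool { return false }

// Config is a hypothetical deployment setting; an empty OpenSearchURL
// means OpenSearch is not deployed.
type Config struct {
	OpenSearchURL string
}

// SelectBackend falls back to Postgres full-text search when no
// OpenSearch endpoint is configured.
func SelectBackend(cfg Config) Backend {
	if cfg.OpenSearchURL == "" {
		return postgresFTSBackend{}
	}
	return openSearchBackend{}
}

func main() {
	lite := SelectBackend(Config{})
	// The URL below is a placeholder, not a real endpoint.
	full := SelectBackend(Config{OpenSearchURL: "http://opensearch.internal:9200"})
	fmt.Println(lite.Name(), lite.SupportsAggregations())
	fmt.Println(full.Name(), full.SupportsAggregations())
}
```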

Index rebuild

:::tip
The search index can be fully rebuilt by replaying the change journal. If the index is corrupted, the schema changes (e.g. after an OpenSearch mapping update), or the deployment is migrated to a new cluster, rebuild rather than patch: run the indexer replay job from seq = 0 against the new index, then cut over.
:::

Rebuild is idempotent: re-indexing does not create duplicates because documents are upserted by node ID.
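
The replay-then-cut-over procedure can be sketched in Go as follows. This is a simplified model under stated assumptions: the journal is an in-memory slice already ordered by seq, deletions are not modeled, and the `Change`, `Replay`, and `Searcher.Cutover` names are hypothetical. The sketch is single-goroutine; a real cutover would need synchronization or a backend-level mechanism.

```go
package main

import "fmt"

// Change is a hypothetical journal entry.
type Change struct {
	Seq    uint64
	NodeID string
	Name   string
}

// Index maps node ID to the indexed name; it stands in for a real index.
type Index map[string]string

// Replay builds a fresh index from every journal entry with
// Seq >= fromSeq. Upserting by node ID means later entries overwrite
// earlier ones and replaying the journal twice gives the same result.
func Replay(journal []Change, fromSeq uint64) Index {
	idx := make(Index)
	for _, c := range journal {
		if c.Seq < fromSeq {
			continue
		}
		idx[c.NodeID] = c.Name
	}
	return idx
}

// Searcher serves queries from whichever index is currently live.
type Searcher struct {
	live Index
}

// Cutover swaps the live index for a fully rebuilt one.
func (s *Searcher) Cutover(next Index) { s.live = next }

func main() {
	journal := []Change{
		{Seq: 0, NodeID: "n1", Name: "draft.txt"},
		{Seq: 1, NodeID: "n2", Name: "notes.md"},
		{Seq: 2, NodeID: "n1", Name: "final.txt"}, // n1 renamed
	}
	s := &Searcher{live: Index{"n1": "corrupted"}}

	rebuilt := Replay(journal, 0) // build the new index off to the side
	s.Cutover(rebuilt)            // then switch reads over in one step

	fmt.Println(len(s.live), s.live["n1"])
}
```

Building the new index separately and switching only once it is complete keeps queries on the old index until the rebuild has finished.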

Extraction characteristics

Search is an early extraction candidate because:

- It has a different datastore (OpenSearch) with its own lifecycle and cluster ops.
- The indexer worker has a bursty CPU/IO profile that can starve request-serving goroutines.
- It can be disabled entirely (Postgres-FTS fallback) without touching the core file or sync paths.
- Its API surface is narrow (query / suggest) and has no write path into the metadata database.