Search & Indexing
Purpose
Search is a derived, disposable anti-corruption layer over the search index.
It owns no source-of-truth data — the index is entirely reconstructible by replaying the NodeChanged event stream.
Its role is twofold: translate query requests into index queries while enforcing tenant isolation, and consume events to keep the index fresh.
Data owned
Search owns derived indexes only — not rows in the metadata database:
| Index | Backend | Rebuilt from |
|---|---|---|
| Full-text + metadata index | OpenSearch | NodeChanged event stream |
| Postgres tsvector index | Postgres tsvector columns | Same NodeChanged stream |
The Postgres-FTS index is the fallback for lite deployments where OpenSearch is not available.
Internal API
Search.* gRPC methods:
| Method | Description |
|---|---|
| Search.Query | Full-text + filter query; always scoped to tenant_id |
| Search.Suggest | Autocomplete suggestions for file/folder names |
Event consumption
The search indexer is an async worker that subscribes to node.* events from NATS JetStream:
flowchart LR
FM["File & Metadata<br/>(outbox)"]
J{{"NATS JetStream<br/>node.*"}}
IX["Search Indexer<br/>(worker)"]
OS[("OpenSearch")]
PG[("Postgres tsvector")]
FM -->|NodeChanged| J
J -->|at-least-once| IX
IX -->|upsert doc| OS
IX -->|update tsvector| PG
- Consumers are idempotent: re-indexing an already-indexed event is safe (upsert by node ID + version).
- NATS JetStream provides at-least-once delivery; poison messages are routed to a dead-letter queue.
- Index lag is observable: the difference between the latest CHANGE.seq and the last indexed seq is the indexer lag metric.
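The idempotency and lag properties above can be sketched in a few lines of Go. This is a minimal illustration, not the actual indexer: the `NodeChanged` field names, the `Indexer` type, and the in-memory map standing in for OpenSearch and Postgres are all assumptions made for the example.

```go
package main

import "fmt"

// NodeChanged is a hypothetical event shape; the real payload is not specified in this document.
type NodeChanged struct {
	Seq      uint64 // position in the change stream
	TenantID string
	NodeID   string
	Version  uint64
	Name     string
}

// Doc is the indexed representation of a node.
type Doc struct {
	TenantID string
	Version  uint64
	Name     string
}

// Indexer uses an in-memory map as a stand-in for the real index backends.
type Indexer struct {
	docs           map[string]Doc
	lastIndexedSeq uint64
}

func NewIndexer() *Indexer { return &Indexer{docs: make(map[string]Doc)} }

// Apply upserts by node ID and skips events whose version is not newer than
// the stored document, so a redelivered event leaves the index unchanged.
// It reports whether the index was modified.
func (ix *Indexer) Apply(ev NodeChanged) bool {
	if ev.Seq > ix.lastIndexedSeq {
		ix.lastIndexedSeq = ev.Seq
	}
	if cur, ok := ix.docs[ev.NodeID]; ok && cur.Version >= ev.Version {
		return false
	}
	ix.docs[ev.NodeID] = Doc{TenantID: ev.TenantID, Version: ev.Version, Name: ev.Name}
	return true
}

// Lag is the distance between the newest sequence in the stream and the last indexed one.
func (ix *Indexer) Lag(latestSeq uint64) uint64 {
	if latestSeq < ix.lastIndexedSeq {
		return 0
	}
	return latestSeq - ix.lastIndexedSeq
}

func main() {
	ix := NewIndexer()
	ev := NodeChanged{Seq: 1, TenantID: "t1", NodeID: "n1", Version: 1, Name: "report.pdf"}
	fmt.Println("first delivery changed index:", ix.Apply(ev))
	fmt.Println("redelivery changed index:", ix.Apply(ev))
	fmt.Println("lag with latest seq 5:", ix.Lag(5))
}
```

Comparing versions before writing is what makes at-least-once delivery safe here: a duplicate or stale event can never overwrite a newer document.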
Tenant isolation
Every index document carries tenant_id as a field.
Every query issued by the Search module adds a mandatory tenant_id filter at the index level — this is enforced inside the module, not left to callers.
A misconfigured query that omits the filter is rejected before reaching the index.
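One way to enforce this inside the module is to build the index filter set in a single function that refuses to proceed without a tenant. The sketch below assumes a hypothetical backend-neutral `Query` struct and a `ScopeToTenant` helper; neither name comes from the actual codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// Query is a hypothetical, backend-neutral query request.
type Query struct {
	TenantID string
	Text     string
	Filters  map[string]string
}

// ErrMissingTenant is returned when a query carries no tenant.
var ErrMissingTenant = errors.New("search: query has no tenant_id")

// ScopeToTenant returns the filters to send to the index. It rejects queries
// without a tenant and always sets tenant_id itself.
func ScopeToTenant(q Query) (map[string]string, error) {
	if q.TenantID == "" {
		return nil, ErrMissingTenant
	}
	scoped := make(map[string]string, len(q.Filters)+1)
	for k, v := range q.Filters {
		scoped[k] = v
	}
	// Set last so a caller-supplied "tenant_id" filter cannot override it.
	scoped["tenant_id"] = q.TenantID
	return scoped, nil
}

func main() {
	_, err := ScopeToTenant(Query{Text: "budget"})
	fmt.Println("no tenant:", err)

	filters, _ := ScopeToTenant(Query{
		TenantID: "t1",
		Text:     "budget",
		Filters:  map[string]string{"tenant_id": "t2", "type": "file"},
	})
	fmt.Println("effective tenant:", filters["tenant_id"])
}
```

Because the tenant filter is written after caller filters are copied, a caller cannot widen the scope by passing its own `tenant_id` value.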
Fallback: Postgres full-text search
When OpenSearch is not deployed (lite tier, development), the Search module routes queries to the Postgres tsvector index.
The fallback covers name and basic metadata search; advanced ranking and aggregations require OpenSearch.
Switching tiers is a configuration change — the Search.Query API surface is identical.
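A common way to keep the API surface identical across tiers is to hide both backends behind one interface and pick the implementation from configuration. The following is a sketch under assumptions: the `Backend` interface, the `Config.Backend` key, and its values `"opensearch"` and `"postgres-fts"` are illustrative names, and the query methods are placeholders.

```go
package main

import "fmt"

// Backend is a hypothetical interface that both index backends satisfy.
type Backend interface {
	Name() string
	Query(tenantID, text string) ([]string, error)
}

type openSearchBackend struct{}

func (openSearchBackend) Name() string { return "opensearch" }

// Query is a placeholder; a real implementation would call OpenSearch.
func (openSearchBackend) Query(tenantID, text string) ([]string, error) { return nil, nil }

type postgresFTSBackend struct{}

func (postgresFTSBackend) Name() string { return "postgres-fts" }

// Query is a placeholder; a real implementation would query the tsvector columns.
func (postgresFTSBackend) Query(tenantID, text string) ([]string, error) { return nil, nil }

// Config holds the hypothetical setting that selects the tier.
type Config struct {
	Backend string
}

// NewBackend maps the configured name to an implementation.
func NewBackend(cfg Config) (Backend, error) {
	switch cfg.Backend {
	case "opensearch":
		return openSearchBackend{}, nil
	case "postgres-fts":
		return postgresFTSBackend{}, nil
	default:
		return nil, fmt.Errorf("unknown search backend %q", cfg.Backend)
	}
}

func main() {
	b, err := NewBackend(Config{Backend: "postgres-fts"})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("routing Search.Query to:", b.Name())
}
```

Callers depend only on `Backend`, so moving between tiers does not touch the request-handling code.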
Index rebuild
:::tip
The search index can be fully rebuilt by replaying the change journal. If the index is corrupted, the schema changes (e.g. after an OpenSearch mapping update), or the deployment is migrated to a new cluster, rebuild rather than patch — run the indexer replay job from seq = 0 against the new index, then cut over.
:::
Rebuild is idempotent: re-indexing does not create duplicates because documents are upserted by node ID.
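The rebuild-then-cut-over flow can be illustrated with a small in-memory model. This is only a sketch: the `Event` fields, the `Replay` function, and the `Alias` type (standing in for whatever mechanism points queries at the live index) are assumptions, not the real replay job.

```go
package main

import "fmt"

// Event is a hypothetical journal entry.
type Event struct {
	Seq     uint64
	NodeID  string
	Version uint64
	Name    string
}

// Index maps node ID to the latest applied event; it stands in for a real index.
type Index map[string]Event

// Replay builds a fresh index from the whole journal, starting at seq 0.
// Entries are keyed by node ID, so repeated events never create duplicates.
func Replay(journal []Event) Index {
	idx := make(Index)
	for _, ev := range journal {
		if cur, ok := idx[ev.NodeID]; ok && cur.Version >= ev.Version {
			continue // stale or duplicate event
		}
		idx[ev.NodeID] = ev
	}
	return idx
}

// Alias points queries at the live index; CutOver swaps it after a rebuild.
type Alias struct {
	live Index
}

func (a *Alias) CutOver(next Index) { a.live = next }

func main() {
	journal := []Event{
		{Seq: 1, NodeID: "n1", Version: 1, Name: "draft.txt"},
		{Seq: 2, NodeID: "n2", Version: 1, Name: "notes.md"},
		{Seq: 3, NodeID: "n1", Version: 2, Name: "final.txt"},
		{Seq: 3, NodeID: "n1", Version: 2, Name: "final.txt"}, // duplicate delivery
	}
	alias := &Alias{live: Index{}}
	rebuilt := Replay(journal) // build the new index off to the side
	alias.CutOver(rebuilt)     // switch queries over only once it is complete
	fmt.Println("documents in live index:", len(alias.live))
	fmt.Println("n1 name:", alias.live["n1"].Name)
}
```

Building the new index separately and swapping at the end means queries keep hitting the old index until the replay has finished.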
Extraction characteristics
Search is an early extraction candidate because:
- It has a different datastore (OpenSearch) with its own lifecycle and cluster ops.
- The indexer worker has a bursty CPU/IO profile that can starve request-serving goroutines.
- It can be disabled entirely (Postgres-FTS fallback) without touching the core file or sync paths.
- Its API surface is narrow (query / suggest) and has no write path into the metadata database.