File & Metadata
Purpose
File & Metadata is the spine of BitVault: the source of truth for the entire file namespace. It owns the commit protocol that eliminates the dual-write problem by writing a namespace mutation and its event in one atomic transaction. No version can exist without verified bytes, and no event can be missed without leaving a visible gap.
Data owned
| Table | Purpose |
|---|---|
| nodes | Files and folders; materialized path for subtree ops |
| versions | Immutable version records; each references a content-addressed BLOB |
| node_metadata | Arbitrary key/value tags per node |
| tags | User-defined taxonomy tags |
| trash | Soft-deleted nodes (deleted_at set; row retained) |
| outbox | NodeChanged events written in the same transaction as the VERSION insert |
Internal API
Files.* gRPC methods:
| Method | Description |
|---|---|
| Files.CreateUpload | Reserve a node draft and return a presigned PUT URL |
| Files.CommitUpload | Verify bytes, atomically write node + version + outbox row |
| Files.Move | Rename or reparent a node (byte-free) |
| Files.Copy | Create a new node referencing the same BLOB (refcount++) |
| Files.List | List folder contents (paginated) |
| Files.GetVersions | List version history for a node |
| Files.TrashNode | Soft-delete (sets deleted_at) |
| Files.RestoreNode | Clear deleted_at, re-activate node |
| Files.PurgeTrash | Hard-delete from trash; triggers refcount decrement in Storage |
Commit protocol
The commit protocol is the central correctness invariant of BitVault. It ensures bytes are verified before any metadata is written, and that the event is never lost.
sequenceDiagram
autonumber
participant C as Client
participant GW as API Gateway
participant F as File & Metadata
participant S as Storage
participant O as Object Store
participant B as Event Bus
participant IX as Indexer
C->>GW: POST /v1/files (init: path, size, hash)
GW->>F: CreateUpload(node draft) [gRPC]
F->>S: PresignPut(staging key, size-range, ttl)
S-->>F: presigned URL (+ uploadId if multipart)
F-->>GW: uploadId + presigned URL(s)
GW-->>C: 201 {uploadId, url}
C->>O: PUT bytes (direct, presigned)
O-->>C: 200 (ETag)
C->>GW: POST /v1/files/{uploadId}/commit (etag, hash)
GW->>F: CommitUpload(uploadId, hash)
F->>S: HeadObject(staging key) — verify size/etag/hash
S->>O: HEAD staging key
O-->>S: metadata (size, etag)
S-->>F: verified ✔ (blob refcount++)
Note over F: BEGIN TX<br/>insert/replace node + version<br/>insert outbox(NodeChanged)<br/>COMMIT
F-->>GW: 200 (node, version)
GW-->>C: 200 committed
F->>B: publish NodeChanged (drained from outbox)
B->>IX: NodeChanged → index (eventual)
Failure paths:
- Client uploads bytes but never commits → staging blob has refcount = 0 → GC reclaims it after TTL.
- Commit fails verification (hash/size mismatch) → no metadata written → client retries safely.
- Transaction fails after verification → staging blob is rolled back from committed to staging → GC reclaims it.
- Outbox decouples publish from the transaction → at-least-once delivery; consumers are idempotent.
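
The transactional step and the outbox drain described above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database, not BitVault's actual implementation: the schema, the column names, and the helpers `open_db`, `commit_upload`, and `drain_outbox` are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical, simplified schema; the real BitVault tables are not shown here.
SCHEMA = """
CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    current_version_id INTEGER
);
CREATE TABLE versions (
    id INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL,
    blob_hash TEXT NOT NULL
);
CREATE TABLE outbox (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    subject TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def commit_upload(conn, path, blob_hash, verified):
    """Write node + version + outbox row in one transaction, after verification."""
    if not verified:
        # Mirrors invariant I1: no metadata is written for unverified bytes.
        raise ValueError("bytes not verified; refusing to write metadata")
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute("SELECT id FROM nodes WHERE path = ?", (path,)).fetchone()
        if row is None:
            node_id = conn.execute(
                "INSERT INTO nodes (path) VALUES (?)", (path,)
            ).lastrowid
        else:
            node_id = row[0]
        version_id = conn.execute(
            "INSERT INTO versions (node_id, blob_hash) VALUES (?, ?)",
            (node_id, blob_hash),
        ).lastrowid
        conn.execute(
            "UPDATE nodes SET current_version_id = ? WHERE id = ?",
            (version_id, node_id),
        )
        payload = json.dumps({"node_id": node_id, "version_id": version_id})
        conn.execute(
            "INSERT INTO outbox (subject, payload) VALUES ('node.changed', ?)",
            (payload,),
        )
    return node_id, version_id


def drain_outbox(conn, publish):
    """Publish pending rows in order; mark a row only after publish succeeds."""
    rows = conn.execute(
        "SELECT seq, subject, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, subject, payload in rows:
        publish(subject, json.loads(payload))  # if this raises, the row is retried later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because a row is marked as published only after `publish` returns, a crash between the two steps causes the event to be sent again on the next drain. That is the at-least-once behavior the failure paths rely on, and it is why consumers need to be idempotent.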
:::danger
Invariant I1: A VERSION row must only be created after the BLOB’s bytes have been verified (size / etag / hash match).
Never commit without verification. A VERSION that references unverified bytes breaks content integrity, dedup accounting, and GC safety.
:::
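The verification gate behind I1 might look like the sketch below. The `StagedObject` shape and the `verify_upload` helper are hypothetical. In particular, it assumes the staging metadata exposes a SHA-256 digest alongside size and ETag, which this document does not specify.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedObject:
    """Assumed result of a HEAD on the staging key (hypothetical shape)."""
    size: int
    etag: str
    sha256: str


def verify_upload(staged, expected_size, expected_etag, expected_sha256):
    """Return the list of mismatched fields; an empty list means verified."""
    mismatches = []
    if staged.size != expected_size:
        mismatches.append("size")
    if staged.etag != expected_etag:
        mismatches.append("etag")
    if staged.sha256.lower() != expected_sha256.lower():
        mismatches.append("hash")
    return mismatches


# Usage: only an empty result allows the commit transaction to begin.
data = b"hello bitvault"
digest = hashlib.sha256(data).hexdigest()
staged = StagedObject(size=len(data), etag="etag-1", sha256=digest)
ok_to_commit = verify_upload(staged, len(data), "etag-1", digest) == []
```

Returning the mismatched fields rather than a bare boolean lets the caller tell the client which check failed, so it can retry safely.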
Node model
- Types: file or folder. Folders contain nodes; files hold versions.
- Materialized path: each node stores the full path string to enable efficient subtree queries (WHERE path LIKE '/tenant/a/b/%') without recursive CTEs.
- Namespace ops are byte-free: move, rename, and copy mutate NODE rows (and path), never BLOB objects. Copy creates a new NODE/VERSION referencing the same BLOB with refcount++.
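
To illustrate the materialized-path approach, here is a small sketch of a subtree query and a byte-free folder move against an in-memory SQLite table. The table layout and the helpers `subtree` and `move_subtree` are assumptions for the example. Note that SQLite's LIKE is case-insensitive for ASCII by default, which may differ from the production database.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal nodes table: only the columns this sketch needs.
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL, type TEXT NOT NULL)"
    )
    return conn


def escape_like(text):
    """Escape LIKE wildcards so a path is matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def subtree(conn, folder_path):
    """Return all descendant paths of a folder using a single prefix match."""
    pattern = escape_like(folder_path.rstrip("/")) + "/%"
    rows = conn.execute(
        "SELECT path FROM nodes WHERE path LIKE ? ESCAPE '\\' ORDER BY path",
        (pattern,),
    )
    return [r[0] for r in rows]


def move_subtree(conn, old_prefix, new_prefix):
    """Rewrite the path prefix of a folder and its descendants; no blobs are touched."""
    old_prefix = old_prefix.rstrip("/")
    new_prefix = new_prefix.rstrip("/")
    pattern = escape_like(old_prefix) + "/%"
    with conn:  # one transaction for the whole subtree
        cur = conn.execute(
            "UPDATE nodes SET path = ? || substr(path, ?) "
            "WHERE path = ? OR path LIKE ? ESCAPE '\\'",
            (new_prefix, len(old_prefix) + 1, old_prefix, pattern),
        )
    return cur.rowcount
```

Matching on the prefix followed by `/` keeps siblings such as `/tenant/a/bb.txt` out of a query for `/tenant/a/b`.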
Versioning
Each CommitUpload creates a new VERSION row referencing the content-addressed BLOB.
NODE.current_version_id points to the head version. The full version history is accessible via Files.GetVersions.
Deleting a version decrements the BLOB’s refcount; GC reclaims the object when refcount reaches zero.
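The refcount behavior can be illustrated with a toy in-memory model. The `BlobStore` and `VersionHistory` classes are hypothetical stand-ins, and the rule that the head version cannot be deleted is an assumption of this sketch, not something this document states.

```python
from collections import Counter


class BlobStore:
    """Toy stand-in for Storage refcount accounting (hypothetical)."""

    def __init__(self):
        self.refcount = Counter()

    def add_ref(self, blob_hash):
        self.refcount[blob_hash] += 1

    def release(self, blob_hash):
        if self.refcount[blob_hash] <= 0:
            raise ValueError("refcount underflow for " + blob_hash)
        self.refcount[blob_hash] -= 1

    def collect_garbage(self):
        """Remove and return blobs whose refcount has reached zero."""
        dead = sorted(h for h, n in self.refcount.items() if n == 0)
        for blob_hash in dead:
            del self.refcount[blob_hash]
        return dead


class VersionHistory:
    """Versions of a single node; each version references one blob."""

    def __init__(self, store):
        self.store = store
        self.versions = {}  # version_id -> blob_hash
        self.current_version_id = None
        self._next_id = 1

    def commit(self, blob_hash):
        version_id = self._next_id
        self._next_id += 1
        self.versions[version_id] = blob_hash
        self.store.add_ref(blob_hash)
        self.current_version_id = version_id  # head moves to the new version
        return version_id

    def delete_version(self, version_id):
        if version_id == self.current_version_id:
            # Assumption of this sketch only.
            raise ValueError("cannot delete the head version")
        self.store.release(self.versions.pop(version_id))
```

Because blobs are content-addressed, two versions with identical bytes share one blob, and the blob survives until the last referencing version is deleted.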
Trash and restoration
Trash is a soft-delete: NODE.deleted_at is set. The row is retained so restore is a metadata-only operation.
Hard purge removes the row and triggers BLOB refcount decrements for all associated versions.
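A minimal sketch of the trash lifecycle, again using in-memory SQLite. The two-table schema and the helper names are assumptions. The sketch models soft-delete with the deleted_at column only, and `purge_node` simply returns the blob hashes whose refcounts the caller would ask Storage to decrement.

```python
import sqlite3
from datetime import datetime, timezone


def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema for illustration.
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, path TEXT NOT NULL, deleted_at TEXT);
        CREATE TABLE versions (id INTEGER PRIMARY KEY, node_id INTEGER NOT NULL, blob_hash TEXT NOT NULL);
        """
    )
    return conn


def trash_node(conn, node_id):
    """Soft-delete: set deleted_at and keep the row."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            "UPDATE nodes SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
            (now, node_id),
        )


def restore_node(conn, node_id):
    """Restore is metadata-only: clear deleted_at."""
    with conn:
        conn.execute("UPDATE nodes SET deleted_at = NULL WHERE id = ?", (node_id,))


def purge_node(conn, node_id):
    """Hard-delete a trashed node; return blob hashes needing a refcount decrement."""
    with conn:
        row = conn.execute(
            "SELECT deleted_at FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None or row[0] is None:
            raise ValueError("node is not in trash")
        hashes = [
            r[0]
            for r in conn.execute(
                "SELECT blob_hash FROM versions WHERE node_id = ? ORDER BY id",
                (node_id,),
            )
        ]
        conn.execute("DELETE FROM versions WHERE node_id = ?", (node_id,))
        conn.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
    return hashes
```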
Events emitted
| Event | Subject | Trigger |
|---|---|---|
| NodeChanged | node.changed | Any create / update / move / delete committed |
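
Since delivery is at-least-once, a consumer of NodeChanged has to tolerate duplicates. The sketch below shows one way an indexer might do that while also surfacing gaps. It assumes each event carries a monotonically increasing `seq` and arrives in order. The event fields and the `IndexerConsumer` class are hypothetical; the actual payload is not specified in this document.

```python
class IndexerConsumer:
    """Idempotent, gap-detecting consumer sketch (hypothetical event shape)."""

    def __init__(self):
        self.last_seq = 0
        self.index = {}  # node_id -> latest indexed version_id
        self.gaps = []   # inclusive (first_missing, last_missing) ranges

    def handle(self, event):
        seq = event["seq"]
        if seq <= self.last_seq:
            return "duplicate"  # redelivery: already applied, safe to ignore
        if seq > self.last_seq + 1:
            # Missing events become a visible gap rather than a silent loss.
            self.gaps.append((self.last_seq + 1, seq - 1))
        self.index[event["node_id"]] = event["version_id"]
        self.last_seq = seq
        return "applied"
```

For example, delivering `seq` 1 twice and then `seq` 4 applies two events, ignores one duplicate, and records the gap `(2, 3)` for later backfill.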