File & Metadata

Purpose

File & Metadata is the spine of BitVault and the source of truth for the entire file namespace. It owns the commit protocol that defeats the dual-write problem: a namespace mutation and its event are written in one atomic transaction. No version can exist without verified bytes, and no event can be lost without leaving a visible gap.

Data owned

| Table | Purpose |
|---|---|
| nodes | Files and folders; materialized path for subtree operations |
| versions | Immutable version records; each references a content-addressed BLOB |
| node_metadata | Arbitrary key/value pairs per node |
| tags | User-defined taxonomy tags |
| trash | Soft-deleted nodes (deleted_at set; row retained) |
| outbox | NodeChanged events, written in the same transaction as the namespace mutation that produced them (e.g. the VERSION insert) |
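Storing a materialized path on each nodes row turns a subtree rename or reparent (Files.Move) into a single prefix rewrite: no recursive walk and no byte movement. The sketch below assumes a simplified nodes table holding only an id and a `/`-separated path; the column names and the `move_subtree` helper are hypothetical, and Python's built-in sqlite3 stands in for the real database so the example runs standalone.

```python
import sqlite3

# Hypothetical, simplified schema; the real nodes table has more columns.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE  -- materialized path, e.g. '/docs/q1.pdf'
);
"""


def move_subtree(conn: sqlite3.Connection, src: str, dst: str) -> int:
    """Rename or reparent `src` and all its descendants under `dst`.

    Only path strings change (byte-free). Returns the number of rows rewritten.
    """
    if dst == src or dst.startswith(src + "/"):
        raise ValueError("cannot move a node into itself or its own subtree")
    # Used both as the 1-indexed start of the suffix after src
    # and as the length of the 'src/' prefix.
    n = len(src) + 1
    with conn:  # one transaction: the whole subtree moves, or nothing does
        cur = conn.execute(
            """
            UPDATE nodes
               SET path = ? || substr(path, ?)
             WHERE path = ? OR substr(path, 1, ?) = ?
            """,
            (dst, n, src, n, src + "/"),
        )
    return cur.rowcount


# Example: move /docs/reports (and its child) to /archive/reports.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO nodes (path) VALUES (?)",
    [("/docs",), ("/docs/reports",), ("/docs/reports/q1.pdf",),
     ("/docs/reports2",), ("/archive",)],
)
moved = move_subtree(conn, "/docs/reports", "/archive/reports")
paths = [p for (p,) in conn.execute("SELECT path FROM nodes ORDER BY path")]
```

Matching with `substr()` rather than `LIKE 'src/%'` is a deliberate choice here: `_` and `%` are legal in file names but are wildcards to LIKE, and the `'/'` suffix keeps a sibling such as `/docs/reports2` out of the move.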

Internal API

Files.* gRPC methods:

| Method | Description |
|---|---|
| Files.CreateUpload | Reserve a node draft and return a presigned PUT URL |
| Files.CommitUpload | Verify bytes, then atomically write node + version + outbox row |
| Files.Move | Rename or reparent a node (byte-free) |
| Files.Copy | Create a new node referencing the same BLOB (refcount++) |
| Files.List | List folder contents (paginated) |
| Files.GetVersions | List version history for a node |
| Files.TrashNode | Soft-delete a node (sets deleted_at) |
| Files.RestoreNode | Clear deleted_at and re-activate the node |
| Files.PurgeTrash | Hard-delete from trash; triggers refcount decrement in Storage |
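Files.List is described only as paginated; the paging scheme is not specified. One common option is keyset (cursor) pagination: order children by a unique key and give the client an opaque token holding the last key it saw, so pages stay stable and cheap as a folder grows. The sketch below is an assumption, not the documented scheme: it orders by `(name, id)`, hides trashed rows via `deleted_at`, and encodes the token as base64 JSON. `list_folder` and the `parent_id`/`name` columns are hypothetical.

```python
import base64
import json
import sqlite3
from typing import List, Optional, Tuple


def _encode_token(name: str, node_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps([name, node_id]).encode()).decode()


def _decode_token(token: str) -> Tuple[str, int]:
    name, node_id = json.loads(base64.urlsafe_b64decode(token.encode()))
    return name, node_id


def list_folder(conn: sqlite3.Connection, parent_id: int, page_size: int,
                page_token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of live child names and the next-page token (None at the end)."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    after_name, after_id = _decode_token(page_token) if page_token else ("", 0)
    rows = conn.execute(
        """
        SELECT id, name FROM nodes
         WHERE parent_id = ? AND deleted_at IS NULL
           AND (name > ? OR (name = ? AND id > ?))
         ORDER BY name, id
         LIMIT ?
        """,
        # Fetch one extra row to learn whether another page exists.
        (parent_id, after_name, after_name, after_id, page_size + 1),
    ).fetchall()
    page = rows[:page_size]
    next_token = _encode_token(page[-1][1], page[-1][0]) if len(rows) > page_size else None
    return [name for _, name in page], next_token


# Example: folder 1 has four live children and one trashed child.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER,"
             " name TEXT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO nodes (parent_id, name, deleted_at) VALUES (?, ?, ?)",
    [(1, "b", None), (1, "a", None), (1, "d", None), (1, "c", None),
     (1, "e", "2024-01-01T00:00:00Z"), (2, "z", None)],
)
pages, token = [], None
while True:
    names, token = list_folder(conn, parent_id=1, page_size=2, page_token=token)
    pages.append(names)
    if token is None:
        break
```

Unlike OFFSET paging, a keyset cursor does not skip or repeat entries when rows are inserted before the cursor position between requests.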

Commit protocol

The commit protocol enforces BitVault’s central correctness invariant: bytes are verified before any metadata is written, and the NodeChanged event is never lost.

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant GW as API Gateway
    participant F as File & Metadata
    participant S as Storage
    participant O as Object Store
    participant B as Event Bus
    participant IX as Indexer

    C->>GW: POST /v1/files (init: path, size, hash)
    GW->>F: CreateUpload(node draft) [gRPC]
    F->>S: PresignPut(staging key, size-range, ttl)
    S-->>F: presigned URL (+ uploadId if multipart)
    F-->>GW: uploadId + presigned URL(s)
    GW-->>C: 201 {uploadId, url}

    C->>O: PUT bytes (direct, presigned)
    O-->>C: 200 (ETag)

    C->>GW: POST /v1/files/{uploadId}/commit (etag, hash)
    GW->>F: CommitUpload(uploadId, hash)
    F->>S: HeadObject(staging key) to verify size/etag/hash
    S->>O: HEAD staging key
    O-->>S: metadata (size, etag)
    S-->>F: verified ✔ (blob refcount++)
    Note over F: BEGIN TX<br/>insert/replace node + version<br/>insert outbox(NodeChanged)<br/>COMMIT
    F-->>GW: 200 (node, version)
    GW-->>C: 200 committed

    F->>B: publish NodeChanged (drained from outbox)
    B->>IX: NodeChanged → index (eventual)
```
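The essential ordering in CommitUpload is: verify first, then write the node, version, and outbox rows in a single transaction. Below is a minimal sketch of that ordering using Python's sqlite3. The schema, the `commit_upload` signature, and the `HeadResult` shape are hypothetical (in particular, `HeadResult.sha256` is an assumption; the diagram only shows size and etag coming back from HEAD), and the Storage-side blob refcount increment is not modeled.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, simplified schema.
SCHEMA = """
CREATE TABLE nodes    (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE,
                       current_version_id INTEGER);
CREATE TABLE versions (id INTEGER PRIMARY KEY, node_id INTEGER NOT NULL,
                       blob_sha256 TEXT NOT NULL, size INTEGER NOT NULL);
CREATE TABLE outbox   (seq INTEGER PRIMARY KEY AUTOINCREMENT, subject TEXT NOT NULL,
                       payload TEXT NOT NULL, published_at TEXT);
"""


@dataclass(frozen=True)
class HeadResult:
    """What Storage reports for the staged object (hypothetical shape)."""
    size: int
    sha256: str


class VerificationFailed(Exception):
    pass


def commit_upload(conn: sqlite3.Connection, path: str, size: int, sha256: str,
                  head: HeadResult) -> Tuple[int, int]:
    # 1. Verify bytes BEFORE touching any metadata (Invariant I1).
    if (head.size, head.sha256) != (size, sha256):
        raise VerificationFailed(f"staged object does not match declared size/hash for {path}")

    # 2. One transaction: node + version + outbox row commit together or not at all.
    with conn:
        conn.execute("INSERT OR IGNORE INTO nodes (path) VALUES (?)", (path,))
        (node_id,) = conn.execute("SELECT id FROM nodes WHERE path = ?", (path,)).fetchone()
        version_id = conn.execute(
            "INSERT INTO versions (node_id, blob_sha256, size) VALUES (?, ?, ?)",
            (node_id, sha256, size),
        ).lastrowid
        conn.execute("UPDATE nodes SET current_version_id = ? WHERE id = ?",
                     (version_id, node_id))
        event = {"type": "NodeChanged", "node_id": node_id, "version_id": version_id}
        conn.execute("INSERT INTO outbox (subject, payload) VALUES (?, ?)",
                     ("node.changed", json.dumps(event)))
    return node_id, version_id


# Example: two commits to the same path, then one that fails verification.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = commit_upload(conn, "/docs/a.txt", 5, "h1", HeadResult(5, "h1"))
second = commit_upload(conn, "/docs/a.txt", 6, "h2", HeadResult(6, "h2"))  # new head version
try:
    commit_upload(conn, "/docs/b.txt", 5, "h3", HeadResult(4, "h3"))  # size mismatch
except VerificationFailed:
    pass  # nothing was written for /docs/b.txt
```

Because the outbox row shares the transaction, a crash after COMMIT can at worst delay the event until the relay next drains the outbox; a crash before COMMIT leaves neither the version nor the event behind.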

Failure paths:

:::danger Invariant I1
A VERSION row must only be created after the BLOB’s bytes have been verified (size / etag / hash match). Never commit without verification: a VERSION that references unverified bytes breaks content integrity, dedup accounting, and GC safety.
:::
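One way to make Invariant I1 hard to violate in code is to let the version-creating function accept only a value that the verification step alone can produce. The sketch below shows that pattern in Python; every name in it (`Declared`, `HeadResult`, `VerifiedBlob`, `verify`, `new_version_row`, the `sha256:` address prefix) is hypothetical. The `staged` bytes argument stands in for however Storage actually obtains a digest of the staged object, which the source does not specify. Python cannot truly hide the module-private token, so the guard holds by convention.

```python
import hashlib
from dataclasses import dataclass


class VerificationError(Exception):
    """Staged bytes do not match what the client declared."""


@dataclass(frozen=True)
class Declared:
    size: int
    sha256: str  # hex digest sent by the client
    etag: str    # ETag the object store returned to the client on PUT


@dataclass(frozen=True)
class HeadResult:
    size: int
    etag: str


_MINT = object()  # module-private capability: only verify() passes it


@dataclass(frozen=True)
class VerifiedBlob:
    sha256: str
    size: int
    _mint: object

    def __post_init__(self) -> None:
        if self._mint is not _MINT:
            raise TypeError("VerifiedBlob can only be produced by verify()")


def verify(declared: Declared, head: HeadResult, staged: bytes) -> VerifiedBlob:
    """Check size, ETag, and SHA-256; report every mismatch at once."""
    problems = []
    if head.size != declared.size:
        problems.append(f"size: declared {declared.size}, stored {head.size}")
    if head.etag != declared.etag:
        problems.append("etag mismatch")
    actual = hashlib.sha256(staged).hexdigest()
    if actual != declared.sha256:
        problems.append("sha256 mismatch")
    if problems:
        raise VerificationError("; ".join(problems))
    return VerifiedBlob(actual, declared.size, _MINT)


def new_version_row(node_id: int, blob: VerifiedBlob) -> dict:
    """Build a VERSION row; requiring VerifiedBlob encodes Invariant I1 in the signature."""
    if not isinstance(blob, VerifiedBlob):
        raise TypeError("a VERSION requires a VerifiedBlob")
    return {"node_id": node_id, "blob": f"sha256:{blob.sha256}", "size": blob.size}
```

Collecting all mismatches before raising gives operators one complete error message instead of a fix-and-retry loop.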

Node model

Versioning

Each CommitUpload creates a new VERSION row referencing the content-addressed BLOB. NODE.current_version_id points to the head version. The full version history is accessible via Files.GetVersions. Deleting a version decrements the BLOB’s refcount; GC reclaims the object when refcount reaches zero.
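The version chain and its refcount accounting can be modeled in a few lines. This is a minimal in-memory sketch: `Node`, `VersionStore`, and their methods are hypothetical, blobs are identified by plain hash strings, and refusing to delete the head version is an assumption rather than documented behavior.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    current_version_id: Optional[int] = None
    versions: Dict[int, str] = field(default_factory=dict)  # version_id -> blob hash


class VersionStore:
    """Toy model of VERSION rows plus BLOB refcounts (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.refcount: Counter = Counter()
        self._next_version_id = 1

    def commit(self, node: Node, blob_hash: str) -> int:
        vid = self._next_version_id
        self._next_version_id += 1
        node.versions[vid] = blob_hash
        node.current_version_id = vid  # the head moves to the newest version
        self.refcount[blob_hash] += 1  # identical content shares one BLOB (dedup)
        return vid

    def delete_version(self, node: Node, vid: int) -> None:
        if vid == node.current_version_id:
            raise ValueError("refusing to delete the head version")  # assumption
        blob_hash = node.versions.pop(vid)
        self.refcount[blob_hash] -= 1

    def gc(self) -> List[str]:
        """Return and forget every BLOB whose refcount has reached zero."""
        dead = sorted(h for h, n in self.refcount.items() if n == 0)
        for h in dead:
            del self.refcount[h]
        return dead


# Example: revert to earlier bytes, then prune history.
store, doc = VersionStore(), Node()
v1 = store.commit(doc, "h1")
v2 = store.commit(doc, "h2")
v3 = store.commit(doc, "h1")  # same bytes as v1: refcount of h1 is now 2
store.delete_version(doc, v1)
first_gc = store.gc()         # [] because v3 still references h1
store.delete_version(doc, v2)
second_gc = store.gc()        # ['h2']
```

The example shows why GC must key on refcount rather than on version deletion: deleting v1 does not free h1 because v3 reuses the same content-addressed BLOB.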

Trash and restoration

Trash is a soft-delete: NODE.deleted_at is set. The row is retained so restore is a metadata-only operation. Hard purge removes the row and triggers BLOB refcount decrements for all associated versions.
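The difference between soft delete, restore, and hard purge is easiest to see side by side. The in-memory sketch below uses hypothetical names (`Namespace`, `trash`, `restore`, `purge_trash`) and models the Storage refcount decrement as an injected callback. It does not address folders or crash safety between removing a row and issuing its decrements.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    id: int
    version_blobs: List[str]  # BLOB hashes referenced by this node's versions
    deleted_at: Optional[datetime] = None


class Namespace:
    """In-memory sketch of trash, restore, and purge (names are hypothetical)."""

    def __init__(self, decref: Callable[[str], None]) -> None:
        self.nodes: Dict[int, Node] = {}
        self._decref = decref  # stand-in for Storage's refcount decrement

    def trash(self, node_id: int, now: Optional[datetime] = None) -> None:
        # Soft delete: only a timestamp changes; versions and bytes are untouched.
        self.nodes[node_id].deleted_at = now or datetime.now(timezone.utc)

    def restore(self, node_id: int) -> None:
        # Metadata-only: clearing deleted_at re-activates the node.
        self.nodes[node_id].deleted_at = None

    def purge_trash(self) -> List[int]:
        """Hard-delete every trashed node and decrement each referenced BLOB."""
        purged = sorted(n.id for n in self.nodes.values() if n.deleted_at is not None)
        for node_id in purged:
            node = self.nodes.pop(node_id)
            for blob_hash in node.version_blobs:
                self._decref(blob_hash)
        return purged


# Example: trash two nodes, restore one, then purge.
decrements: List[str] = []
ns = Namespace(decref=decrements.append)
ns.nodes = {1: Node(1, ["a", "b"]), 2: Node(2, ["a"])}
ns.trash(1)
ns.trash(2)
ns.restore(2)
purged = ns.purge_trash()  # only node 1 is purged; decrements "a" and "b"
```

Restore never calls into Storage, which is what keeps it a cheap metadata-only operation; only purge touches refcounts.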

Events emitted

| Event | Subject | Trigger |
|---|---|---|
| NodeChanged | node.changed | Any committed create, update, move, or delete |
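NodeChanged reaches the bus by draining the outbox, which gives at-least-once delivery. The sketch below shows a relay that publishes unpublished rows in sequence order and marks each one only after the publish succeeds, plus an idempotent consumer that drops redeliveries and records sequence gaps. All names (`drain_outbox`, `Consumer`, the `seq`/`published_at` columns, the publish callback) are hypothetical. Gap detection also assumes `seq` values are assigned without holes and committed in order, which production database sequences do not always guarantee.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List, Tuple

Publish = Callable[[str, int, dict], None]  # (subject, seq, event); raises on failure


def drain_outbox(conn: sqlite3.Connection, publish: Publish, batch_size: int = 100) -> int:
    """Publish unpublished outbox rows in seq order; return how many were sent.

    At-least-once: a row is marked published only after publish() returns,
    so a crash between the two re-sends it on the next pass.
    """
    rows = conn.execute(
        "SELECT seq, subject, payload FROM outbox"
        " WHERE published_at IS NULL ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, subject, payload in rows:
        try:
            publish(subject, seq, json.loads(payload))
        except Exception:
            break  # stop so later rows are never published ahead of this one
        with conn:
            conn.execute("UPDATE outbox SET published_at = ? WHERE seq = ?",
                         (datetime.now(timezone.utc).isoformat(), seq))
        sent += 1
    return sent


class Consumer:
    """Idempotent subscriber: drops redeliveries and records sequence gaps."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.applied: List[int] = []
        self.gaps: List[Tuple[int, int]] = []

    def handle(self, subject: str, seq: int, event: dict) -> None:
        if seq <= self.last_seq:
            return  # duplicate from at-least-once delivery
        if seq != self.last_seq + 1:
            self.gaps.append((self.last_seq + 1, seq - 1))  # missing range, inclusive
        self.applied.append(seq)
        self.last_seq = seq


# Example: three events; the bus rejects seq 2 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
             " subject TEXT, payload TEXT, published_at TEXT)")
with conn:
    for node_id in (10, 11, 12):
        conn.execute("INSERT INTO outbox (subject, payload) VALUES (?, ?)",
                     ("node.changed", json.dumps({"node_id": node_id})))

consumer = Consumer()
failures = {2}


def flaky_publish(subject: str, seq: int, event: dict) -> None:
    if seq in failures:
        failures.discard(seq)
        raise ConnectionError("event bus unavailable")
    consumer.handle(subject, seq, event)


first = drain_outbox(conn, flaky_publish)   # sends seq 1, stops at seq 2
second = drain_outbox(conn, flaky_publish)  # sends seq 2 and 3
consumer.handle("node.changed", 1, {"node_id": 10})  # a redelivery is ignored
```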