01 — Flagship Features (Deep Dive)
The six features that turn BitVault into a programmable, verifiable, governable platform. Each gets a technical sketch plus a summary table: Why it matters · Complexity · Dependencies · Resume impact. The features compound (README §1).
Complexity scale: S (weeks) · M (1–2 mo) · L (quarter) · XL (multi-quarter).
1. WASM Plugin Runtime
Let users (and us) extend BitVault with sandboxed code that runs inside the platform — content processors, custom storage providers, custom auth, event handlers, policy hooks — written in any language, compiled to WebAssembly.
flowchart TB
classDef h fill:#bbf7d0,stroke:#15803d,color:#111827;
classDef p fill:#fde68a,stroke:#b45309,color:#111827;
classDef c fill:#c7d2fe,stroke:#3730a3,color:#111827;
host["BitVault host (Go)<br/>wazero runtime"]:::h
subgraph SB["WASM sandbox (per invocation)"]
plug["plugin module (.wasm)<br/>any language"]:::p
end
host -->|"instantiate + fuel/mem/time limits"| plug
plug -->|"imports (capabilities only)"| cap["host functions:<br/>read_input · write_output ·<br/>scoped_http · scoped_kv · log · emit_event"]:::c
cap --> host
host -. "deny by default: no FS, no net, no syscalls" .-> plug
- Runtime: wazero, a zero-dependency, CGO-free Go WASM runtime, so it embeds cleanly in `bitvaultd` and keeps cross-compilation. It offers both interpreter and optimizing-compiler modes.
- Capability security (whitelist, not blacklist): by default a module can do nothing (no filesystem, network, or syscall access). The host grants only explicit host functions (read the input chunk, write output, a scoped HTTP client, a scoped KV store, emit an event). This is the security model Envoy and Helm 4 use.
- Isolation & safety: a fresh instance per invocation, so no state is shared between calls or tenants; resource limits (fuel/CPU, memory, wall-clock); a plugin panic is contained by the runtime as a trap and cannot crash the host. Multi-tenant-safe by construction.
- Extension points (the catalog, 09): content transforms, storage-provider adapters, auth providers, event handlers, policy hooks.
- Distribution: signed plugin modules (cosign, reusing ADR-0032) and an optional registry.
| Criterion | Assessment |
|---|---|
| Why it matters | Turns BitVault from a closed app into a platform; every other programmable feature (Functions, transforms, DLP, custom providers) rides on it. Ecosystem leverage. |
| Complexity | L — runtime embedding is M; the hard parts are the capability/host-function ABI, resource governance, and a clean PDK. |
| Dependencies | Go host (ADR-0001); event system (08) for handler triggers; signing (ADR-0032). |
| Resume impact | Very high. “Designed a capability-based WASM plugin system with sandboxed execution and resource isolation” signals language-runtime + security depth few engineers have. |
2. BitVault Functions (event-driven compute)
S3 Events plus Lambda, but native to BitVault: run a WASM function on storage events (“on upload to `/invoices`, OCR it and extract totals”). It marries the plugin runtime (§1) and the event system (08).
sequenceDiagram
autonumber
participant FM as File & Metadata
participant BUS as Event bus (NATS, ADR-0006)
participant FN as Functions runtime (WASM pool)
participant ST as Storage
FM->>BUS: NodeChanged (upload to /invoices)
BUS->>FN: matching trigger fires
FN->>FN: warm WASM instance + capability grant
FN->>ST: read input chunk(s) (scoped)
FN->>FN: run user function (OCR, extract)
FN->>FM: write derived metadata / new file (scoped)
FN->>BUS: emit result event (chainable)
- Triggers: declarative bindings (event type + path/tag filter) → a function.
- Execution: warm WASM instance pools; idempotent (event id dedup, ADR-0006); retries + DLQ; per-tenant concurrency limits; KEDA-scaled on queue depth (platform/04).
- Capabilities: scoped to the triggering tenant/path; functions compose (output events trigger more functions) — a safe, observable pipeline.
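A trigger router with event-id dedup, bounded retries, and a DLQ fits in a few dozen lines. A minimal sketch, assuming at-least-once delivery from the bus and an in-memory dedup set (a real deployment would persist dedup keys, per ADR-0006); `Event`, `Trigger`, `Dispatcher`, and `ErrRetryable` are hypothetical names, and `Run` stands in for invoking a warm WASM instance. Per-tenant concurrency limits and KEDA scaling are out of scope here.

```go
package main

import (
	"errors"
	"strings"
)

// Event is a simplified storage event such as NodeChanged; field names are hypothetical.
type Event struct {
	ID     string // unique event id, used for dedup
	Type   string
	Tenant string
	Path   string
}

// Trigger is a declarative binding: event type + path-prefix filter -> function.
type Trigger struct {
	EventType  string
	PathPrefix string
	Function   string
}

// ErrRetryable marks a transient failure; any other error is dead-lettered immediately.
var ErrRetryable = errors.New("retryable")

// Dispatcher routes events to functions, skips redeliveries, and dead-letters exhausted work.
type Dispatcher struct {
	Triggers    []Trigger
	MaxAttempts int
	Run         func(fn string, ev Event) error // stands in for a warm WASM instance
	DLQ         []Event
	done        map[string]bool // completed (event id, function) pairs
}

// Match returns every function whose trigger fires for ev.
func (d *Dispatcher) Match(ev Event) []string {
	var fns []string
	for _, t := range d.Triggers {
		if t.EventType == ev.Type && strings.HasPrefix(ev.Path, t.PathPrefix) {
			fns = append(fns, t.Function)
		}
	}
	return fns
}

// Handle is safe to call again when the bus redelivers, because completed work is recorded.
func (d *Dispatcher) Handle(ev Event) {
	if d.done == nil {
		d.done = map[string]bool{}
	}
	attempts := d.MaxAttempts
	if attempts < 1 {
		attempts = 1
	}
	for _, fn := range d.Match(ev) {
		key := ev.ID + "/" + fn
		if d.done[key] {
			continue // duplicate delivery: already completed
		}
		var err error
		for i := 0; i < attempts; i++ {
			err = d.Run(fn, ev)
			if err == nil || !errors.Is(err, ErrRetryable) {
				break
			}
		}
		if err != nil {
			d.DLQ = append(d.DLQ, ev)
			continue
		}
		d.done[key] = true
	}
}
```

Keying dedup on (event id, function) rather than the event id alone lets one event fan out to several functions while each completed pair is still skipped on redelivery. A crash between `Run` and recording completion would re-run the function, so user functions should themselves be idempotent.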
| Criterion | Assessment |
|---|---|
| Why it matters | The automation killer-app and the reason developers stay: extend the product without us shipping every integration. Powers transforms, DLP, custom workflows. |
| Complexity | L–XL — builds on §1; adds the trigger router, warm-pool scheduler, idempotency/retry, and multi-tenant fairness. |
| Dependencies | §1 (runtime), event system (08), storage scoped access (ADR-0011), KEDA. |
| Resume impact | Very high. “Built a multi-tenant, event-driven serverless runtime on WASM with idempotent execution and autoscaling” is a systems headline. |
3. Policy-as-code: Cedar + ReBAC
Replace ad-hoc ACLs with authorization as code: a verified policy engine for permissions and a relationship graph for sharing — plus the ability to prove properties about your policies.
flowchart TB
classDef e fill:#fde68a,stroke:#b45309,color:#111827;
classDef d fill:#bbf7d0,stroke:#15803d,color:#111827;
req["request: principal · action · resource · context"]:::e --> cedar["Cedar engine (deny-by-default, formally verified)"]:::e
rebac["ReBAC graph (Zanzibar/OpenFGA): owner/editor/viewer, group, folder inheritance"]:::e --> cedar
attrs["attributes: tenant, tags, residency, classification"]:::e --> cedar
cedar --> dec{"permit / forbid"}:::d
cedar -.-> sim["policy simulation:<br/>what-if + 'prove no public access to /secret'"]:::d
- Cedar for permissions: PARC model (principal/action/resource/context), deny by default, formally verified authorization engine, 42–60× faster than Rego — fits BitVault’s principal/node/action model directly (ADR-0007, ADR-0010).
- ReBAC (Zanzibar/OpenFGA) for the sharing graph: “user → editor → folder → inherited by children,” group membership, the Google-Drive-style relationship model that RBAC alone can’t express.
- Governance policies as code: retention, residency (ADR-0020), DLP, external-sharing rules — same engine, evaluated in the data plane.
- Policy simulation: because Cedar is built for automated reasoning, offer “what-if” queries and proofs such as “no policy permits public read of `/legal`”: automated reasoning over policies, not testing-by-luck.
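How the two halves combine can be illustrated with a toy evaluator: Zanzibar-style relation tuples with folder inheritance and group indirection, decided under Cedar's combining rules (default deny; any matching forbid overrides every permit). This is not Cedar's or OpenFGA's API: `Graph`, `Forbid`, `Authorize`, and the action-to-relation table are hypothetical, nested groups are not expanded, and the folder hierarchy is assumed to come from the File & Metadata context.

```go
package main

import "strings"

// Tuple is a Zanzibar-style relationship: Object#Relation@Subject.
type Tuple struct{ Object, Relation, Subject string }

// Graph holds relationship tuples plus the folder tree used for inheritance.
type Graph struct {
	tuples []Tuple
	parent map[string]string // node -> parent folder
}

func NewGraph() *Graph                          { return &Graph{parent: map[string]string{}} }
func (g *Graph) Add(obj, rel, subj string)      { g.tuples = append(g.tuples, Tuple{obj, rel, subj}) }
func (g *Graph) SetParent(child, parent string) { g.parent[child] = parent }

// implies encodes owner ⊇ editor ⊇ viewer: holding any listed relation satisfies the key.
var implies = map[string][]string{
	"viewer": {"viewer", "editor", "owner"},
	"editor": {"editor", "owner"},
	"owner":  {"owner"},
}

// actionRelation maps actions to the relation they require (hypothetical table).
var actionRelation = map[string]string{"read": "viewer", "write": "editor", "share": "owner"}

// member resolves subjects of the form "group:x#member" one level deep.
func (g *Graph) member(user, subject string) bool {
	if !strings.HasSuffix(subject, "#member") {
		return false
	}
	group := strings.TrimSuffix(subject, "#member")
	for _, t := range g.tuples {
		if t.Object == group && t.Relation == "member" && t.Subject == user {
			return true
		}
	}
	return false
}

// Has walks obj and its ancestors, so a grant on a folder is inherited by its children.
func (g *Graph) Has(user, rel, obj string) bool {
	seen := map[string]bool{} // guards against an accidental cycle in the folder map
	for o := obj; o != "" && !seen[o]; o = g.parent[o] {
		seen[o] = true
		for _, r := range implies[rel] {
			for _, t := range g.tuples {
				if t.Object == o && t.Relation == r && (t.Subject == user || g.member(user, t.Subject)) {
					return true
				}
			}
		}
	}
	return false
}

// Forbid is a governance guard, e.g. "no public read under /legal".
type Forbid func(user, action, obj string) bool

// Authorize applies Cedar-like semantics: any forbid wins, then a permit, else default deny.
func Authorize(g *Graph, forbids []Forbid, user, action, obj string) bool {
	for _, f := range forbids {
		if f(user, action, obj) {
			return false
		}
	}
	rel, ok := actionRelation[action]
	return ok && g.Has(user, rel, obj)
}
```

Keeping forbids outside the graph mirrors the split above: the ReBAC graph answers "who is related to what", while governance rules can veto any relationship-derived permit.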
| Criterion | Assessment |
|---|---|
| Why it matters | Most file products have brittle, ad-hoc permissions. Verifiable policy-as-code + a real sharing graph is enterprise-grade governance and a genuine differentiator. |
| Complexity | L — integrating Cedar is M; the ReBAC graph + consistency (Zanzibar “zookies”) and simulation push it to L. |
| Dependencies | Identity/sharing contexts (04 bounded-contexts), ADR-0007/ADR-0010. |
| Resume impact | Very high. “Authorization via a formally-verified policy engine + a Zanzibar-style ReBAC graph, with policy simulation” is rare, senior security/distributed-systems signal. |
4. End-to-End Encrypted Private Vaults
An opt-in, zero-knowledge vault tier where the server stores only ciphertext and cannot read the data — keys live client-side.
flowchart TB
classDef c fill:#c7d2fe,stroke:#3730a3,color:#111827;
classDef s fill:#fecaca,stroke:#b91c1c,color:#111827;
file["file"]:::c --> ck["random per-file content key (CK)"]:::c
ck --> enc["encrypt chunks client-side"]:::c --> server[("server stores ciphertext only")]:::s
ck --> wrapU["wrap CK with user's public key"]:::c
ck --> wrapR["wrap CK with each recipient's public key (sharing)"]:::c
wrapU & wrapR --> kstore[("server stores wrapped keys (opaque)")]:::s
recovery["recovery: key escrow / social recovery / passphrase-derived"]:::c
- Per-file content key (random), used to encrypt chunks client-side; the server sees only ciphertext.
- Sharing = re-wrap the content key with each recipient’s public key (envelope encryption end-to-end); revocation rotates the key for future versions.
- The honest tradeoff (revisits NG3): E2E breaks server-side search, previews, dedup, and Functions on that data — so it’s an opt-in tier, not the default (ADR-0014). Mitigations: client-side encrypted search index; client-side thumbnails.
- Key recovery (the real hard part): passphrase-derived keys + optional escrow / social recovery — “lose your key, lose your data” must be a deliberate user choice.
| Criterion | Assessment |
|---|---|
| Why it matters | A credible zero-knowledge tier is a top-tier trust differentiator (Proton/Tresorit/Cryptomator territory) and the marquee security feature. |
| Complexity | XL — applied cryptography, key management, sharing/revocation, recovery UX, and the feature-loss tradeoffs. Easy to get subtly wrong. |
| Dependencies | KMS/envelope model (ADR-0014), storage chunking (storage/02), client SDK. |
| Resume impact | Very high (if done correctly). Applied crypto + key management + secure sharing is a standout — and demonstrates the judgment to scope tradeoffs honestly. |
5. S3-Compatible API
Expose BitVault as a drop-in S3 endpoint. Every tool that speaks S3 (aws-cli, boto3, rclone, Terraform, backup tools, data frameworks) works against BitVault unchanged.
flowchart LR
classDef e fill:#dbeafe,stroke:#1e40af,color:#111827;
classDef g fill:#fde68a,stroke:#b45309,color:#111827;
classDef s fill:#bbf7d0,stroke:#15803d,color:#111827;
tools["aws-cli · boto3 · rclone · Terraform · Spark"]:::e -->|"S3 REST + SigV4"| gw["S3 gateway:<br/>auth (SigV4) · bucket/object map · multipart"]:::g
gw --> ns["map: bucket→space, key→node path"]:::g
gw --> st["reuse storage subsystem<br/>(chunks, manifests, presign)"]:::s
gw -.->|"governed by"| pol["Cedar policy + ReBAC (§3)"]:::g
- Protocol surface: SigV4 auth, bucket/object CRUD, list (with the prefix pagination storage/01 already handles), multipart upload (maps to storage/05), presigned URLs.
- Mapping: S3 bucket → BitVault space/folder; key → node path; object versions → BitVault versions. Most of it reuses the storage layer — low new risk.
- Governed: every S3 call flows through the policy engine (§3) and audit (08) — so it’s S3 you can actually govern. Prior art (MinIO, Garage, Ceph RGW) proves feasibility.
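Two core gateway pieces, SigV4 signature verification and the bucket→space mapping, can be sketched as follows. The signing-key chain (HMAC-SHA256 seeded with `"AWS4"` + secret, then applied to the date, region, service, and the literal `aws4_request`) follows AWS's published SigV4 derivation. Everything else is a simplifying assumption: building the canonical request and string-to-sign is omitted, object keys are not URL-decoded, and `MapPathStyle` assumes path-style addressing with a hypothetical 1:1 bucket→space rule.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strings"
)

// hmacSHA256 is the primitive SigV4 chains together.
func hmacSHA256(key []byte, data string) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(data))
	return m.Sum(nil)
}

// SigningKey derives the per-day, per-region, per-service key; date is YYYYMMDD.
func SigningKey(secret, date, region, service string) []byte {
	kDate := hmacSHA256([]byte("AWS4"+secret), date)
	kRegion := hmacSHA256(kDate, region)
	kService := hmacSHA256(kRegion, service)
	return hmacSHA256(kService, "aws4_request")
}

// Sign is the client-side counterpart: hex(HMAC(signingKey, stringToSign)).
func Sign(secret, date, region, stringToSign string) string {
	return hex.EncodeToString(hmacSHA256(SigningKey(secret, date, region, "s3"), stringToSign))
}

// VerifySignature recomputes the signature and compares it in constant time
// with the hex value presented in the Authorization header.
func VerifySignature(secret, date, region, stringToSign, presentedHex string) bool {
	want := hmacSHA256(SigningKey(secret, date, region, "s3"), stringToSign)
	got, err := hex.DecodeString(presentedHex)
	return err == nil && hmac.Equal(want, got)
}

// MapPathStyle turns "/bucket/key/parts" into a (space, node path) pair.
func MapPathStyle(urlPath string) (space, nodePath string, err error) {
	bucket, key, _ := strings.Cut(strings.TrimPrefix(urlPath, "/"), "/")
	if bucket == "" {
		return "", "", errors.New("missing bucket")
	}
	return bucket, "/" + key, nil
}
```

A production gateway would also enforce the request-timestamp window and look the secret up by access key ID before verifying; `hmac.Equal` keeps the comparison itself constant-time.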
| Criterion | Assessment |
|---|---|
| Why it matters | Instant ecosystem: thousands of existing tools/integrations work day one. The single highest adoption lever for developers, at modest build cost. |
| Complexity | M — SigV4 + the common object/multipart subset is well-trodden; full S3 fidelity is a long tail (scope the 80%). |
| Dependencies | Storage subsystem (storage/), policy (07), API gateway (ADR-0003). |
| Resume impact | High. “Implemented an S3-compatible API (SigV4, multipart) over a custom storage engine” is concrete, recognizable, and protocol-level. |
6. Verifiable Storage (CIDs + Merkle proofs)
Lean into the content-addressed core (storage/02): give every object a verifiable content identifier and let anyone prove a file is intact and unaltered — without trusting BitVault.
flowchart TB
classDef c fill:#bbf7d0,stroke:#15803d,color:#111827;
classDef p fill:#fde68a,stroke:#b45309,color:#111827;
obj["object → CID (BLAKE3 Merkle root)"]:::c --> link["verifiable link: bitvault://<cid>"]:::p
obj --> proof["Merkle inclusion proof per chunk"]:::p
proof --> verify["client verifies bytes ↔ CID without trusting server"]:::c
obj --> receipt["signed storage receipt + timestamp (transparency log)"]:::p
receipt --> audit["tamper-evident: prove 'this file existed, unchanged, at time T'"]:::c
- CIDs: BLAKE3 is already a Merkle tree (ADR-0016), so a content identifier + per-range inclusion proofs come almost for free — verify any byte range against the root.
- Verifiable links (`bitvault://<cid>`): share content whose integrity the recipient can check independently (IPFS-style, but governed).
- Signed receipts + transparency log: prove “this exact content existed at time T and hasn’t changed,” chained into a tamper-evident audit log (02 enterprise).
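An inclusion proof is just the sibling hashes along one leaf-to-root path, so verification needs only the chunk, the proof, and a trusted root. In this sketch SHA-256 from the standard library stands in for BLAKE3, and the binary tree with RFC 6962-style leaf/node prefixes is not BLAKE3's internal tree layout, so these roots are not BLAKE3 CIDs; `Prove` and `Verify` are hypothetical names.

```go
package main

import "crypto/sha256"

type Hash = [32]byte

// Domain-separated hashing (RFC 6962 style): a leaf can never be passed off as an interior node.
func leafHash(chunk []byte) Hash {
	return sha256.Sum256(append([]byte{0x00}, chunk...))
}

func nodeHash(left, right Hash) Hash {
	buf := make([]byte, 0, 1+2*len(left))
	buf = append(buf, 0x01)
	buf = append(buf, left[:]...)
	buf = append(buf, right[:]...)
	return sha256.Sum256(buf)
}

// ProofStep is one sibling hash on the path from a leaf to the root.
type ProofStep struct {
	Sibling       Hash
	SiblingOnLeft bool
}

// Prove builds the tree over chunks (must be non-empty) and returns the root plus an
// inclusion proof for chunks[index]. An unpaired node is promoted unchanged, not duplicated.
func Prove(chunks [][]byte, index int) (Hash, []ProofStep) {
	level := make([]Hash, len(chunks))
	for i, c := range chunks {
		level[i] = leafHash(c)
	}
	var proof []ProofStep
	for len(level) > 1 {
		if index%2 == 1 {
			proof = append(proof, ProofStep{Sibling: level[index-1], SiblingOnLeft: true})
		} else if index+1 < len(level) {
			proof = append(proof, ProofStep{Sibling: level[index+1]})
		}
		next := make([]Hash, 0, (len(level)+1)/2)
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // promoted: no sibling at this level
			} else {
				next = append(next, nodeHash(level[i], level[i+1]))
			}
		}
		level, index = next, index/2
	}
	return level[0], proof
}

// Verify recomputes the root from the chunk and proof; no trust in the server is needed.
func Verify(chunk []byte, proof []ProofStep, root Hash) bool {
	h := leafHash(chunk)
	for _, s := range proof {
		if s.SiblingOnLeft {
			h = nodeHash(s.Sibling, h)
		} else {
			h = nodeHash(h, s.Sibling)
		}
	}
	return h == root
}
```

Promoting an unpaired node instead of duplicating it avoids the duplicate-last-leaf ambiguity (the pattern behind Bitcoin's CVE-2012-2459), where two different chunk lists could share a root.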
| Criterion | Assessment |
|---|---|
| Why it matters | “Don’t trust, verify” storage is a genuinely novel angle for a file platform: provable integrity, compliance evidence, and a unique sharing primitive. |
| Complexity | M–L — the Merkle machinery exists (BLAKE3); the work is the proof API, verifiable-link format, and the transparency log. |
| Dependencies | Content addressing + BLAKE3 (storage/02, ADR-0016), integrity (storage/04). |
| Resume impact | Very high. Merkle proofs, verifiable data structures, and transparency logs are deep, distinctive, and rarely seen from application engineers. |
How the six compound
- §1 Runtime is the substrate for §2 Functions (and transforms/DLP).
- §2 Functions consume the event system (08) and produce derived data governed by §3.
- §3 Policy governs §5 S3 API, sharing, Functions, everything.
- §4 E2E + §6 Verifiable are the two halves of the trust story (confidentiality + integrity).
- §5 S3 API brings the developers who then write §1 plugins and §2 functions.
That loop — programmable → governable → verifiable → adopted → more programmable — is the moat a plain file-sharing app can’t replicate.