03 — System Requirements
Covers task 5. Functional requirements (what it does) and non-functional requirements (how well it must do it). NFRs are written as measurable targets because an unmeasurable requirement is a wish, not a requirement.
Targets below are v1 SaaS numbers at small scale; self-host inherits the same functional set with relaxed availability. They are starting SLOs to validate with load tests, not contractual guarantees.
1. Functional Requirements
FR-A — Identity & Access
- A1. Tenant (organization) lifecycle: create, suspend, delete (with data export/erasure).
- A2. User lifecycle within a tenant; invitations; deactivation.
- A3. Authentication: email/password (with secure hashing), and OIDC/SSO for SaaS & enterprise.
- A4. Sessions (web) and long-lived API tokens / PATs (CLI, automation), revocable.
- A5. RBAC: roles (owner, admin, member, viewer) scoped to tenant and to resources.
- A6. Audit log of security-relevant actions.
FR-B — Files & Namespace
- B1. Hierarchical folders; a file/folder is a node in a per-tenant tree.
- B2. Upload (small single-shot; large multipart + resumable), download, delete (soft → hard via GC).
- B3. Move, copy, rename (namespace operations, cheap — no byte movement).
- B4. Versioning: every content change creates a recoverable version; restore prior versions.
- B5. Content-addressed storage: identical bytes stored once per tenant (hash-based, ref-counted).
- B6. Rich metadata: system (size, mime, hashes, timestamps) + user (tags, custom key/values).
- B7. Trash/recycle with retention window, then GC.
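A minimal sketch of how B5 (content addressing with ref counts) might behave, kept in memory for illustration. SHA-256 is an assumption here, since the requirement only says "hash-based"; `BlobStore`, `Put`, and `Release` are hypothetical names, and in a real deployment the ref counts would live in Postgres next to the metadata rather than in a Go map.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey scopes a content hash to one tenant, so dedup never crosses tenants.
type blobKey struct {
	tenant string
	hash   string
}

// BlobStore is a hypothetical in-memory stand-in for the object store plus
// the ref-count table.
type BlobStore struct {
	data map[blobKey][]byte
	refs map[blobKey]int
}

func NewBlobStore() *BlobStore {
	return &BlobStore{data: map[blobKey][]byte{}, refs: map[blobKey]int{}}
}

// Put stores content under its SHA-256 hash (an assumed hash choice) and
// increments the ref count. Identical bytes in one tenant are stored once.
func (s *BlobStore) Put(tenant string, content []byte) string {
	sum := sha256.Sum256(content)
	k := blobKey{tenant: tenant, hash: hex.EncodeToString(sum[:])}
	if _, ok := s.data[k]; !ok {
		s.data[k] = append([]byte(nil), content...)
	}
	s.refs[k]++
	return k.hash
}

// Release drops one reference. The bytes are deleted only when no reference
// remains; the return value reports whether the blob was removed.
func (s *BlobStore) Release(tenant, hash string) bool {
	k := blobKey{tenant: tenant, hash: hash}
	if s.refs[k] == 0 {
		return false
	}
	s.refs[k]--
	if s.refs[k] > 0 {
		return false
	}
	delete(s.refs, k)
	delete(s.data, k)
	return true
}

// BlobCount returns the number of distinct stored blobs.
func (s *BlobStore) BlobCount() int { return len(s.data) }

func main() {
	s := NewBlobStore()
	h1 := s.Put("tenant-a", []byte("hello"))
	h2 := s.Put("tenant-a", []byte("hello"))
	s.Put("tenant-b", []byte("hello"))
	fmt.Println(h1 == h2, s.BlobCount()) // true 2
}
```

Keying on tenant plus hash keeps deduplication inside the tenant boundary, which matches the "once per tenant" wording of B5.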
FR-C — Synchronization
- C1. Device registration; per-device sync cursor.
- C2. Change feed: monotonic, per-tenant (or per-user-namespace) ordered journal of mutations.
- C3. Delta sync: client pulls changes since its cursor; transfers only changed content.
- C4. Offline edits queue and reconcile on reconnect.
- C5. Conflict policy: concurrent divergent edits → conflicted copy (e.g. report (conflict, Alice, 2026-06-11).docx); never silent overwrite. (ADR-0008)
- C6. Selective sync (choose folders): post-v1, but the change-feed design must not preclude it.
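The conflicted-copy name in C5 can be derived mechanically. The sketch below assumes the pattern shown in the example (suffix before the extension, editor name, ISO date); `ConflictName` and `day` are hypothetical helpers, and the actual naming rule is whatever ADR-0008 specifies.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"time"
)

// ConflictName inserts " (conflict, <editor>, <YYYY-MM-DD>)" before the
// file extension. The pattern is inferred from the example in C5.
func ConflictName(name, editor string, at time.Time) string {
	ext := filepath.Ext(name)
	base := strings.TrimSuffix(name, ext)
	if base == "" { // dotfiles such as ".env" have no separate extension
		base, ext = name, ""
	}
	return fmt.Sprintf("%s (conflict, %s, %s)%s", base, editor, at.Format("2006-01-02"), ext)
}

// day is a small helper that builds a UTC date.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	fmt.Println(ConflictName("report.docx", "Alice", day(2026, time.June, 11)))
	// prints: report (conflict, Alice, 2026-06-11).docx
}
```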
FR-D — Sharing & Permissions
- D1. Internal shares to users/groups with a role (view/edit/manage).
- D2. External public links with optional password, expiry, and max-download count.
- D3. Permission inheritance down the tree, with explicit overrides.
- D4. Revocation; visibility of “who has access” per node.
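One way to read D3 is "the nearest explicit grant wins". That resolution rule is an assumption, not something this document fixes, and `Node`, `Role`, and `EffectiveRole` are hypothetical names. Under that assumption, a sketch of the lookup is a walk from the node toward the root:

```go
package main

import "fmt"

type Role int

const (
	RoleNone Role = iota // also usable as an explicit override that blocks inherited access
	RoleView
	RoleEdit
	RoleManage
)

// Node is a hypothetical tree node. Grants holds roles set directly on it.
type Node struct {
	Name   string
	Parent *Node
	Grants map[string]Role
}

// EffectiveRole walks from the node up to the root and returns the nearest
// explicit grant for the user. A grant on a child overrides its ancestors.
func EffectiveRole(n *Node, user string) Role {
	for cur := n; cur != nil; cur = cur.Parent {
		if r, ok := cur.Grants[user]; ok {
			return r
		}
	}
	return RoleNone
}

func main() {
	root := &Node{Name: "root", Grants: map[string]Role{"alice": RoleEdit, "bob": RoleView}}
	docs := &Node{Name: "docs", Parent: root}
	secret := &Node{Name: "secret", Parent: docs, Grants: map[string]Role{"bob": RoleNone}}

	fmt.Println(EffectiveRole(docs, "alice") == RoleEdit)  // inherited from root
	fmt.Println(EffectiveRole(secret, "bob") == RoleNone)  // explicit override
}
```

The same walk, run for every principal with a grant on the path, would also answer the "who has access" question in D4.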
FR-E — Search & Discovery
- E1. Search by name and metadata/tags (v1, Postgres-backed).
- E2. Filters: type, owner, date range, size, tag.
- E3. Full-text content search inside documents (later, OpenSearch-backed, opt-in). (ADR-0009)
- E4. Previews/thumbnails generation for common types (async, event-driven; later phase).
FR-F — Events, Notifications & Audit
- F1. Domain events emitted for all significant mutations (created/updated/moved/shared/deleted).
- F2. Webhooks (per-tenant, signed) for integration.
- F3. In-app + email notifications (share received, comment, quota warning).
- F4. Immutable audit trail (security/compliance).
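F2 asks for signed webhooks without naming a scheme. A common choice is an HMAC over the raw request body with a per-tenant secret; the sketch below assumes HMAC-SHA256 with a hex-encoded signature. `Sign` and `Verify` are hypothetical names, and the header that would carry the signature is left open.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Sign returns the hex-encoded HMAC-SHA256 of the payload under the
// tenant's webhook secret (assumed scheme, not mandated by F2).
func Sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify recomputes the signature and compares it in constant time.
func Verify(secret, payload []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("per-tenant-secret")
	body := []byte(`{"event":"file.created","node":"n1"}`)
	sig := Sign(secret, body)
	fmt.Println(Verify(secret, body, sig))                // true
	fmt.Println(Verify(secret, []byte(`{"x":1}`), sig)) // false
}
```

The receiver should verify against the exact bytes it received, before any JSON parsing, since re-serialization can change the payload.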
FR-G — Administration, Quotas & Metering
- G1. Per-tenant storage quota and enforcement (writes that would exceed the quota are rejected).
- G2. Usage metering (bytes stored, transferred, request counts) for plans/billing inputs.
- G3. Tenant admin console data (members, usage, shares, audit).
- G4. Feature flags / config per tenant and per deployment profile.
FR-H — Clients & API
- H1. Public REST API (versioned) covering all tenant-facing operations.
- H2. Go CLI for auth, upload/download, sync, share, admin — dogfoods the REST API.
- H3. Next.js web app (browse, upload, share, search, admin).
- H4. React Native mobile app — future client; API must be stable & versioned first.
2. Non-Functional Requirements
NFR-1 — Availability
- Control plane (metadata/API) target 99.9% (SaaS). Data plane availability is effectively that of the object store (direct-to-storage transfers).
- Self-host: best-effort; single-node Compose is acceptable.
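To make the 99.9% target concrete, it helps to convert it into allowed downtime. The sketch below does that arithmetic for a window given in days; the 30-day window is an assumption for illustration, and `ErrorBudget` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// ErrorBudget returns the downtime allowed by an availability SLO
// (for example 0.999) over a window of the given number of days,
// rounded to the nearest second.
func ErrorBudget(slo float64, days int) time.Duration {
	window := time.Duration(days) * 24 * time.Hour
	budget := time.Duration((1 - slo) * float64(window))
	return budget.Round(time.Second)
}

func main() {
	fmt.Println(ErrorBudget(0.999, 30)) // 43m12s
}
```

At 99.9% over 30 days the control plane may be unavailable for roughly 43 minutes in total, which is the budget that deploys, migrations, and incidents all draw from.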
NFR-2 — Durability
- File bytes: delegate to object store durability (S3-class ≈ 11 nines). BitVault must never be the weak link: no committed metadata without verified bytes (R2).
- Metadata: Postgres with backups/PITR; documented RPO/RTO for self-host & SaaS.
NFR-3 — Latency (control plane, p99 @ small scale)
| Operation | Target p99 |
|---|---|
| Metadata read (list folder, stat) | < 100 ms |
| Metadata write (commit, move, share) | < 200 ms |
| Presigned URL issuance | < 50 ms |
| Search (name/metadata) | < 300 ms |
| Change-feed pull (delta) | < 200 ms |
Bulk byte transfer latency is governed by the object store + client bandwidth, by design.
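When validating these targets against load-test output, the p99 has to be computed the same way every time. A minimal sketch using the nearest-rank method follows; the method choice, the millisecond integer samples, and the names `PercentileMs` and `MeetsTarget` are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// PercentileMs returns the p-th percentile (1..100) of latency samples in
// milliseconds using the nearest-rank method. It returns 0 for no samples.
func PercentileMs(samples []int, p int) int {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int(nil), samples...)
	sort.Ints(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer form of ceil(p/100 * n)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// MeetsTarget reports whether the p99 of the samples is strictly below
// the target, matching the "<" used in the table.
func MeetsTarget(samples []int, targetMs int) bool {
	return PercentileMs(samples, 99) < targetMs
}

func main() {
	samples := make([]int, 0, 100)
	for i := 1; i <= 100; i++ {
		samples = append(samples, i)
	}
	fmt.Println(PercentileMs(samples, 99), MeetsTarget(samples, 100)) // 99 true
}
```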
NFR-4 — Scalability
- All request-handling services stateless → horizontal scale behind K8s HPA.
- State isolated to Postgres, Redis, NATS, OpenSearch, object store.
- Data plane scales independently of the control plane (presigned URLs).
- Target: linear throughput scaling for metadata ops to the Postgres connection/IOPS ceiling; document that ceiling and the read-replica path past it.
NFR-5 — Consistency model (be explicit; ambiguity here causes bugs)
- Namespace & metadata: strongly consistent (single Postgres source of truth, transactional).
- Search index, notifications, previews, usage counters: eventually consistent (derived, event-driven). The UI must surface “processing…” states.
- Sync: causal/cursor-based; conflicts surfaced, never hidden.
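The cursor-based sync model (C2, C3) can be pictured as an append-only journal with monotonically increasing sequence numbers, from which each device pulls everything after its cursor. The sketch below is in-memory and single-tenant; `Feed`, `Change`, `Append`, and `Since` are hypothetical names, and the real journal would be a Postgres table.

```go
package main

import (
	"fmt"
	"sync"
)

// Change is one journal entry. Seq is strictly increasing within a feed.
type Change struct {
	Seq    uint64
	NodeID string
	Op     string
}

// Feed is a hypothetical in-memory change journal for one tenant.
type Feed struct {
	mu  sync.Mutex
	log []Change
}

// Append records a mutation and returns its sequence number (starting at 1).
func (f *Feed) Append(nodeID, op string) uint64 {
	f.mu.Lock()
	defer f.mu.Unlock()
	seq := uint64(len(f.log)) + 1
	f.log = append(f.log, Change{Seq: seq, NodeID: nodeID, Op: op})
	return seq
}

// Since returns up to limit changes with Seq greater than cursor, plus the
// cursor the client should store. The cursor is unchanged if nothing is new.
func (f *Feed) Since(cursor uint64, limit int) ([]Change, uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	n := uint64(len(f.log))
	if cursor >= n || limit <= 0 {
		return nil, cursor
	}
	end := cursor + uint64(limit)
	if end > n {
		end = n
	}
	out := append([]Change(nil), f.log[cursor:end]...)
	return out, out[len(out)-1].Seq
}

func main() {
	f := &Feed{}
	f.Append("n1", "created")
	f.Append("n1", "updated")
	f.Append("n2", "moved")

	page, cursor := f.Since(0, 2)
	fmt.Println(len(page), cursor) // 2 2
	page, cursor = f.Since(cursor, 2)
	fmt.Println(len(page), cursor) // 1 3
}
```

Because the client only advances its cursor after applying a page, a crash mid-sync replays the same page rather than skipping changes.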
NFR-6 — Security
- TLS everywhere (external) + mTLS between services once split.
- Encryption at rest: envelope encryption with KMS/Vault; per-tenant data keys. (ADR-0014)
- Tenant isolation enforced at DB layer (RLS), not app-only. (ADR-0007)
- Least-privilege, short-TTL, tightly-scoped storage credentials (presigned). (ADR-0011)
- OWASP ASVS-aligned input handling; rate limiting; audit logging; secret hygiene (no secrets in images/config).
- GDPR-ready: per-tenant/user data export and hard erasure.
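The envelope-encryption bullet can be illustrated with a small sketch: a per-tenant data key encrypts content, and a master key wraps the data key. Here a local AES-256-GCM master key stands in for KMS/Vault, which is an assumption made to keep the example self-contained; the function names are hypothetical and ADR-0014 remains the authority on the real design.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts with AES-GCM and prepends the random nonce to the output.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// unseal splits off the nonce and authenticates and decrypts the rest.
func unseal(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed data too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// NewTenantKey creates a random 256-bit data key and returns it wrapped
// under the master key. Only the wrapped form would be stored.
func NewTenantKey(masterKey []byte) ([]byte, error) {
	dataKey := make([]byte, 32)
	if _, err := rand.Read(dataKey); err != nil {
		return nil, err
	}
	return seal(masterKey, dataKey)
}

// Encrypt unwraps the tenant's data key and encrypts plaintext with it.
func Encrypt(masterKey, wrappedKey, plaintext []byte) ([]byte, error) {
	dataKey, err := unseal(masterKey, wrappedKey)
	if err != nil {
		return nil, err
	}
	return seal(dataKey, plaintext)
}

// Decrypt unwraps the tenant's data key and decrypts ciphertext with it.
func Decrypt(masterKey, wrappedKey, ciphertext []byte) ([]byte, error) {
	dataKey, err := unseal(masterKey, wrappedKey)
	if err != nil {
		return nil, err
	}
	return unseal(dataKey, ciphertext)
}

func main() {
	master := make([]byte, 32) // stand-in for a KMS-held key
	if _, err := rand.Read(master); err != nil {
		panic(err)
	}
	tenantKey, err := NewTenantKey(master)
	if err != nil {
		panic(err)
	}
	ct, err := Encrypt(master, tenantKey, []byte("secret"))
	if err != nil {
		panic(err)
	}
	pt, err := Decrypt(master, tenantKey, ct)
	fmt.Println(string(pt), err) // secret <nil>
}
```

With a real KMS the unwrap step becomes a remote call, so the plaintext data key would typically be cached briefly in memory rather than unwrapped per request.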
NFR-7 — Observability
- OpenTelemetry traces, metrics, logs with a single correlation/trace ID propagated REST → gRPC → NATS. (ADR-0013)
- RED metrics (Rate/Errors/Duration) per endpoint; USE metrics for infra; queue-depth/consumer-lag for NATS; index-lag for OpenSearch.
- Health (/healthz) and readiness (/readyz) on every service; structured JSON logs.
- Documented SLOs + error budgets for the targets above.
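A minimal sketch of the /healthz and /readyz endpoints using only the Go standard library. The split shown (liveness always 200, readiness 503 until dependencies are reachable) is a common convention and an assumption here; `Probes` and `probeStatus` are hypothetical names, and `httptest` is used only to exercise the handler without opening a socket.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Probes holds the readiness flag shared by the probe handlers.
type Probes struct {
	ready atomic.Bool
}

// SetReady is called once dependencies (for example Postgres) are reachable,
// and again with false when draining during shutdown.
func (p *Probes) SetReady(ok bool) { p.ready.Store(ok) }

// Handler serves /healthz (process is alive) and /readyz (safe to route to).
func (p *Probes) Handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if !p.ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probeStatus issues an in-memory GET and returns the status code.
func probeStatus(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	p := &Probes{}
	h := p.Handler()
	fmt.Println(probeStatus(h, "/healthz"), probeStatus(h, "/readyz")) // 200 503
	p.SetReady(true)
	fmt.Println(probeStatus(h, "/readyz")) // 200
}
```

Flipping readiness to false before shutdown lets the load balancer stop routing new requests while in-flight ones drain, which lines up with the graceful-shutdown requirement in NFR-8.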
NFR-8 — Portability & Operability
- Docker-first: every component a minimal, non-root, multi-stage image.
- Kubernetes-first: Helm chart with profiles (lite/standard/full); plain manifests + Compose for self-host.
- 12-factor config (env/secret-driven); no host coupling; graceful shutdown & connection draining.
- Reproducible builds; SBOM; image signing (later).
NFR-9 — Maintainability & Evolvability
- Modules along bounded-context lines with explicit interfaces (extractable to services).
- Protobuf as single contract source; generated REST/clients to prevent drift. (ADR-0003)
- Migrations versioned and forward-only; schema changes reviewed.
- Conformance test suite for storage adapters; contract tests at module boundaries.
NFR-10 — Cost efficiency
- No proxying of bulk bytes through compute (R5/R12).
- Object lifecycle tiering; ref-counted dedup; quota enforcement.
- Tiered dependencies so self-host runs cheap (Postgres + object store only).
3. Constraints & Assumptions
- Constraint: single primary region for v1 (NFR architecture must not preclude multi-region — NG9).
- Constraint: Go (backend/CLI), TypeScript/Next.js (web), React Native (future mobile).
- Assumption: object store provides presigned URLs and multipart (true for all five targets).
- Assumption: self-hosters can run Postgres + an S3-compatible store (MinIO) at minimum.
- Assumption: initial load is small; all NFR numbers are validated-by-load-test SLOs, not guarantees.
4. Acceptance Themes (how we’ll know requirements are met)
- Sync: an automated test harness simulates two offline devices editing the same file → asserts a conflicted copy is produced and no version is lost (C5).
- Dual-write: fault-injection (kill after byte upload, before commit) → asserts no dangling reference and the orphan is GC’d (R2).
- Isolation: a test asserts tenant A cannot read tenant B’s nodes even with a forged tenant_id in the app layer (RLS blocks it) (R4).
- Data plane: load test confirms byte throughput is independent of control-plane replicas (R5).
- Observability: a single upload produces one connected trace spanning REST → gRPC → NATS → indexer.
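The dual-write theme can be sketched as a tiny model of the invariant it checks: metadata is committed only after the stored bytes are verified, and bytes that never get committed are collected. Everything below is a hypothetical in-memory stand-in (`Store`, `Upload`, `Commit`, `GC`, SHA-256 as the verification hash); a production GC would also need a grace period so it does not race uploads that are still in flight.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Store models the two systems involved in a dual write.
type Store struct {
	blobs map[string][]byte // object store: key -> bytes
	meta  map[string]string // committed metadata: path -> blob key
}

func NewStore() *Store {
	return &Store{blobs: map[string][]byte{}, meta: map[string]string{}}
}

// HashOf returns the hex SHA-256 of content (assumed verification hash).
func HashOf(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Upload writes bytes to the object store. Nothing references them yet.
func (s *Store) Upload(key string, content []byte) {
	s.blobs[key] = append([]byte(nil), content...)
}

// Commit records metadata only after verifying that the stored bytes exist
// and match the expected hash.
func (s *Store) Commit(path, key, wantHash string) error {
	content, ok := s.blobs[key]
	if !ok {
		return errors.New("commit refused: bytes not found")
	}
	if HashOf(content) != wantHash {
		return errors.New("commit refused: hash mismatch")
	}
	s.meta[path] = key
	return nil
}

// GC deletes blobs that no committed metadata references and returns how
// many it removed.
func (s *Store) GC() int {
	live := map[string]bool{}
	for _, key := range s.meta {
		live[key] = true
	}
	removed := 0
	for key := range s.blobs {
		if !live[key] {
			delete(s.blobs, key)
			removed++
		}
	}
	return removed
}

func main() {
	s := NewStore()
	s.Upload("k1", []byte("v1")) // simulated crash: no Commit follows
	fmt.Println(len(s.meta), s.GC()) // 0 1
}
```

A fault-injection test in this spirit kills the flow between `Upload` and `Commit`, then asserts that no metadata row exists and that the next GC pass removes the orphan.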