09 — Evolution Roadmap (Monolith → Services)

Supporting plan that operationalizes 01 §4. This is a sequencing document, not a schedule — phases are gated by demonstrated need, not dates. The thesis: ship correctness in a modular monolith, then extract services with evidence. The extraction is the portfolio centerpiece (ADR-0001).


1. Guiding principles

  1. Correctness before distribution. Sync, the commit protocol, and tenant isolation must be right before anything is split. A wrong monolith is easier to fix than a wrong distributed system.
  2. Each phase ships something usable. No phase is pure plumbing.
  3. Every split has a forcing function (05 §5) and an ADR.
  4. Observability precedes scale. You cannot extract safely what you cannot trace.
  5. Demonstrate, then document. Each phase produces an artifact (trace, load test, chaos result) that proves it works — that is the portfolio.

2. The phases

```mermaid
flowchart LR
    classDef p0 fill:#e5e7eb,stroke:#6b7280,color:#111827;
    classDef p1 fill:#fde68a,stroke:#b45309,color:#111827;
    classDef p2 fill:#fed7aa,stroke:#c2410c,color:#111827;
    classDef p3 fill:#bfdbfe,stroke:#1d4ed8,color:#111827;
    classDef p4 fill:#bbf7d0,stroke:#15803d,color:#111827;

    P0["P0 · Walking Skeleton<br/>1 binary, PG+MinIO<br/>upload/download + OTel"]:::p0
    P1["P1 · Core Product<br/>namespace, versions, sharing,<br/>commit protocol, RLS, CLI"]:::p1
    P2["P2 · Sync<br/>change journal, deltas,<br/>conflict copies, web app"]:::p2
    P3["P3 · Async plane<br/>NATS+outbox, search,<br/>notifications, GC worker"]:::p3
    P4["P4 · Extraction & Scale<br/>split workers/sync, Helm full,<br/>HPA, load+chaos proof"]:::p4
    P5["P5 · Breadth<br/>more storage adapters,<br/>previews, mobile, multi-region*"]:::p4

    P0 --> P1 --> P2 --> P3 --> P4 --> P5
```

P0 — Walking Skeleton (prove the spine)

P1 — Core Product (make it genuinely useful)

P2 — Synchronization (the headline)

Note: P2’s journal is still fed in-process from the File context — NATS is not required yet. The event bus interface (internal/platform/bus) is in place from P0 with an in-proc implementation, so P3 is a swap, not a rewrite.

P3 — The async derivation plane (add the event-driven story)

P4 — Extraction & Scale (the principal-grade demonstration)

P5 — Breadth (only after depth)


3. What is intentionally deferred at each gate

| Until you have… | Don’t build… |
| --- | --- |
| a working commit protocol (P0) | NATS, OpenSearch, multiple services |
| correct sync (P2) | previews, mobile, extra adapters |
| an event backbone + outbox (P3) | extracted services |
| traces + load tests (P4) | multi-region, service mesh, operators-for-everything |
| a stable public API (P4) | the mobile app |

This table is the antidote to the Overengineering Ledger (01 §3): the left column of each row is the forcing function, and only meeting it unlocks the complexity named in the right column.


4. Definition of done for the project-as-portfolio

The project is “complete enough to demonstrate principal-level work” when:

Hitting these matters far more than the count of microservices or storage adapters.