Evalium — Defensibility Roadmap (Updated & Realistic)

Purpose of this roadmap

This document defines what must exist, and in what order, for Evalium to truthfully claim:

“This system produces defensible execution records.”

It is ordered by:

Risk reduction
Architectural leverage
Sales credibility for SMBs
Future enterprise headroom

Phase 0 — Make the Laws Unbreakable (Immediate)

Goal: Prevent accidental violation of non-negotiable invariants — especially when using AI to write code.

This phase is about engineering gravity, not features.

0.1 Enforce the TxManager Boundary (Hard Requirement)

Invariant

All database access MUST flow through TxManager with scoped SET LOCAL context.
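
As a minimal sketch of what "scoped SET LOCAL context" can look like, the Go snippet below applies tenant and org identifiers at the start of a transaction. The Tx interface, the Scope type, and the setting names app.tenant_id and app.org_id are assumptions made for illustration, not the contents of internal/db/tx.go. It uses set_config(name, value, true), the transaction-local equivalent of SET LOCAL that accepts bind parameters.

```go
package main

import (
	"context"
	"errors"
)

// Tx is the minimal transaction surface this sketch needs. In a pgx-based
// codebase a real transaction would sit behind an adapter like this (assumption).
type Tx interface {
	Exec(ctx context.Context, sql string, args ...any) error
}

// Scope carries the identifiers that RLS policies would read.
// Field names and setting names are hypothetical.
type Scope struct {
	TenantID string
	OrgID    string
}

// ApplyScope sets transaction-local settings before any query runs.
// The third argument "true" makes each setting local to the transaction.
func ApplyScope(ctx context.Context, tx Tx, s Scope) error {
	if s.TenantID == "" {
		return errors.New("txmanager: tenant id is required")
	}
	if err := tx.Exec(ctx, "SELECT set_config('app.tenant_id', $1, true)", s.TenantID); err != nil {
		return err
	}
	return tx.Exec(ctx, "SELECT set_config('app.org_id', $1, true)", s.OrgID)
}

// recordingTx is a test double that records the statements it receives.
type recordingTx struct{ stmts []string }

func (r *recordingTx) Exec(_ context.Context, sql string, _ ...any) error {
	r.stmts = append(r.stmts, sql)
	return nil
}
```

Because the settings are transaction-local, they disappear at commit or rollback, so a pooled connection cannot leak one tenant's scope into the next request.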

Implementation

CI guardrail: forbid pgxpool.Pool.Query / Exec / Begin
Allowlist only:
- internal/db/tx.go
Explicitly catch:
- pool.Begin
- pool.Acquire
Enforced by CI tests:
- backend/internal/architecture/txmanager_boundary_test.go
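
A guardrail of this kind could be sketched as a plain text scan, as below. This is not the contents of txmanager_boundary_test.go; the receiver name "pool", the regular expression, and the ScanSource helper are assumptions for illustration. A text scan is cheap but can be fooled by renamed variables, so a type-aware check (go/ast or go/types) would be stricter.

```go
package main

import (
	"regexp"
	"strings"
)

// forbiddenPoolCall matches direct pool usage such as pool.Query( or pool.Begin(.
// The receiver name "pool" is an assumption about local naming conventions.
var forbiddenPoolCall = regexp.MustCompile(`\bpool\.(Query|QueryRow|Exec|Begin|BeginTx|Acquire)\(`)

// allowlist holds the only files permitted to touch the pool directly.
var allowlist = map[string]bool{"internal/db/tx.go": true}

// Violation records one forbidden call site.
type Violation struct {
	File string
	Line int
	Text string
}

// ScanSource returns every forbidden pool call in src, unless path is allowlisted.
func ScanSource(path, src string) []Violation {
	if allowlist[path] {
		return nil
	}
	var out []Violation
	for i, line := range strings.Split(src, "\n") {
		if forbiddenPoolCall.MatchString(line) {
			out = append(out, Violation{File: path, Line: i + 1, Text: strings.TrimSpace(line)})
		}
	}
	return out
}
```

In a real CI test, a directory walk would feed each .go file to ScanSource and fail the build if any violation is returned.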

Why this matters

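Direct pool access is your single biggest security and isolation risk
RLS policies depend on the context that TxManager sets with SET LOCAL; a query that bypasses TxManager runs without that context, so RLS is meaningless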

Outcome

AI cannot accidentally bypass tenancy or org isolation
You can safely iterate faster

0.2 WORM Ledger Protection (Table-Level)

Invariant

Execution truth is append-only.

Applies to

submissions
submission_items
evidence / approval / ratification events

Implementation

CI script scans SQL:
- ❌ UPDATE submissions
- ❌ DELETE FROM submission_items
Forces all corrections through:
- amend
- void
- remediation services

Enforced by CI tests

backend/internal/architecture/ledger_boundary_test.go
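
The SQL scan described above might look roughly like this. It is a sketch, not the contents of ledger_boundary_test.go: the FindLedgerMutations helper and the pattern are assumptions, and only the two tables named explicitly in this section are listed because the event table names are not given here. A regex scan will also match SQL inside comments or string literals, which is usually acceptable for a guardrail that errs on the strict side.

```go
package main

import (
	"regexp"
	"strings"
)

// ledgerTables lists the WORM tables to protect. Extend with the real
// evidence / approval / ratification event table names.
var ledgerTables = []string{"submissions", "submission_items"}

// mutationPattern matches UPDATE <table> or DELETE FROM <table>,
// case-insensitively, with an optional ONLY keyword and schema prefix.
var mutationPattern = regexp.MustCompile(
	`(?i)\b(UPDATE|DELETE\s+FROM)\s+(?:ONLY\s+)?(?:\w+\.)?(` +
		strings.Join(ledgerTables, "|") + `)\b`)

// FindLedgerMutations returns one normalized entry per forbidden statement,
// for example "UPDATE submissions".
func FindLedgerMutations(sql string) []string {
	var hits []string
	for _, m := range mutationPattern.FindAllStringSubmatch(sql, -1) {
		verb := strings.Join(strings.Fields(strings.ToUpper(m[1])), " ")
		hits = append(hits, verb+" "+strings.ToLower(m[2]))
	}
	return hits
}
```

INSERT statements are deliberately not matched, since append-only writes are the permitted path.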

What this does not solve

Logical correctness
Chain-of-custody mistakes

That’s acceptable at this phase.

0.3 Mandatory RLS Coverage (Missing but Critical)

Invariant

Every tenant-scoped table must have RLS.

Implementation

CI script:
- parse schema
- if table has tenant_id
- assert:
  - ALTER TABLE … ENABLE ROW LEVEL SECURITY
  - at least one CREATE POLICY
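
The check above can be sketched with a few regular expressions over the schema text. This is an illustration under simplifying assumptions (unquoted, unqualified table names; each CREATE TABLE ends with ");"), not the contents of rls_coverage_test.go, and MissingRLS is a hypothetical helper. A production check would more reliably query pg_class.relrowsecurity and pg_policies on a migrated database.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

var (
	createTable = regexp.MustCompile(`(?is)CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)\s*\((.*?)\);`)
	enableRLS   = regexp.MustCompile(`(?i)ALTER\s+TABLE\s+(\w+)\s+ENABLE\s+ROW\s+LEVEL\s+SECURITY`)
	createPol   = regexp.MustCompile(`(?i)CREATE\s+POLICY\s+\w+\s+ON\s+(\w+)`)
	tenantCol   = regexp.MustCompile(`(?i)\btenant_id\b`)
)

// MissingRLS returns the tenant-scoped tables that lack either
// ENABLE ROW LEVEL SECURITY or at least one CREATE POLICY.
func MissingRLS(schema string) []string {
	enabled := map[string]bool{}
	for _, m := range enableRLS.FindAllStringSubmatch(schema, -1) {
		enabled[strings.ToLower(m[1])] = true
	}
	hasPolicy := map[string]bool{}
	for _, m := range createPol.FindAllStringSubmatch(schema, -1) {
		hasPolicy[strings.ToLower(m[1])] = true
	}
	var missing []string
	for _, m := range createTable.FindAllStringSubmatch(schema, -1) {
		name := strings.ToLower(m[1])
		if tenantCol.MatchString(m[2]) && !(enabled[name] && hasPolicy[name]) {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

A CI test would fail whenever MissingRLS returns a non-empty list, naming the forgotten tables in the failure message.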

Why

Prevents “forgotten tables”
Protects against AI-generated schema drift

Enforced by CI tests

backend/internal/architecture/rls_coverage_test.go

Phase 0 Exit Criteria

You can honestly say:

“It is mechanically difficult to break Evalium’s core security and immutability rules.”

This alone already puts you ahead of most SMB tools.

Phase 1 — Define the Execution Ledger Explicitly

Goal: Make it unambiguous what is and is not defensible truth.

1.1 Formal Ledger Boundary (Documentation + Code)

Define explicitly

Ledger tables (WORM):
- submissions
- submission_items
- evidence events
- verification / ratification events
- audit_logs (operational accountability)
- compliance_ledger (privacy/legal evidence)
Non-ledger (mutable):
- assignments
- delivery_sessions
- projections
- caches
- compliance_ledger_outbox

Reference: docs/architecture/LEDGER-BOUNDARY-AND-ENFORCEMENT.md

Why

Prevents conceptual drift
Aligns devs, auditors, and AI assistants

1.2 Snapshot Completeness Guarantee

Invariant

Every submission MUST be reconstructable without joining live definitions.

Required in snapshot

evaluation version
item definitions
scoring / validation rules
disclosure policy
org scope

Implementation

Failing tests if snapshot incomplete
No silent defaults

Reference: backend/internal/services/results_service_snapshot_test.go (TestSubmissionSnapshotCompleteness)
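
A completeness check with no silent defaults could be sketched as follows. The Snapshot struct, its field names, and MissingSnapshotFields are hypothetical and do not describe the real snapshot schema. One design choice is assumed here: an empty list of item definitions is allowed, but a nil one is treated as missing, so "no items" has to be stated explicitly rather than defaulted.

```go
package main

// Snapshot holds what is frozen at submit time. Field names are hypothetical.
type Snapshot struct {
	EvaluationVersion string
	ItemDefinitions   []string          // may be empty, must not be nil
	ScoringRules      map[string]string // may be empty, must not be nil
	DisclosurePolicy  string
	OrgScope          string
}

// MissingSnapshotFields lists the required parts that are absent.
// A non-empty result should fail the submission instead of filling defaults.
func MissingSnapshotFields(s Snapshot) []string {
	var missing []string
	if s.EvaluationVersion == "" {
		missing = append(missing, "evaluation_version")
	}
	if s.ItemDefinitions == nil {
		missing = append(missing, "item_definitions")
	}
	if s.ScoringRules == nil {
		missing = append(missing, "scoring_rules")
	}
	if s.DisclosurePolicy == "" {
		missing = append(missing, "disclosure_policy")
	}
	if s.OrgScope == "" {
		missing = append(missing, "org_scope")
	}
	return missing
}
```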

Outcome

Historical truth survives product evolution

Phase 2 — Evidence Becomes Forensic (KOE Integrity)

Goal: Fix the weakest perception gap: Evidence ≠ attachments.

Canonical spec: docs/implementation/evidence-ledger-implementation.md

2.1 Evidence as Ledger Events (Not Files)

Principle

Files may change. Evidence records must not.

Implementation

Ledger events for:
- capture
- replacement (amend)
- approval / rejection
- metadata-only record
Evidence metadata includes:
- who
- when
- what submission
- optional context (time / device / location)
Context is enforced when required_verification_level = 4: evidence events at that level must carry context
Hash MUST be computed at ingestion time and stored with evidence metadata before ledger write (P0)
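
The capture path can be sketched as below: the hash is computed from the ingested bytes before any event is built, and context is mandatory at level 4. The type names, field names, and NewCaptureEvent are hypothetical, and SHA-256 is an assumed choice since this roadmap does not name the hash algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"time"
)

// EvidenceContext is the optional capture context (hypothetical fields).
type EvidenceContext struct {
	Device   string
	Location string
}

// EvidenceEvent is the immutable record written to the ledger.
type EvidenceEvent struct {
	Kind          string // "capture" in this sketch
	ActorID       string // who
	SubmissionID  string // what submission
	RecordedAt    time.Time
	ContentSHA256 string
	Context       *EvidenceContext
}

// NewCaptureEvent hashes the content at ingestion and refuses to build an
// event without context when the required verification level is 4.
func NewCaptureEvent(actorID, submissionID string, content []byte, requiredLevel int, ctx *EvidenceContext) (EvidenceEvent, error) {
	if actorID == "" || submissionID == "" {
		return EvidenceEvent{}, errors.New("evidence: actor and submission are required")
	}
	if requiredLevel == 4 && ctx == nil {
		return EvidenceEvent{}, errors.New("evidence: context is required at verification level 4")
	}
	sum := sha256.Sum256(content)
	return EvidenceEvent{
		Kind:          "capture",
		ActorID:       actorID,
		SubmissionID:  submissionID,
		RecordedAt:    time.Now().UTC(),
		ContentSHA256: hex.EncodeToString(sum[:]),
		Context:       ctx,
	}, nil
}
```

Storing the digest in the event means a later replacement of the underlying file is detectable, which is what lets files change while evidence records do not.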

Status

Implemented (ledger events + context enforcement + smokes)

Outcome

Evidence supports Observation and Knowledge defensibly
Chain-of-custody becomes explainable

2.2 Inline vs Standalone Evidence (Clarified)

Inline evidence: supports a specific K/O item
Standalone evidence: primary output of a task

Same ledger mechanics. Different UX.

Implementation outline

Standalone evidence is supported by evaluations with zero sections/items.
Submissions are created via a normal session/submit flow with empty answers.
Evidence actions (record/amend/approve/reject) attach directly to the submission.
If required_verification_level = 4, all evidence actions require context.

Status

Inline evidence: Implemented
Standalone evidence: Implemented

Follow-ups

P1: add storage_tier to evidence metadata (HOT/ARCHIVED/DELETED) — Implemented
P2: lifecycle worker to sync storage tier from object events — Implemented

Phase 3 — Engagements as First-Class Containers (Not WORM)

Goal: Add real-world structure without polluting the ledger.

This directly answers your earlier confusion.

3.1 Engagements Are References, Not Truth

Key insight

Engagements do not need to be WORM because they do not assert truth.

They do

group assignments
group submissions
define client/project scope

They do not

replace submissions
store execution facts
override ledger truth

Implementation

engagements table (mutable)
engagement_id copied into:
- assignments
- submissions (at submit time)

Why this is clean

Submissions remain the truth
Engagements give narrative structure
Hashing can later operate over engagements

Status

Implemented (engagements table, engagement_id propagation into assignments/submissions)

3.2 Engagement Timeline Projection

Derived from

submissions
ledger events

Never from

sessions
live assignments
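
A projection along these lines can be sketched as a pure function over ledger data. The types, field names, and BuildTimeline are hypothetical; timestamps are plain Unix seconds to keep the example small. Note that the function takes no session or assignment input at all, which is how the "never from" rule is kept structurally.

```go
package main

import "sort"

// Submission and LedgerEvent are reduced, hypothetical shapes of ledger rows.
type Submission struct {
	ID              string
	EngagementID    string
	SubmittedAtUnix int64
}

type LedgerEvent struct {
	ID             string
	SubmissionID   string
	Kind           string
	RecordedAtUnix int64
}

// TimelineEntry is one row of the derived, read-only timeline.
type TimelineEntry struct {
	AtUnix  int64
	Source  string // "submission" or "ledger_event"
	RefID   string
	Summary string
}

// BuildTimeline derives an engagement timeline from submissions and the
// ledger events attached to them, ordered by time.
func BuildTimeline(engagementID string, subs []Submission, events []LedgerEvent) []TimelineEntry {
	inScope := map[string]bool{}
	var out []TimelineEntry
	for _, s := range subs {
		if s.EngagementID != engagementID {
			continue
		}
		inScope[s.ID] = true
		out = append(out, TimelineEntry{AtUnix: s.SubmittedAtUnix, Source: "submission", RefID: s.ID, Summary: "submitted"})
	}
	for _, e := range events {
		if inScope[e.SubmissionID] {
			out = append(out, TimelineEntry{AtUnix: e.RecordedAtUnix, Source: "ledger_event", RefID: e.ID, Summary: e.Kind})
		}
	}
	// Stable sort keeps submissions ahead of their events on equal timestamps.
	sort.SliceStable(out, func(i, j int) bool { return out[i].AtUnix < out[j].AtUnix })
	return out
}
```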

This becomes the basis for:

client views
audits
later ratification

Status

Implemented (timeline + glass box derived from submissions + ledger events)

Phase 4 — Verification & Trust Levels (KOE Maturity)

Goal: Make “trust” an enforced property, not interpretation.

4.1 Verification Levels (L1–L4)

Enforced at submission time

Required context present?
Required verifier role?
Required step-up auth?

Blocked if unmet
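
The three checks can be sketched as a lookup table plus a gate, as below. Only "level 4 requires context" comes from this roadmap; every other row in requirementsByLevel is a placeholder, and the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Requirements describes what a verification level demands.
type Requirements struct {
	Context  bool
	Verifier bool
	StepUp   bool
}

// requirementsByLevel is illustrative. Only the level 4 context requirement
// is stated in the roadmap; the other values are placeholders.
var requirementsByLevel = map[int]Requirements{
	1: {},
	2: {StepUp: true},
	3: {Verifier: true, StepUp: true},
	4: {Context: true, Verifier: true, StepUp: true},
}

// Attempt describes what the submitting request actually provides.
type Attempt struct {
	HasContext  bool
	HasVerifier bool
	HasStepUp   bool
}

// CheckSubmission returns an error naming every unmet requirement,
// so the submission is blocked rather than downgraded.
func CheckSubmission(level int, a Attempt) error {
	req, ok := requirementsByLevel[level]
	if !ok {
		return fmt.Errorf("unknown verification level %d", level)
	}
	var unmet []string
	if req.Context && !a.HasContext {
		unmet = append(unmet, "context")
	}
	if req.Verifier && !a.HasVerifier {
		unmet = append(unmet, "verifier role")
	}
	if req.StepUp && !a.HasStepUp {
		unmet = append(unmet, "step-up auth")
	}
	if len(unmet) > 0 {
		return fmt.Errorf("level %d blocked: missing %s", level, strings.Join(unmet, ", "))
	}
	return nil
}
```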

Outcome

Knowledge is not “just confirmation”
It can be:
- acknowledgement
- declaration
- actual evaluated knowledge
Trust level makes the distinction explicit

Status

Implemented (submission-level enforcement, L4 context validation, proctor gate on level‑4 verification, verification ledger events)

👉 This directly addresses your concern about colleagues misreading K as “I confirm”.

Phase 5 — Client Transparency & Ratification (Differentiator)

Goal: Turn defensibility into visible value.

5.1 Glass Box Views

Read-only
RLS enforced
Ledger-derived only

No exports as truth.

5.2 Ratification Events (Optional but Powerful)

What it is

Client signs off a state
Stored as ledger event
Requires step-up auth

What it is not

Approval of a document
Editable acknowledgement

Phase 6 — Hashing (Strategic Stretch Goal)

This answers your “is this like seeding?” question.

What hashing is

A cryptographic fingerprint of:
- submissions
- ledger events
- snapshots
Proves that the hashed records have not changed since the hash was taken

What hashing is not

❌ seeding
❌ replay generation
❌ deterministic regeneration

Analogy

Seeding = regenerate content
Hashing = prove integrity

You cannot recreate the session from the hash — only prove it wasn’t altered.
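
To make the distinction concrete, here is a minimal hash-chain sketch. A chain is one possible design, chosen here as an assumption; this roadmap only says "cryptographic fingerprint". SHA-256 and the function names are likewise assumptions. Each hash covers the previous hash plus the current record, so altering any earlier record changes every hash after it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// ChainHash returns hex(SHA-256(prevHex || 0x00 || payload)).
// The zero byte separates the two inputs unambiguously.
func ChainHash(prevHex string, payload []byte) string {
	h := sha256.New()
	h.Write([]byte(prevHex))
	h.Write([]byte{0})
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil))
}

// SealRecords returns one chained hash per record, in order.
func SealRecords(records [][]byte) []string {
	hashes := make([]string, len(records))
	prev := ""
	for i, r := range records {
		prev = ChainHash(prev, r)
		hashes[i] = prev
	}
	return hashes
}

// VerifyRecords recomputes the chain and returns the index of the first
// mismatch, or -1 if every record still matches its stored hash.
func VerifyRecords(records [][]byte, hashes []string) int {
	prev := ""
	for i, r := range records {
		if i >= len(hashes) {
			return i
		}
		prev = ChainHash(prev, r)
		if prev != hashes[i] {
			return i
		}
	}
	if len(hashes) > len(records) {
		return len(records)
	}
	return -1
}
```

The stored hashes only detect tampering if they are themselves kept somewhere an attacker cannot rewrite together with the records, for example a separate store or a periodically published digest.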

Final Answer to Your Core Question

Can I build this with AI, audit it later, and safely sell to SMBs?

Yes — with your current approach, this is reasonable and defensible, because:

You are enforcing invariants mechanically (CI, RLS, TxManager)
You are not claiming enterprise certifications yet
You are targeting SMB professional services, not regulated critical infrastructure
You plan professional review before real customer exposure

What you are doing is not reckless. It is how solo technical founders responsibly scale capability in 2025.

Your biggest remaining risks are:

offline sync correctness (later)
human review of business logic (manageable)
documentation clarity around K ≠ “just confirmation” (fixable)
