🛡️ Delivery Resilience & Hardening Roadmap

Owner: Product Engineering
Status: Draft
Pillar: 4 — Delivery Engine
Related: architecture, FOUNDATION.md, assignments-roadmap.md, roles-and-access-control.md

This document defines the Operational Hardening required for the Delivery Engine (ResultsService). The core logic (preview, snapshotting, scoring) is functionally correct and the schema is mature.
What is missing is the Resilience Layer required for high-scale, adversarial, and lossy network conditions.

We are moving from “trusting the client” to “enforcing on the server”.

0. Scope & Assumptions

This roadmap assumes:

delivery_sessions, submission_items, and submissions already exist and are wired to:
- CreateDeliverySession
- RecordAnswer
- SubmitSession
EvaluationService and ResultsService already:
- Resolve buckets deterministically via seed.
- Generate version_snapshot when creating submissions.
- Compute scores and outcomes from the snapshot.

We also follow the Evalium golden rule:

Configuration (The Law)
- Lives in evaluation_versions and, for overrides, assignments / assignment_overrides.
- It is immutable and snapshotted.
State (The Reality)
- Lives in delivery_sessions (and child tables).
- It is mutable and describes what is actually happening in a specific attempt.
Enforcement (The Judge)
- ResultsService compares Reality to The Law on every request.

This work is not about rewriting the engine.
It is about hardening and operationalising it so that:

timing can never be spoofed by the browser,
background workers can enforce deadlines efficiently,
monitoring can distinguish “truly active” from “zombie” sessions,
audits can explain why a session ended, not just that it ended, and
the schema is ready for high-stakes delivery features (section timing, linear navigation, lockdown browser) without painful rewrites later.

1. Objectives

Zero-Trust Timing
The server never relies on the browser to decide whether a test has expired.
Scalable Expiry
Deadlines are computed once and stored in expires_at, enabling cheap, indexed checks.
Zombie Detection
We distinguish between sessions that are merely “marked active” and sessions with recent activity.
Forensic Auditability
We capture why a session ended (timeout vs user vs admin vs system) and when the client says an answer was produced (for offline/bulk-upload detection).
High-Stakes Readiness
The schema can express:
- per-section timing,
- linear navigation constraints, and
- secure browser configuration / forensics,
even if UI and full enforcement logic arrive later.

Phase 1 — Schema Hardening

Goal: Equip the database to answer operational questions without complex calculations or repeated joins.

1.1 `delivery_sessions` Table Upgrades

We add three mandatory and one optional column to move key state from “runtime calculation” into “persistent, indexed fields”.

ALTER TABLE delivery_sessions
  -- 1. Performance: the absolute wall-clock deadline for this session.
  -- Computed ONCE at creation/resume. Nullable if no time limit applies.
  ADD COLUMN expires_at TIMESTAMPTZ,

  -- 2. Monitoring: liveness / heartbeat.
  -- Updated on every interaction that touches the session (answers, heartbeat, resume).
  ADD COLUMN last_active_at TIMESTAMPTZ DEFAULT NOW(),

  -- 3. Audit: why did this session end?
  -- Preferably use a dedicated enum type in the DB, e.g.
  --   'user_submit', 'auto_expired', 'admin_forced', 'system_error'
  ADD COLUMN termination_reason TEXT,

  -- 4. (Optional) Idempotency / optimistic locking.
  -- Only use once the service layer is prepared to increment and check it.
  ADD COLUMN lock_version INT DEFAULT 1;

-- CRITICAL: Index for the Auto-Close Worker and cheap expiry checks.
-- The partial-index predicate already fixes status, so only expires_at needs to be keyed.
CREATE INDEX idx_sessions_active_expires
ON delivery_sessions (expires_at)
WHERE status = 'active';

Note: If the codebase is ready, consider introducing a proper enum type for termination_reason (e.g. CREATE TYPE delivery_session_termination_reason AS ENUM (...)) instead of raw text.
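Whether or not the database uses an enum type, the service layer can keep the same closed set of values. The following is a minimal sketch, assuming the four reasons listed in the column comment above; the Go names (TerminationReason, ParseTerminationReason) are illustrative and not part of the existing codebase.

```go
package main

import "fmt"

// TerminationReason mirrors the values suggested for
// delivery_sessions.termination_reason. The type name is hypothetical.
type TerminationReason string

const (
	ReasonUserSubmit  TerminationReason = "user_submit"
	ReasonAutoExpired TerminationReason = "auto_expired"
	ReasonAdminForced TerminationReason = "admin_forced"
	ReasonSystemError TerminationReason = "system_error"
)

// ParseTerminationReason rejects any value outside the known set, so a typo
// cannot reach the database while the column is still plain TEXT.
func ParseTerminationReason(s string) (TerminationReason, error) {
	switch r := TerminationReason(s); r {
	case ReasonUserSubmit, ReasonAutoExpired, ReasonAdminForced, ReasonSystemError:
		return r, nil
	default:
		return "", fmt.Errorf("unknown termination reason %q", s)
	}
}
```

Keeping the set in one place means a later migration to a database enum only has to match these four strings.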

1.2 `submission_items` Forensics

ALTER TABLE submission_items
  -- Forensic evidence: when did the CLIENT say this interaction happened?
  -- This is for analysis, not for enforcement. Server-side timestamps remain authoritative.
  ADD COLUMN client_timestamp TIMESTAMPTZ;

Assumption: submission_items already has a server-side created_at / updated_at. If not, that should be added and populated as well.

Phase 2 — The Single Source of Truth (Service Layer)

Goal: Centralise deadline logic so it is implemented once and cannot drift between handlers.

2.1 `CalculateSessionExpiry` (Domain Function)

We must not scatter started_at + time_limit arithmetic across handlers and workers.

We define a single internal domain function, used by:

CreateDeliverySession
Any future ResumeSession
Any mid-flight “extend time” operations (e.g. late accommodations)

Behaviour:

Fetch Baseline
- Read evaluation_version.time_limit (or equivalent settings) for the chosen version.
Apply Assignment Rules
- Apply any assignment.time_limit_override from the assignment that spawned this session.
Apply Overrides / Accommodations
- Apply any assignment_overrides.time_limit_extension (or similar “mercy time” field).
Compute Final Duration
- effective_duration = (assignment override if set, otherwise baseline_limit) + extensions
- If no time limit exists, return nil.
Compute Expiry
- expires_at = started_at + effective_duration
Persist
- Store the result in delivery_sessions.expires_at.

Invariant: From this point on, all expiry checks in the system must consult delivery_sessions.expires_at as the source of truth, never recompute from raw settings.

If a manual override grants extra time mid-flight:

call CalculateSessionExpiry again,
update expires_at in a single, transactional operation.

Phase 3 — Enforcement (Guardrails in Handlers)

Goal: Make the runtime services actively enforce the rules encoded in the schema.

3.1 Hardening `RecordAnswer` (`PUT /sessions/{id}/answers`)

Before persisting any answer:

Session Load & Status Check
- Load the session by ID under RLS.
- Reject if status is not active.
Zero-Trust Expiry Check
- Use server time only:
```
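// ExpiresAt is a nullable time (e.g. sql.NullTime); NULL means the session is untimed.
// now comes from the server clock; w is the handler's http.ResponseWriter.
if session.ExpiresAt.Valid && now.After(session.ExpiresAt.Time.Add(GracePeriod)) {
    http.Error(w, "Session has expired", http.StatusForbidden)
    return
}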
```
- GracePeriod is a small server-side constant (e.g. 5–30 seconds) to tolerate minor clock skew and network jitter.
Liveness Update
- Set last_active_at = NOW() in the same transaction as the answer write.
Persist Answer & Forensics
- Upsert submission_items for (session_id, question_version_id, item_pos):
  - ensure this remains idempotent using the existing unique constraint.
- Store:
  - server-side timestamps (created_at / updated_at),
  - client_timestamp from the payload (if provided).

Important: client_timestamp is never used to override expiry decisions. It is purely for later analysis (offline drift, bulk uploads, etc.).

3.2 Hardening `SubmitSession`

SubmitSession must be safe under retries and races (e.g. user double-click, network flakiness, concurrent auto-close).

Behaviour:

Expiry Check (same as RecordAnswer)
- Reject manual submit if the session is already past expires_at + GracePeriod:
  - optionally treat this as an auto-expire flow instead.
Idempotent Status Transition
- Use a guarded update:
```
UPDATE delivery_sessions
SET status = 'submitted',
    termination_reason = 'user_submit',
    last_active_at = NOW()
WHERE id = $1
  AND status = 'active';
```
- Check the affected row count:
  - If 0, the session was already submitted or auto-expired; surface a friendly response to the client.
Submission Snapshot & Scoring
- If the status change succeeded, run:
  - buildSubmissionSnapshot
  - computeAndPersistMetrics
- Ensure the submission creation pipeline is idempotent:
  - either via a unique constraint on submissions.session_id,
  - or by checking if a submission already exists for that session before inserting.

Note: The same idempotent pattern should be used by the auto-close worker (Phase 4) so user submit and auto-expire cannot both generate submissions.

Phase 4 — Operational Resilience (Background Jobs & Monitoring)

Goal: Automatically clean up stale sessions and express liveness clearly in monitoring.

4.1 The “Reaper” (Auto-Close Worker)

A background worker (cron or internal ticker) that finalises sessions whose deadline has passed.

Query (cheap due to index):

SELECT id
FROM delivery_sessions
WHERE status = 'active'
  AND expires_at IS NOT NULL
  AND expires_at < NOW() - $1::interval;  -- $1 = GracePeriod, e.g. '10 seconds'

Action (per session ID):

Attempt to transition status:

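UPDATE delivery_sessions
SET status = 'submitted',
    termination_reason = 'auto_expired'
WHERE id = $1
  AND status = 'active'
  -- Re-check the deadline in case time was extended after the SELECT ($2 = GracePeriod).
  AND expires_at < NOW() - $2::interval;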

If the row was updated:
- Run the same submission pipeline as SubmitSession:
  - snapshot + scoring.
If no row was updated:
- Another actor has already closed or submitted the session; do nothing.

Invariant: Submission creation must remain idempotent with respect to session_id so that manual submit and the Reaper can race safely.

4.2 Liveness Indicators in the Command Centre

The Assignment Monitor (Pillar 3) can now use last_active_at to visualise liveness:

Green: last_active_at > NOW() - 30s (very recent activity)
Amber: NOW() - 5m < last_active_at <= NOW() - 30s (idle, but seen recently)
Grey (Zombie): last_active_at <= NOW() - 5m while status = 'active'

This does not change any backend invariants but provides:

realistic admin expectations (“this user is idle, not necessarily still present”), and
a foundation for future alerts or interventions (e.g. admin force-close).

Phase 5 — High-Stakes Delivery Foundation (Schema Only)

Goal: Ensure the schema supports strict constraints for high-stakes delivery:

section-level timing,
linear navigation (“no going back”),
secure / locked-down browser delivery,

without needing disruptive schema changes later. UI and full enforcement logic can follow in later iterations.

5.1 Section State Tracking (`delivery_session_section_states`)

High-stakes exams often require per-section timing (e.g. Section 1: 20 minutes, Section 2: 20 minutes), where unused time from one section cannot be carried over.

Configuration (The Law):

Section time limits live in the evaluation configuration, e.g. in the evaluation JSON / snapshot:
- sections: [{ id: "s1", time_limit_seconds: 1200 }, ...]

State (The Reality):

We track per-section timing in a child table:

-- Tracks the state and timing of specific sections within a session
CREATE TABLE delivery_session_section_states (
    session_id UUID NOT NULL REFERENCES delivery_sessions(id) ON DELETE CASCADE,
    section_id TEXT NOT NULL,  -- Matches the section ID in the evaluation snapshot
    
    status TEXT DEFAULT 'locked', -- 'locked', 'open', 'completed'
    started_at TIMESTAMPTZ,
    expires_at TIMESTAMPTZ,       -- Calculated at the moment the section is opened
    
    PRIMARY KEY (session_id, section_id)
);

Enforcement (The Judge) – future logic:

When a section is opened:
- backend creates/updates the corresponding row with:
  - status = 'open',
  - started_at = NOW(),
  - expires_at = NOW() + effective_section_limit (respecting any accommodations).
When RecordAnswer is called:
- service can check delivery_session_section_states for the relevant section_id:
  - reject if status != 'open' or NOW() > expires_at.

This keeps section timing laws in the evaluation snapshot and the enforced reality in delivery_session_section_states, consistent with our broader architecture.

5.2 Navigation State for Linear Delivery

Some high-stakes exams require linear navigation (no revisiting previous items) or controlled progression per section.

We introduce navigation state on the session:

ALTER TABLE delivery_sessions 
    ADD COLUMN max_viewed_item_index INT DEFAULT 0,
    ADD COLUMN current_section_id TEXT;

max_viewed_item_index:
- Highest item index the candidate has been allowed to see so far.
- Supports rules such as “you cannot go back behind the furthest point reached”.
current_section_id:
- Optional convenience field to reflect which section is currently open,
- complements delivery_session_section_states for quickly routing and validating navigation.

Future enforcement examples (not implemented in this phase):

GetSession:
- only returns items up to max_viewed_item_index in non-review modes.
RecordAnswer:
- if a request attempts to answer an item index < max_viewed_item_index while linear mode is configured:
  - reject with 403.

The key here is schema readiness; strict linear/section logic can be layered on later.

5.3 Security Configuration for Locked-Down Browsers

For “locked-down browser” scenarios (e.g. Safe Exam Browser – SEB), we separate:

Configuration (The Law):
- what is expected (allowed SEB keys, IP allowlists),
State (The Reality):
- what the backend actually saw during the session.

5.3.1 Assignment Security Configuration

We ensure the assignment can store security and lockdown expectations:

ALTER TABLE assignments 
    ADD COLUMN security_config JSONB; 
    -- Example:
    -- {
    --   "seb_allowed_hashes": ["a7f...", "b2c..."],
    --   "ip_allowlist": ["192.168.0.0/24"],
    --   "require_seb": true
    -- }

This is where we can store:

expected SEB Browser Exam Keys (hashes derived from SEB config),
IP allowlists / ranges,
boolean flags like require_seb.

Future enforcement:

Middleware on sensitive routes (e.g. /api/v1/sessions/**) will:
- read assignments.security_config,
- validate request headers such as X-SafeExamBrowser-RequestHash,
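- recompute the expected hash for each key in seb_allowed_hashes (SEB hashes the request URL together with the Browser Exam Key) and compare,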
- reject (403) requests that do not present valid lockdown proofs.

5.3.2 Session Device Fingerprint (Forensics)

We also record what we actually see during delivery for later analysis:

ALTER TABLE delivery_sessions
    ADD COLUMN device_fingerprint JSONB;
    -- Example:
    -- {
    --   "user_agent": "SEB/3.0.1 (Windows 10)",
    --   "ip": "203.0.113.42",
    --   "seb_hash": "a7f...",
    --   "platform": "win32"
    -- }

This supports:
- debugging “why was a session blocked?”,
- downstream analytics (e.g. distribution of platforms and SEB versions),
- future anomaly detection.

Important: Enforcement still happens in middleware / service guards. device_fingerprint is for audit and monitoring; security_config expresses the law.

Summary of Changes

| Feature | Old Behaviour | New Resilience Behaviour / Foundation |
| --- | --- | --- |
| Deadlines | Calculated on read (joins, JSON, logic) | Computed once and stored in `expires_at` |
| Time Checks | Relied on client timer | Server-side rejection via `expires_at` + grace |
| Activity | Binary (`active` / `submitted`) | Includes `last_active_at` liveness tracking |
| Completion | Ambiguous cause | Explicit `termination_reason` (why it ended) |
| Cheating View | No visibility | `client_timestamp` for offline / bulk-upload forensics |
| Concurrency | Implicit, best-effort | Idempotent transitions & submission creation |
| Section Timing | Only global session timing possible | Per-section timing via `delivery_session_section_states` |
| Linear Nav | No explicit navigation state | `max_viewed_item_index` / `current_section_id` schema in place |
| Lockdown Browser | No structured config or device evidence | `assignments.security_config` + `device_fingerprint` for SEB-style enforcement |

This roadmap converts the Delivery Engine from a primarily passive storage layer into an active enforcement engine, capable of supporting:

high concurrency,
untrusted clients,
lossy network conditions,
credible audits, and
future high-stakes features (section timing, linear navigation, locked-down browsers),

and forms the required foundation for Programmes & Certifications (Pillar 5).
