Data Flow Map

This document outlines the primary data flows within the Evalium backend. It provides a high-level overview of how major domains connect and a detailed view of the critical “attempt → durable submission → reporting projection” boundary, including the defensible audit trail.


1. The Big Picture: Core Data Flows

Key contract (non-negotiable):

  • Sessions are runtime state (retention-bound).
  • Submissions (+ submission_items) are the durable source of truth for historical reporting.
  • Evidence is linked durably to submissions and subject to storage tiering.
  • The Ledger (ledger_events) is the immutable audit trail for all state changes.
  • Reporting tables are derived projections (rebuildable from durable sources).

graph TD
subgraph "Audit & Defensibility (Immutable)"
L[Ledger Events]
end

subgraph "Authoring & Content (Durable)"
Q[Questions & Passages] --> EV[Evaluation Versions]
T[Taxonomy / Skills] --> Q
end

subgraph "People, Groups & Subjects (Durable)"
U[Users] --> G[Groups]
SBT[Subjects] --> SUB
end

subgraph "Assignment & Policy (Durable)"
EV --> A[Assignments]
U --> A
G --> A
A --> EP[Effective Policy]
end

subgraph "Delivery Runtime (Retention-bound)"
EP --> S[Delivery Session]
S --> SE[Session Events / Telemetry]
S --> SI[Submission Items / Findings]
EVD[Evidence / Media Uploads] --> SI
end

subgraph "Durable Outcomes (Source of Truth)"
S --> SUB[Submission]
SI --> SUB
EVD --> SUB
L -.-> SUB
end

subgraph "Post-Submission Lifecycle (Reactive)"
SUB --> REM[Remediation]
SUB --> CLM[Claims / Disputes]
SUB --> PRG[Programme Progress]
REM -.-> L
CLM -.-> L
end

subgraph "Analytics (Derived / Rebuildable)"
SUB --> R[Reporting Projections]
R --> PRG
end

2. Source of Truth by Domain

  • Audit Truth (Defensibility): ledger_events. Every critical state change (publish, submit, review, remediate) MUST append to the ledger.
  • Content Truth: evaluation_versions, question_versions, taxonomy_terms.
  • Targeting + Policy Truth: assignments + assignment_overrides.
  • Runtime Truth (Ephemeral): delivery_sessions, delivery_session_events (telemetry).
  • Durable Attempt Truth (Historical): submissions, submission_items, submission_subjects, submission_evidence, version_snapshot.
  • Analytics Truth (Derived): reporting.* tables; must be recomputable from durable attempt truth.

3. Detailed Sub-Flow: Defensibility & The Ledger

The Ledger is cross-cutting. Any service mutation that affects a durable record must follow this pattern:

  1. Validate business rules.
  2. Execute the database mutation (within a transaction).
  3. Emit a ledger_event (within the same transaction).
  4. Notify downstream workers (e.g., reporting projection) via event bus or table triggers.

3.1 The Durability Boundary (Finalisation)

When a session is submitted or an observation is finalised:

sequenceDiagram
participant API
participant ResultsService
participant DB_Public
participant Ledger
participant Worker

API->>ResultsService: Submit(session_id)
ResultsService->>DB_Public: BEGIN TRANSACTION
ResultsService->>DB_Public: INSERT submissions (from session runtime)
ResultsService->>DB_Public: UPDATE submission_items SET submission_id
ResultsService->>DB_Public: INSERT submission_evidence (link uploads)
ResultsService->>DB_Public: INSERT submission_subjects (link subjects)
ResultsService->>Ledger: INSERT ledger_events (type: 'submission.finalised')
ResultsService->>DB_Public: COMMIT

ResultsService->>Worker: Trigger Reporting Projection
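
As a rough illustration of the sequence above, the sketch below performs the same inserts and updates inside one sqlite3 transaction and triggers the projection only after the commit. The column names, the finalise_session function, and the trigger_projection callback are hypothetical; only the table names and the submission.finalised event type come from this document.

```python
import sqlite3

# Hypothetical minimal schema, reduced to the columns the sketch needs.
SCHEMA = """
CREATE TABLE submissions (id INTEGER PRIMARY KEY, session_id TEXT NOT NULL UNIQUE);
CREATE TABLE submission_items (
    id INTEGER PRIMARY KEY, session_id TEXT NOT NULL, submission_id INTEGER
);
CREATE TABLE submission_evidence (submission_id INTEGER NOT NULL, upload_id TEXT NOT NULL);
CREATE TABLE submission_subjects (submission_id INTEGER NOT NULL, subject_id TEXT NOT NULL);
CREATE TABLE ledger_events (
    id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, submission_id INTEGER
);
"""


def finalise_session(conn, session_id, upload_ids, subject_ids, trigger_projection):
    # Everything inside `with conn` commits together or rolls back together.
    with conn:
        cur = conn.execute(
            "INSERT INTO submissions (session_id) VALUES (?)", (session_id,)
        )
        submission_id = cur.lastrowid
        # Attach the items recorded during the session to the new submission.
        conn.execute(
            "UPDATE submission_items SET submission_id = ? WHERE session_id = ?",
            (submission_id, session_id),
        )
        conn.executemany(
            "INSERT INTO submission_evidence (submission_id, upload_id) VALUES (?, ?)",
            [(submission_id, u) for u in upload_ids],
        )
        conn.executemany(
            "INSERT INTO submission_subjects (submission_id, subject_id) VALUES (?, ?)",
            [(submission_id, s) for s in subject_ids],
        )
        conn.execute(
            "INSERT INTO ledger_events (event_type, submission_id) "
            "VALUES ('submission.finalised', ?)",
            (submission_id,),
        )
    # Reached only after a successful COMMIT.
    trigger_projection(submission_id)
    return submission_id
```

If any statement fails, the rollback removes the submission row as well, so a half-finalised submission is never visible to reporting.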

4. Observational Assessment Flow

Unlike the candidate-led delivery flow, observational assessments are observer-driven:

  1. Subject Selection: Observer selects one or more subjects (users or entities).
  2. Findings Recording: Observer records findings (which map to submission_items).
  3. Evidence Attachment: Observer attaches evidence (photos, documents) to specific findings.
  4. Finalisation: The observer "submits" the observation, creating a submission record with the subjects linked via submission_subjects.
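
One way to picture the observer-driven flow is as a small in-memory model that is flattened into durable rows on finalisation. The Observation and Finding classes below are hypothetical names used only for illustration; the dictionary keys mirror the table names mentioned in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    note: str
    evidence: list[str] = field(default_factory=list)  # upload references


@dataclass
class Observation:
    observer_id: str
    subject_ids: list[str]  # step 1: subjects chosen up front
    findings: list[Finding] = field(default_factory=list)
    finalised: bool = False

    def record_finding(self, note: str) -> Finding:
        # Step 2: findings can only be added before finalisation.
        if self.finalised:
            raise RuntimeError("observation is already finalised")
        finding = Finding(note)
        self.findings.append(finding)
        return finding

    def finalise(self) -> dict:
        # Step 4: produce the rows a submission would be built from.
        if not self.subject_ids:
            raise ValueError("at least one subject is required")
        self.finalised = True
        return {
            "submission": {"observer_id": self.observer_id},
            "submission_subjects": list(self.subject_ids),
            "submission_items": [f.note for f in self.findings],
            # Step 3: evidence stays tied to the index of its finding.
            "submission_evidence": [
                (i, e) for i, f in enumerate(self.findings) for e in f.evidence
            ],
        }
```

A typical use would be to create an Observation, call record_finding for each finding, append upload references to finding.evidence, and then call finalise once.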

5. Post-Submission Lifecycle (Remediation & Disputes)

Submissions may be modified after finalisation through formal, audited processes:

  • Results Remediation: An admin updates an answer key or drops a flawed question. This triggers a results.remediated ledger event and queues a reporting re-projection.
  • Claims & Disputes: A candidate challenges a result. The workflow (claim → dispute → resolution) is recorded in claims and disputes tables and mirrored in the ledger.
  • Programme Orchestration: When a submission is finalised and scored, the programme_orchestrator evaluates whether it meets the requirements of any programme the subject is enrolled in. If so, it updates programme_enrolment_progress.
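
The remediation case can be sketched as a pure function: drop the flawed question, re-score, append a results.remediated ledger event, and queue a re-projection. The function name, the item fields (question_id, points, max_points), and the use of plain lists for the ledger and the queue are assumptions made for this example.

```python
def remediate_drop_question(submission_id, items, dropped_question_id, ledger, projection_queue):
    # Re-score without the flawed question; the original items are not mutated.
    kept = [i for i in items if i["question_id"] != dropped_question_id]
    score = sum(i["points"] for i in kept)
    max_score = sum(i["max_points"] for i in kept)

    # The change is audited: append (never update) a ledger event.
    ledger.append(
        {
            "type": "results.remediated",
            "submission_id": submission_id,
            "dropped_question_id": dropped_question_id,
            "score": score,
            "max_score": max_score,
        }
    )

    # Derived reporting state must be re-projected for this submission.
    projection_queue.append(submission_id)
    return score, max_score
```

In a real service the ledger append and the score update would share a database transaction, as described in section 3.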

6. Retention & Cleanup Rules

  • Runtime Telemetry: delivery_session_events are high-volume and may be purged after 30–90 days.
  • Evidence Tiers: Fresh evidence is in "Hot" storage (S3/MinIO). After a period (e.g., 1 year), the evidence_storage_tier_worker moves it to "Cold" storage (Glacier/Archive) and updates the record.
  • Audit Trail: ledger_events are never purged; they are the permanent record of platform activity.
  • Durable Submissions: Retained according to the organisation's retention_policy (configured in Compliance Centre).
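
The telemetry and evidence rules above amount to two age checks. The sketch below assumes a 90-day telemetry window (the upper end of the stated range) and a 365-day hot period (the "1 year" example); the function names and record fields are hypothetical, and the real evidence_storage_tier_worker would also move the underlying object.

```python
from datetime import datetime, timedelta, timezone

TELEMETRY_RETENTION = timedelta(days=90)  # assumed: upper end of 30-90 days
HOT_PERIOD = timedelta(days=365)  # assumed: the "1 year" example


def purge_telemetry(events, now):
    # Keep only session events that are still inside the retention window.
    cutoff = now - TELEMETRY_RETENTION
    return [e for e in events if e["created_at"] >= cutoff]


def tier_evidence(records, now):
    # Mark hot evidence older than the hot period as cold; return moved ids.
    moved = []
    for record in records:
        if record["tier"] == "hot" and now - record["uploaded_at"] > HOT_PERIOD:
            record["tier"] = "cold"
            moved.append(record["id"])
    return moved
```

Ledger events are deliberately absent from both functions, since the audit trail is never purged.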

7. Practical Implications for Feature Design

  • Idempotency: All mutations at the durability boundary must be idempotent, keyed on idempotency_key, so that network retries are handled safely.
  • Visibility Audit: Every time a user views sensitive data (a submission, evidence, or subject), a viewed event should be appended to the ledger (handled by visibility_helpers).
  • Reporting is Recomputable: If the reporting.* tables are corrupted or their schema changes, the reporting_projection_worker can be re-run against all submissions to rebuild the analytics state.