📗 UX-INSIGHTS.md

This document specifies the UX patterns for the Insights category. This is the "Durable Intelligence" layer of Evalium, where raw assessment outcomes are transformed into audit-proof reporting, skill projections, and high-stakes corrections.

Insights UX must prioritise provenance, interpretability, and defensibility so that every score and skill level can be explained.


Capability Baseline (Validated 2026-02-25)

This Insights plan spans both currently live backend capabilities and future-facing UX targets.

Backend-live now:

  • Submission-centric insights: koeStatus + dual-time proofReadiness (defensibleAtExecution, readyNow) on submission detail/list surfaces.
  • Defensibility exceptions queue with triage on submission, engagement, and programme-requirement lenses (including refresh + suppression semantics).
  • Ledger-derived reporting projections (summary/health/range), snapshot-aware submission retrieval, and remediation ledger flows.

Not backend-ready yet (feature-flag or park in UX):

  • Skills inference projections and competence profiles.
  • Deterministic skills explainability backed by persisted skill_evidence_facts provenance.
  • Skills recalculation/backfill operator jobs and progress surfaces.

Related backend gap to account for in UX:

  • Explicit first-class proctor command endpoints (pause / resume / terminate) are still pending; model proctor history as event/timeline data when present.

0. Insights Doctrine (MUST)

0.1 Durable over Ephemeral

Insights must only use durable truth:

  • submissions (+ submission_items where needed)
  • version_snapshot (the “what the candidate saw” contract)
  • skill_evidence_facts (durable skill contribution facts, once skills inference is enabled)
  • read models / projections derived from the above

Runtime session data is excluded from Insights to ensure reports survive retention cleanup.

0.2 Explainability

Every derived value (skill level, pass/fail outcome, corrected score, rollup metric) must provide an Explain affordance that links to one of:

  1. Evidence Basis (facts/submissions that contributed)
  2. Snapshot Scope (evaluation/rubric/mapping version used)
  3. Calculation Summary (human-readable “how computed”, not code/SQL)

0.3 Version Awareness

All Insights surfaces must show:

  • Scope indicators (evaluation version, mapping set version, framework version, cohort/run label where relevant)
  • Freshness indicators (last updated timestamp)
  • Processing indicators (queued/running/completed/failed), if a worker pipeline is involved

0.4 Projection Lag & Data States

Insights data can be in one of the following states:

  • Fresh: updated recently; no job pending
  • Processing: a known job is queued/running
  • Stale: no job running, but freshness exceeds a threshold OR refresh failed
  • Failed: job failed; actionable error text shown with a retry path (capability gated)

The UI must visibly distinguish Processing vs Stale.
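
The state machine is small enough to pin down directly. A minimal TypeScript sketch, assuming illustrative field names (lastUpdatedAt, activeJob, stalenessThresholdMs) rather than a confirmed API:

```ts
type InsightDataState = 'fresh' | 'processing' | 'stale' | 'failed';

interface ProjectionStatus {
  lastUpdatedAt: Date;
  activeJob?: { state: 'queued' | 'running' | 'completed' | 'failed' };
}

function deriveInsightDataState(
  status: ProjectionStatus,
  stalenessThresholdMs: number,
  now: Date = new Date(),
): InsightDataState {
  // Failed wins: the last refresh attempt errored and needs a retry path.
  if (status.activeJob?.state === 'failed') return 'failed';
  // Processing: a known job is queued or running.
  if (status.activeJob?.state === 'queued' || status.activeJob?.state === 'running') {
    return 'processing';
  }
  // Stale: no job pending, but freshness exceeds the threshold.
  const ageMs = now.getTime() - status.lastUpdatedAt.getTime();
  return ageMs > stalenessThresholdMs ? 'stale' : 'fresh';
}
```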

0.5 Insights Interface Guardrails (MUST)

  • Every derived value must provide an Explain path or an explicit, non-ambiguous unavailable reason.
  • Report blocks must always include scope chips, freshness, and processing state where applicable.
  • Small-sample metrics must render Insufficient data instead of inferred confidence.
  • Charts require accessible equivalents (table/text summaries) and keyboard operability.
  • Status/readiness language must use canonical labels; raw backend enums must not be primary UI text.
  • Mobile/iPad supports reading, filtering, and Explain inspection; complex report configuration may be desktop-optimised with an explicit handoff.

1. Core Insights Primitives (Reusable Patterns)

1.1 Insight Block (Dashboard Card)

All dashboards are composed of modular blocks.

Block MUST include

  • Title + short description (“what this represents”)
  • Scope chips (version/timeframe/cohort)
  • Freshness indicator
  • Explain link (Evidence Basis / Calculation / Snapshot Scope)
  • Drilldown CTA

1.2 Scope Chips (MUST)

Any derived view must show applicable scope chips. Common chips:

  • Evaluation Version (e.g., “Eval v3”)
  • Mapping Set Version (e.g., “Mapping v7”)
  • Framework Version (if applicable)
  • Run Label / Cohort (e.g., “Compliance_Q1_2025”)
  • Time window (e.g., “Last 30 days”)

Only show chips backed by current data contracts for that surface (e.g., mapping/framework chips are hidden until skills projections are live).
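
Sections 1.1 and 1.2 together imply a common props contract for blocks. A hedged TypeScript sketch; the type and field names (ScopeChip, InsightBlockProps, and so on) are assumptions for illustration, not the shipped component API:

```ts
type ScopeChip =
  | { kind: 'evaluationVersion'; label: string }  // e.g. “Eval v3”
  | { kind: 'mappingSetVersion'; label: string }  // hidden until skills projections are live
  | { kind: 'frameworkVersion'; label: string }
  | { kind: 'runLabel'; label: string }           // e.g. “Compliance_Q1_2025”
  | { kind: 'timeWindow'; label: string };        // e.g. “Last 30 days”

interface InsightBlockProps {
  title: string;
  description: string;                            // “what this represents”
  scopeChips: ScopeChip[];                        // only chips backed by current data contracts
  freshness: {
    state: 'fresh' | 'processing' | 'stale' | 'failed'; // mirrors the 0.4 states
    lastUpdatedAt: Date;
  };
  explain: { kind: 'evidenceBasis' | 'calculation' | 'snapshotScope'; href: string };
  drilldownHref: string;                          // drilldown CTA target
}
```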

1.3 Freshness & Processing Banner (MUST)

A consistent banner pattern used across Insights pages:

  • Updated: “Updated 2m ago”
  • Processing: “Processing updates…”
  • Stale: “Data may be stale — last updated 12m ago”
  • Failed: “Update failed — retry” (capability gated)

1.4 Evidence Basis Drawer (MUST)

A standard drawer used by all Explain links that point to evidence.

Drawer MUST include

  • Clear title (“Evidence Basis: [Skill]” / “Evidence Basis: Pass rate”)
  • Scope chips (mapping/eval/version/timeframe)
  • Evidence list (facts or submissions) in a timeline
  • Links to underlying Attempt Viewer(s)
  • “Open full page” escape hatch where appropriate
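
A hedged sketch of the drawer contract, reusing the ScopeChip union from the 1.2 sketch; all other identifiers are illustrative assumptions:

```ts
interface EvidenceItem {
  source: 'K' | 'O' | 'E';
  label: string;                // human-friendly name
  occurredAt: Date;
  attemptViewerHref?: string;   // link to the underlying Attempt Viewer, where applicable
}

interface EvidenceBasisDrawerProps {
  title: string;                // “Evidence Basis: [Skill]” / “Evidence Basis: Pass rate”
  scopeChips: ScopeChip[];      // mapping/eval/version/timeframe
  evidence: EvidenceItem[];     // rendered as a timeline
  fullPageHref?: string;        // “Open full page” escape hatch, where appropriate
}
```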

2. Competence Profiles (Skill Projections, Planned/Parked)

This section is a target-state contract, not a current-implementation contract. Do not ship this surface until the backend persists deterministic skill provenance facts (mapping set version, rule IDs, source term IDs) and exposes explain-ready read models.

The Competence Profile is a view-layer projection of a user's abilities based on the KOE model:

  • K Knowledge (tests)
  • O Observation (assessor judgements)
  • E Evidence (uploads / artefacts)

The default experience is heatmap-first to show strengths and weaknesses at a glance.


2.1 The Profile Hub (Heatmap-First)

Primary View: Skill Heatmap (MUST)

A grid that visualises strengths/weaknesses defensibly.

Layout

  • Skills grouped by framework/category (collapsible sections)
  • Each skill has a heatmap cell representing attainment
  • A secondary overlay represents confidence/coverage (evidence sufficiency)

Heatmap Cell Semantics

  • Attainment (strength/weakness) is the primary dimension.
  • Confidence/Coverage is separate and must not be confused with attainment:
    • Use a compact overlay pattern (e.g., mini meter, dot density, or badge)
    • Tooltip shows explicit counts: “Evidence: 3 of 5 required” (or equivalent rule)

Status Chip per Skill Row

  • Not Started
  • In Progress
  • Achieved

KOE Mix Hint (SHOULD)

A tiny indicator shows evidence composition:

  • “K:2 O:1 E:0” (or icons)

This helps users understand why confidence is low without opening the drawer.

Last Evidence Date

  • Shows the most recent contributing evidence date
  • Click opens Evidence Basis drawer focused on “most recent”
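
To keep attainment and confidence/coverage from being conflated in code as well as in the UI, a cell can carry them as separate fields. A TypeScript sketch; names and scales are assumptions:

```ts
interface HeatmapCell {
  skillId: string;
  // Primary dimension: attainment (strength/weakness), e.g. on a 0–1 scale.
  attainment: number | null;                      // null = no attainment signal yet
  // Separate dimension: evidence sufficiency, never blended into attainment.
  coverage: { have: number; required: number };   // tooltip: “Evidence: 3 of 5 required”
  status: 'notStarted' | 'inProgress' | 'achieved';
  koeMix: { k: number; o: number; e: number };    // “K:2 O:1 E:0” hint
  lastEvidenceAt?: Date;                          // most recent contributing evidence
}
```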

Heatmap Controls (SMB-Simple) (MUST)

  • Framework selector (if multiple)
  • Filter: “Needs attention”
    • Low attainment OR low evidence coverage
  • Sort:
    • Lowest attainment first
    • Lowest coverage first
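
A minimal sketch of the “Needs attention” filter and the two sorts, reusing the HeatmapCell shape above; the attainment threshold is an illustrative assumption:

```ts
// Illustrative threshold for “low attainment”.
const LOW_ATTAINMENT = 0.5;

function needsAttention(cell: HeatmapCell): boolean {
  const lowAttainment = cell.attainment !== null && cell.attainment < LOW_ATTAINMENT;
  const lowCoverage = cell.coverage.have < cell.coverage.required;
  return lowAttainment || lowCoverage; // low attainment OR low evidence coverage
}

// “Lowest attainment first”; cells with no signal sort last.
const byAttainment = (a: HeatmapCell, b: HeatmapCell) =>
  (a.attainment ?? Infinity) - (b.attainment ?? Infinity);

// “Lowest coverage first”, compared as a fraction of the requirement.
const byCoverage = (a: HeatmapCell, b: HeatmapCell) =>
  a.coverage.have / a.coverage.required - b.coverage.have / b.coverage.required;
```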

Strengths & Gaps Summary (SHOULD)

Above the heatmap:

  • “Top strengths” list
  • “Key gaps” list

Defensibility rule

If coverage is low, label the gap as:

  • “Low evidence” rather than “Weak”

Each list item is clickable → Evidence Basis drawer.


2.2 Evidence Basis Drill-down Drawer (MUST)

Clicking any skill cell opens a drawer revealing the “why” behind the rating.

Drawer content

  • Evidence Timeline: list of skill_evidence_facts in chronological order
  • Source labels: K / O / E (with human-friendly names)
  • Contribution summary: what each fact contributed (e.g., “+1 achieved”, “meets threshold”, “partial”)
  • Mapping Provenance:
    • Mapping Set Version chip
    • Rule name/identifier (human-readable)
    • Optional “Calculation summary” link (what the rule does)

Links

  • Each evidence item links to the Attempt Viewer (Submission) where applicable
  • Mapping rule detail opens a small “Rule detail” drawer (readable explanation, not raw logic)
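
For the parked target state, the drawer content above suggests a durable fact row roughly like the following. This is a sketch only; apart from skill_evidence_facts itself, the field names are assumptions:

```ts
interface SkillEvidenceFact {
  factId: string;
  skillId: string;
  source: 'K' | 'O' | 'E';         // with human-friendly names resolved in the UI
  contribution: string;            // e.g. “+1 achieved”, “meets threshold”, “partial”
  submissionId?: string;           // Attempt Viewer link target, where applicable
  mappingSetVersion: string;       // provenance chip
  ruleId: string;                  // human-readable rule name/identifier
  recordedAt: Date;                // orders the Evidence Timeline
}
```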

2.3 Profile Freshness & Processing (MUST)

Competence Profiles must show:

  • Updated time
  • Processing state if recalculation jobs are queued/running
  • Stale badge if data is old and no job is running

3. Reporting & Analytics

Reporting provides version-scoped summaries of performance and operational outcomes.


3.1 Blocks-Based Dashboards (MUST)

Dashboards are composed of modular Insight Blocks. Recommended baseline blocks:

  • Hero Block
    • Pass rate, average score, completion volume
  • Funnel Block
    • Assignment → Started → Submitted → Graded (if subjective grading exists)
  • Distribution Block
    • Score histogram / bands for difficulty detection
  • Item Health Block
    • Flags items with poor discrimination / unexpected patterns
    • Includes data sufficiency guardrails

Data Sufficiency Guardrails (MUST)

If sample size is low, blocks must degrade safely:

  • Show “Insufficient data” (with threshold) instead of presenting fragile metrics as truth
  • Explain link clarifies why the block is limited
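
A minimal sketch of the sufficiency guard; the threshold value is illustrative, not a product decision:

```ts
// Illustrative threshold, not a product decision.
const MIN_SAMPLE = 30;

type BlockResult<T> =
  | { kind: 'metric'; value: T }
  | { kind: 'insufficientData'; threshold: number; sampleSize: number };

function guardMetric<T>(sampleSize: number, compute: () => T): BlockResult<T> {
  if (sampleSize < MIN_SAMPLE) {
    // Render “Insufficient data” with the threshold; the Explain link says why.
    return { kind: 'insufficientData', threshold: MIN_SAMPLE, sampleSize };
  }
  return { kind: 'metric', value: compute() };
}
```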

Filters (MUST)

Common filters should be consistent across dashboards:

  • Evaluation / Programme
  • Version (see 3.2)
  • Run label / cohort
  • Time window
  • Org unit (hidden if single org unit tenant)

3.2 Version-Scoped Summaries (MUST)

Because content is immutable and attempts are snapshot-based, reporting must make version scope explicit.

UI pattern

  • Version picker defaults to Latest Published
  • Allows:
    • switch to prior versions
    • compare (optional) across versions (Phase 3+)

Version scope chips must appear on every block.

Provisional Badge (MUST)

If subjective grading is pending for the selected cohort/scope:

  • Label results as Provisional
  • Explain link opens “What’s pending” drawer (e.g., count of submissions awaiting subjective marking)

3.3 Drilldowns (MUST)

Every block’s drilldown must land on a consistent “Detail Report” page:

  • Table + filters
  • Scope chips + freshness
  • Explain affordances per row (where applicable)
  • Links to Attempt Viewer for per-candidate analysis (capability gated)

3.4 Proof Readiness & Exceptions (MUST)

Insights must include a first-class operator view of defensibility readiness, not just scores.

Minimum contract:

  • Show proofReadiness dual-time states:
    • defensibleAtExecution
    • readyNow
  • Show stable reason codes and policy refs used in computation.
  • Present KOE status alongside readiness so operators can see why action is needed.
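
A TypeScript sketch of that minimum contract; koeStatus, proofReadiness, defensibleAtExecution, and readyNow come from this document, while the surrounding field names are assumptions:

```ts
interface ProofReadiness {
  defensibleAtExecution: boolean;  // defensible at the time the attempt ran
  readyNow: boolean;               // defensible under current policy
  reasonCodes: string[];           // stable reason codes used in computation
  policyRefs: string[];            // policy references used in computation
}

interface SubmissionReadinessRow {
  submissionId: string;
  koeStatus: string;               // shown alongside readiness to explain why action is needed
  proofReadiness: ProofReadiness;
}
```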

Exceptions queue behaviour:

  • Queue rows include lens + subject identifiers and reason summary.
  • Triage metadata is explicit: state, owner, firstSeenAt, lastSeenAt, suppressedUntil.
  • Support lens-specific views:
    • submission
    • engagement rollup
    • programme requirement rollup
  • Default list should hide suppressed items unless explicitly requested.

Status label contract:

  • Readiness labels in the UI are Ready, Needs review, and Blocked (mapped from backend enums).
  • Triage labels in the UI are Open, Acknowledged, and Resolved (mapped from backend enums).
  • Do not mix alternate wording (for example, Action needed) into readiness/triage workflows.
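
A sketch of the canonical label mapping; the backend enum values on the left are assumptions, since only the UI labels are specified here:

```ts
const READINESS_LABELS: Record<string, string> = {
  READY: 'Ready',
  NEEDS_REVIEW: 'Needs review',
  BLOCKED: 'Blocked',
};

const TRIAGE_LABELS: Record<string, string> = {
  OPEN: 'Open',
  ACKNOWLEDGED: 'Acknowledged',
  RESOLVED: 'Resolved',
};

// Raw enums must never surface as primary UI text; returning undefined forces
// callers to handle unknown enum values explicitly.
const readinessLabel = (backendEnum: string): string | undefined =>
  READINESS_LABELS[backendEnum];
```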

Explainability rule:

  • Readiness/KOE summaries are derived projections; execution truth remains in ledger/snapshots.
  • External glass-box style surfaces must capability-redact detailed reasons by default.

4. Attempt Viewer (Submissions)

The Attempt Viewer is the standard interface for reviewing a specific candidate’s result. It is the canonical truth page for an attempt.


4.1 Historical Integrity (MUST)

Snapshot View (MUST)

The UI renders the evaluation exactly as it was at the time of the attempt, using the version_snapshot.

  • “Snapshot Scope” is visible as chips (Eval vX, rubric version, etc.)
  • “View snapshot detail” opens a drawer (read-only)

Activity Ledger (MUST)

Displays the durable history of the attempt:

  • timestamps (started/completed)
  • duration
  • completion method/reason (where available)
  • proctor notes/events (explicit command workflow is backend-pending)
  • overrides applied (effective limits shown)
  • remediation applied (links to batch)

4.2 Score & Corrections (Ledger UX) (MUST)

Scores are append-only and may be corrected.

UI requirements

  • Show the current score/outcome
  • If corrected, show:
    • “Corrected” badge
    • Original → Current
    • Link to remediation batch + reason

Score Version Strip (SHOULD)

A compact timeline:

  • v1 (original) → v2 (corrected) → …

Clicking a version shows “what changed” in a drawer.
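
An append-only ledger makes both the “Corrected” badge and the version strip cheap to derive. A hedged sketch; identifiers are illustrative:

```ts
interface ScoreVersion {
  version: number;              // v1 (original), v2 (corrected), …
  score: number;
  outcome: string;              // e.g. pass/fail
  remediationBatchId?: string;  // present on corrections
  reason?: string;              // mandatory on corrections
  recordedAt: Date;
}

// The current score is always the latest entry; earlier entries are never mutated.
const currentScore = (ledger: ScoreVersion[]) => ledger[ledger.length - 1];
// “Corrected” badge: more than one version, with Original → Current derivable.
const isCorrected = (ledger: ScoreVersion[]) => ledger.length > 1;
```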

4.3 Feedback Rendering

The UI respects feedback_mode defined in the snapshot:

  • none / overall / tags / items

SMB principle

  • Candidates see only what is configured
  • Admins always see full diagnostics (tags + item-level) where permitted

5. Results Remediation (Correcting Truth)

Remediation allows authors to fix unfair scoring without deleting data.


5.1 Correction Batch Wizard

Scope First (MUST)

The wizard begins with explicit scope selection:

  • Evaluation (and version)
  • Run label / cohort (optional but recommended)
  • Date range (optional)

This prevents accidental broad corrections.

Rule Builder (MUST)

Define correction actions:

  • Mark correct
  • Drop item
  • Replace key

(Extensible later.)

Impact Preview (SHOULD)

Before applying, show a dry-run summary:

  • “X candidates move from Fail → Pass”
  • “Average score changes 72% → 75%”
  • “Y submissions affected”

If not available yet, show a placeholder state:

  • “Impact preview not available for this batch” (with rationale)
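
A sketch of the dry-run payload behind the preview, with field names inferred from the example copy above (all assumptions):

```ts
interface ImpactPreview {
  submissionsAffected: number;   // “Y submissions affected”
  outcomeChanges: Array<{ from: string; to: string; count: number }>; // “X candidates move Fail → Pass”
  averageScoreBefore: number;    // e.g. 72
  averageScoreAfter: number;     // e.g. 75
}

type ImpactPreviewState =
  | { kind: 'available'; preview: ImpactPreview }
  | { kind: 'unavailable'; rationale: string }; // drives the placeholder copy
```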

5.2 Remediation Ledger (MUST)

Every batch creates a permanent record.

UI requirements

  • Mandatory Reason
  • Clear status: queued / applying / applied / failed
  • Link to affected submissions list

Idempotency Feedback (MUST)

If re-applied:

  • “No changes needed — already applied”
  • Provide link to prior application record

Revert Visibility (SHOULD)

If revert/compensating corrections exist:

  • Provide a “Revert” action (capability gated)
  • Revert must create its own ledger record and reason

6. Skills Recalculation (Backfills, Planned/Parked)

When Skill Mapping Sets change, SMBs can re-project historical data.

This remains parked until skills projection/provenance pipelines are backend-mature.


6.1 Recalculation Job UI

Scope Selection (MUST)

Choose recalculation scope:

  • Org unit
  • Evaluation
  • Date range
  • Run label (optional)

Processing State (MUST)

Show job states consistently:

  • queued / running / completed / failed

Progress Tracker (MUST)

  • Progress bar + counts processed
  • “Estimated completion” is optional; do not show if unreliable
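
One way to honour “do not show if unreliable” is to withhold the estimate until enough work has completed for it to stabilise. A sketch; the 10% heuristic is an illustrative assumption:

```ts
interface RecalcProgress {
  state: 'queued' | 'running' | 'completed' | 'failed';
  processed: number;           // counts processed, drives the progress bar
  total: number;
  startedAt?: Date;
}

// Withhold the ETA until at least 10% of rows are processed, so an early,
// noisy estimate is never shown; undefined means omit “Estimated completion”.
function estimatedCompletion(p: RecalcProgress, now: Date = new Date()): Date | undefined {
  if (p.state !== 'running' || !p.startedAt || p.total === 0 || p.processed < p.total * 0.1) {
    return undefined;
  }
  const elapsedMs = now.getTime() - p.startedAt.getTime();
  const remainingMs = (elapsedMs / p.processed) * (p.total - p.processed);
  return new Date(now.getTime() + remainingMs);
}
```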

Result Summary (MUST)

Once complete:

  • Rows affected
  • Skills impacted
  • Link to updated competence profiles
  • Freshness updated timestamp

7. Export & Defensibility Surfaces

7.1 Exports (MUST)

Exports must preserve scope and provenance:

  • Include scope chips in export metadata (version/timeframe/run label)
  • If data is provisional or corrected, exports must label it
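
A minimal sketch of export metadata that carries this provenance, reusing the ScopeChip union from the 1.2 sketch; the remaining names are assumptions:

```ts
interface ExportMetadata {
  scope: ScopeChip[];            // version / timeframe / run label
  generatedAt: Date;
  provisional: boolean;          // subjective grading still pending in scope
  containsCorrections: boolean;  // corrected scores present and labelled as such
}
```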

Common exports:

  • CSV (tables)
  • PDF (Attempt Viewer / summary report)
  • DSAR inclusion surfaces are governed elsewhere; include competence profiles only when skills projections are active and defensible

7.2 Audit-Proof Reporting (MUST)

Any page that presents “final truth” must make it defensible:

  • Explain links present
  • Ledger visible
  • Version scope explicit
  • Corrections visible with reason

8. Summary of Navigation Contracts

| Insights Task | Pattern | Primary CTA | Link Contract (Peek + Full Page) |
| --- | --- | --- | --- |
| View Skills (Future) | Heatmap/Profile Hub | Recalculate | Evidence Basis (facts + submissions) |
| Review Defensibility Queue | Exceptions Table + Triage | Acknowledge / Suppress / Resolve | Submission/Engagement/Programme detail |
| Review Result | Attempt Viewer | Download PDF | Version Snapshot Detail |
| Fix Scores | Remediation Wizard | Apply Batch | Affected Submissions List |
| Export Data | Table + Filter | Export CSV | Context Drawer (User / Org / Version Scope) |