ADR-0013: First-Class Question Types via Plugins + Core Attempt Facts + Typed Fact Panels

Status

Proposed

Date

2025-12-31

Decision Drivers

Support many question types (QTI-style interactions) without “second-class citizens.”
Preserve current invariants:
- snapshot-canonical, deterministic reporting (no live authoring joins)
- remediation-safe recomputation (apply/revert) with auditability
- reporting read-models remain stable and high-performance at scale
Question Health must remain meaningful for:
- autoscorable items (MCQ/MRQ/numeric/match/order/etc.)
- subjective items (essay/evidence) with workflow + rubric/assessor integrity signals
Privacy-by-default, with explicit opt-in for additional response visibility.

Context

Current Question Health is MCQ-shaped:

reporting extracts MCQ-specific fields (e.g., a single selected option)
“facility/top option” becomes the implied universal health model
scoring logic assumes MCQ payload structure

As we add MRQ, numeric, short answer, essay, and evidence types, this coupling will cause:

growing switch-case logic across scoring and reporting
missing or “empty” analytics for non-MCQ types
subjective question types (a key differentiator for Evalium) appearing analytically inferior

Decision

We will standardise question-type support around three contracts:

Question Type Plugin Contract for authoring, delivery validation, scoring, and reporting facts emission.
Core Attempt Facts + Typed Fact Tables to power reporting/health without MCQ assumptions.
Question Health Contract that always returns Core Health plus a type-specific Analysis Panel, with a single privacy gate.

This approach is the canonical path for all new question types.

1) Core Attempt Facts (reporting foundation)

1.1 Adopt the existing core facts table

The existing reporting table reporting.report_item_attempts_flat becomes (conceptually) the Core Attempt Facts table. We will evolve it as needed to represent the universal contract.

Key point: We will not introduce a second “core facts” table. This avoids migration churn and dual-source risk.

1.2 Core facts are derived from submissions only

Core facts MUST be generated solely from:

submissions
submission_items (or equivalent immutable per-item attempt records)

Session data is not required to compute health/analytics.

1.3 Core Attempt Facts Contract (minimum)

Core Attempt Facts must support all qtypes by capturing these fields (names are aligned to the existing schema):

Identifiers: tenant_id, org_unit_id, evaluation_version_id, question_version_id, submission_id, submission_item_id, qtype
Timing: time_on_item_ms, completed_at
Omission: is_omitted (+ omit_reason if available)
Scoring:
- score_awarded, max_score (existing)
- outcome_code (existing; remains supported)
- score_pct (derived or stored)
- score_status enum: SCORED | PENDING | INVALID | EXEMPT (new where needed)
- score_method enum: AUTO | HUMAN | AI_ASSISTED | MODERATED (new where needed; initially AUTO/HUMAN is acceptable)
Lineage / versioning:
- remediation/scoring version fields as already implemented
- computed_at / projected_at timestamp

Meaning:

“Correct/incorrect” is not universal. “Facility” becomes a derived metric for relevant types (e.g., MCQ full-credit rate).
For MCQ-derived facility, use respondent facility: correct / (attempts - omitted).
PENDING is first-class and must appear in health and reporting.
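To make the derived metrics concrete, here is a minimal sketch. The CoreFact record and both function names are hypothetical, and reading "correct" as full credit on a SCORED attempt is an assumption for MCQ rather than existing code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CoreFact:
    """Hypothetical subset of one report_item_attempts_flat row."""
    is_omitted: bool
    score_status: str  # SCORED | PENDING | INVALID | EXEMPT
    score_awarded: Optional[float]
    max_score: Optional[float]


def score_pct(fact: CoreFact) -> Optional[float]:
    """Derived score_pct; None unless SCORED with a score and a non-zero max_score."""
    if fact.score_status != "SCORED" or fact.score_awarded is None or not fact.max_score:
        return None
    return 100.0 * fact.score_awarded / fact.max_score


def respondent_facility(facts: Iterable[CoreFact]) -> Optional[float]:
    """MCQ respondent facility: correct / (attempts - omitted).

    'Correct' is read here as full credit on a SCORED attempt.
    Returns None when there are no respondents (every attempt omitted).
    """
    attempts = omitted = correct = 0
    for f in facts:
        attempts += 1
        if f.is_omitted:
            omitted += 1
        elif f.score_status == "SCORED" and f.max_score and f.score_awarded == f.max_score:
            correct += 1
    respondents = attempts - omitted
    return correct / respondents if respondents else None
```

For example, four attempts with one omitted and two full-credit answers give a facility of 2 / (4 - 1), about 0.67.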

1.4 Qtype capture is durable

qtype MUST be stored in the core facts at projection time (derived from the submission snapshot), so analytics never require live authoring joins.

2) Typed Fact Tables (per-type analysis without MCQ assumptions)

2.1 Purpose

Typed fact tables support per-type “Analysis Panels” (option analysis, MRQ patterns, rubric distributions, evidence integrity, etc.) without forcing those fields into the universal core contract.

2.2 Design rules

Additive schema only: typed tables are added without breaking existing reporting.
FK-free read-models by design: reporting tables MUST NOT add FKs to OLTP tables.
- Existing reporting FKs will be removed as part of the rollout to keep lock-order consistent.
Keying and join strategy:
- typed facts are joined to core facts by tenant_id + submission_item_id (preferred) or an equivalent immutable attempt key.
- typed facts may duplicate lightweight identifiers (tenant_id, org_unit_id, question_version_id, completed_at) to enable index-only access.

2.3 Typed fact families (initial set)

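Names are illustrative; final names must follow repo conventions. Every typed table carries the join key and lightweight identifier columns from 2.2; they are spelled out only for the choice family below.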

Choice family (MCQ single + MRQ)

report_item_attempt_choice
- tenant_id, org_unit_id, submission_item_id, question_version_id, completed_at
- selected_option_ids (array)
- response_pattern_hash (optional, supports MRQ pattern counts without storing raw patterns)
- is_full_credit (derived)
- is_partial_credit (derived)
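One way response_pattern_hash could be derived is sketched below; the function name is hypothetical. It assumes option ids are strings and that a pattern is the set of selected options, so selection order and duplicates are ignored. Scoping the hash by question_version_id is also an assumption.

```python
import hashlib
import json
from typing import Iterable


def response_pattern_hash(question_version_id: str, selected_option_ids: Iterable[str]) -> str:
    """Order-independent SHA-256 of an MRQ selection.

    Sorting and de-duplicating first means {B, A} and {A, B} land in the same
    bucket, so pattern counts become a GROUP BY on this column.
    """
    canonical = json.dumps(
        {"q": question_version_id, "sel": sorted(set(selected_option_ids))},
        separators=(",", ":"),
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```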

Numeric family

report_item_attempt_numeric
- numeric_value (decimal)
- unit (optional)
- within_tolerance (derived)
- tolerance_used (optional structured json)
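A sketch of deriving within_tolerance, assuming a hypothetical tolerance_used shape of {"mode": "absolute" | "relative", "value": ...}; the real structured JSON may differ. Decimal is used because numeric_value is a decimal column.

```python
from decimal import Decimal
from typing import Any, Mapping


def within_tolerance(value: Decimal, key: Decimal, tolerance_used: Mapping[str, Any]) -> bool:
    """Derive within_tolerance from a candidate value and the scoring key."""
    # Parse via str so float inputs from JSON do not carry binary rounding into Decimal.
    tol = Decimal(str(tolerance_used["value"]))
    diff = abs(value - key)
    mode = tolerance_used["mode"]
    if mode == "absolute":
        return diff <= tol
    if mode == "relative":
        # Fraction of the key's magnitude; with key == 0 only an exact match passes.
        return diff <= tol * abs(key)
    raise ValueError(f"unknown tolerance mode: {mode!r}")
```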

Short answer / text entry family

report_item_attempt_text
- normalized_text (nullable; visibility policy dependent)
- raw_text_hash (enables counting without disclosure)
- format_error_code (optional categorisation)
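The sketch below shows one way these columns could be populated; function names and normalisation rules are hypothetical. It assumes a per-tenant secret, so raw_text_hash is a keyed HMAC-SHA256 rather than a bare hash. That is a suggested hardening: short answers are easy to guess, and a bare hash would let anyone with table access confirm a guess by hashing it.

```python
import hashlib
import hmac
import unicodedata


def normalize_text(raw: str) -> str:
    """Candidate for normalized_text: NFKC, case-folded, whitespace collapsed."""
    folded = unicodedata.normalize("NFKC", raw).casefold()
    return " ".join(folded.split())


def raw_text_hash(raw: str, tenant_secret: bytes) -> str:
    """Keyed hash of the exact raw answer, for counting without disclosure.

    HMAC with a per-tenant secret (an assumption) means someone holding the
    reporting table cannot confirm a guessed answer by hashing it themselves.
    """
    return hmac.new(tenant_secret, raw.encode("utf-8"), hashlib.sha256).hexdigest()
```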

Rubric / constructed response family

report_item_attempt_rubric
- grade_status (PENDING|IN_PROGRESS|FINAL)
- grader_id (nullable)
- dimension_scores (jsonb; strictly versioned shape)
- moderated (optional)

Evidence family

report_item_attempt_evidence
- attachment_count
- attachment_types (array)
- missing_required_metadata (bool)
- integrity_flags (optional structured json)

3) Question Type Plugin Contract

3.1 Responsibilities per qtype

Each question type is implemented as a plugin that defines:

Authoring schema + validation
- validate question payload and constraints
- validate scoring key / rubric definition where applicable
Answer schema + validation
- validate candidate answer payload (shape + constraints)
Scoring
- produce scoring results aligned to existing submission fields, plus minimal additions (status/method)
Typed facts emission
- emit the typed fact payload(s) needed for Analysis Panels
Privacy classification
- declare which emitted facts are sensitive (e.g., raw text, exemplars)
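As a non-authoritative sketch, the responsibilities above could map onto a structural interface like this one, with an MCQ-single plugin as the example. The payload keys (options, correct_option_id, selected_option_id), the outcome and error codes, and all class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Protocol, Tuple


@dataclass(frozen=True)
class ScoringResult:
    """Canonical score outputs plus the status/method additions from section 4."""
    score: float
    max_score: float
    outcome_code: str
    score_status: str = "SCORED"  # SCORED | PENDING | INVALID | EXEMPT
    score_method: str = "AUTO"    # AUTO | HUMAN (others optional later)
    error_codes: Tuple[str, ...] = ()


class QuestionTypePlugin(Protocol):
    """One member per responsibility listed in 3.1."""
    qtype: str
    sensitive_fact_fields: frozenset  # privacy classification of emitted facts

    def validate_question(self, payload: Mapping[str, Any]) -> List[str]: ...
    def validate_answer(self, question: Mapping[str, Any], answer: Mapping[str, Any]) -> List[str]: ...
    def score(self, question: Mapping[str, Any], answer: Mapping[str, Any]) -> ScoringResult: ...
    def emit_typed_facts(self, answer: Mapping[str, Any], result: ScoringResult) -> dict: ...


class McqSinglePlugin:
    """Hypothetical MCQ-single plugin; payload keys are illustrative."""
    qtype = "MCQ"
    sensitive_fact_fields = frozenset()  # option ids carry no free text

    def validate_question(self, payload):
        ids = [o["id"] for o in payload.get("options", [])]
        errors = []
        if len(ids) < 2:
            errors.append("TOO_FEW_OPTIONS")
        if payload.get("correct_option_id") not in ids:
            errors.append("KEY_NOT_AN_OPTION")
        return errors

    def validate_answer(self, question, answer):
        selected = answer.get("selected_option_id")
        option_ids = {o["id"] for o in question["options"]}
        # None means omitted, which is valid; anything else must be a real option.
        return [] if selected is None or selected in option_ids else ["UNKNOWN_OPTION"]

    def score(self, question, answer):
        correct = answer.get("selected_option_id") == question["correct_option_id"]
        max_score = float(question.get("max_score", 1))
        return ScoringResult(
            score=max_score if correct else 0.0,
            max_score=max_score,
            outcome_code="CORRECT" if correct else "INCORRECT",
        )

    def emit_typed_facts(self, answer, result):
        selected = answer.get("selected_option_id")
        return {
            "selected_option_ids": [] if selected is None else [selected],
            "is_full_credit": result.score == result.max_score,
            "is_partial_credit": 0 < result.score < result.max_score,
        }


plugin: QuestionTypePlugin = McqSinglePlugin()
```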

3.2 Registry and determinism

Plugins are wired via a constructor-injected registry (no global map, no reflection).
Unknown qtype handling is deterministic:
- scoring yields score_status = INVALID with explicit error codes
- typed facts emission is skipped (or emits a minimal invalid marker)
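The registry and the unknown-qtype rule might look like the following sketch. QuestionTypeRegistry, ScoringService, FixedScorePlugin, and the UNKNOWN_QTYPE code are hypothetical names. The point it illustrates is that the plugin map is instance state passed through constructors, and a missing plugin yields the same INVALID result on every call.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Mapping, Optional, Protocol, Tuple


@dataclass(frozen=True)
class ScoringResult:
    score: Optional[float]
    max_score: Optional[float]
    outcome_code: Optional[str]
    score_status: str = "SCORED"
    score_method: str = "AUTO"
    error_codes: Tuple[str, ...] = ()


class Plugin(Protocol):
    qtype: str

    def score(self, question: Mapping[str, Any], answer: Mapping[str, Any]) -> ScoringResult: ...
    def emit_typed_facts(self, answer: Mapping[str, Any], result: ScoringResult) -> dict: ...


class QuestionTypeRegistry:
    """Built once at composition time; no module-level map, no reflection."""

    def __init__(self, plugins: Iterable[Plugin]) -> None:
        self._by_type: Dict[str, Plugin] = {}
        for p in plugins:
            if p.qtype in self._by_type:
                raise ValueError(f"duplicate plugin for qtype {p.qtype!r}")
            self._by_type[p.qtype] = p

    def score(self, qtype: str, question: Mapping[str, Any], answer: Mapping[str, Any]) -> ScoringResult:
        plugin = self._by_type.get(qtype)
        if plugin is None:
            # Deterministic: an unknown qtype always yields this same INVALID result.
            return ScoringResult(None, None, None, score_status="INVALID",
                                 error_codes=("UNKNOWN_QTYPE",))
        return plugin.score(question, answer)

    def typed_facts(self, qtype: str, answer: Mapping[str, Any], result: ScoringResult) -> Optional[dict]:
        plugin = self._by_type.get(qtype)
        # Unknown qtype: emission is skipped (the caller writes no typed row).
        return None if plugin is None else plugin.emit_typed_facts(answer, result)


class ScoringService:
    def __init__(self, registry: QuestionTypeRegistry) -> None:
        self._registry = registry  # constructor-injected dependency

    def score_item(self, qtype: str, question: Mapping[str, Any], answer: Mapping[str, Any]) -> ScoringResult:
        return self._registry.score(qtype, question, answer)


class FixedScorePlugin:
    """Hypothetical test double standing in for a real plugin."""
    qtype = "FIXED"

    def score(self, question, answer):
        return ScoringResult(1.0, 1.0, "CORRECT")

    def emit_typed_facts(self, answer, result):
        return {"is_full_credit": True}


registry = QuestionTypeRegistry([FixedScorePlugin()])
service = ScoringService(registry)
```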

4) ScoringResult Semantics (align with existing storage)

We standardise scoring semantics without duplicating existing stored fields.

4.1 Stored fields alignment

Evalium already stores score, max_score, outcome_code, and remediation versioning. We will:

keep these as the canonical stored score outputs
add score_status and score_method where needed to support first-class subjective workflows and reporting parity

Defaults:

autoscorable types: score_status = SCORED, score_method = AUTO
human-graded types: score_status = PENDING, score_method = HUMAN until finalised

4.2 Required semantics

score_status distinguishes SCORED vs PENDING vs INVALID vs EXEMPT
score_method indicates AUTO vs HUMAN (AI_ASSISTED and MODERATED are optional future values)
Type-specific scoring audit details remain in structured details (where needed) but MUST NOT require live joins.

5) Question Health Contract (API)

5.1 Core Health is universal

Question Health endpoints MUST return a core block derived from Core Attempt Facts:

Minimum core fields:

attempts, omitted, omitRate
timing stats (avg + p50/p90 where available)
scoring stats: meanScore, meanScorePct, distribution/histogram (incremental)
statusCounts: SCORED/PENDING/INVALID/EXEMPT
trends: scorePct 7d/30d
coverage: evaluationsCount, recentRuns (where applicable)
health badge: status, confidence, topReasons
lastComputedAt
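A minimal sketch of computing part of the core block, assuming core facts arrive as an in-memory list (a real implementation would aggregate in SQL). Trends, coverage, and the health badge are left out; the Fact record is hypothetical, and the nearest-rank percentile method and the inclusion of omitted-but-scored attempts in the means are assumptions.

```python
import statistics
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Fact:
    """Hypothetical subset of a core attempt facts row."""
    is_omitted: bool
    time_on_item_ms: Optional[int]
    score_status: str
    score_awarded: Optional[float]
    max_score: Optional[float]


def nearest_rank(sorted_vals: List[int], pct: int) -> int:
    """Nearest-rank percentile using integer maths (no float rounding)."""
    rank = max(1, (pct * len(sorted_vals) + 99) // 100)
    return sorted_vals[rank - 1]


def core_health(facts: Sequence[Fact]) -> dict:
    attempts = len(facts)
    omitted = sum(1 for f in facts if f.is_omitted)
    times = sorted(f.time_on_item_ms for f in facts if f.time_on_item_ms is not None)
    # Omitted attempts that were SCORED (typically 0) are included in the means.
    scored = [f for f in facts
              if f.score_status == "SCORED" and f.score_awarded is not None and f.max_score]
    statuses = Counter(f.score_status for f in facts)
    return {
        "attempts": attempts,
        "omitted": omitted,
        "omitRate": omitted / attempts if attempts else None,
        "timing": {
            "avgMs": statistics.fmean(times) if times else None,
            "p50Ms": nearest_rank(times, 50) if times else None,
            "p90Ms": nearest_rank(times, 90) if times else None,
        },
        "meanScore": statistics.fmean([f.score_awarded for f in scored]) if scored else None,
        "meanScorePct": (statistics.fmean([100.0 * f.score_awarded / f.max_score for f in scored])
                         if scored else None),
        # Always report all four statuses so PENDING is visible even at zero.
        "statusCounts": {s: statuses[s] for s in ("SCORED", "PENDING", "INVALID", "EXEMPT")},
    }
```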

5.2 Analysis Panels are typed and additive

Question Health also returns an analysis block containing a type-specific panel when typed facts exist:

MCQ single: optionCounts/optionPct/topOption + facilityPct (full-credit rate)
MRQ: option inclusion + top response patterns + partial credit histogram
Numeric: answer distribution + within tolerance %
Short answer: top wrong clusters + format error rates (visibility policy applies)
Essay (rubric): grading pipeline + rubric dimension distributions + drift indicators
Evidence: integrity + review pipeline signals
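For the MCQ-single panel, option counts could be derived from report_item_attempt_choice rows roughly as follows. The function name is hypothetical; the sketch assumes one selected_option_ids array per non-omitted attempt plus the option list from the question snapshot. facilityPct comes from core facts and is not recomputed here.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def mcq_option_panel(selections: Iterable[Sequence[str]], option_ids: Sequence[str]) -> dict:
    """optionCounts / optionPct / topOption for an MCQ-single item.

    `selections` holds one selected_option_ids array per non-omitted attempt.
    `option_ids` comes from the question snapshot so unpicked options show 0.
    """
    counts = Counter({oid: 0 for oid in option_ids})
    respondents = 0
    for selected in selections:
        respondents += 1
        counts.update(selected)
    pct = {oid: (100.0 * n / respondents if respondents else 0.0) for oid, n in counts.items()}
    # Ties resolve to the earliest option in snapshot order, keeping output deterministic.
    top: Optional[str] = (max(option_ids, key=lambda oid: counts[oid])
                          if respondents and option_ids else None)
    return {"optionCounts": dict(counts), "optionPct": pct, "topOption": top}
```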

5.3 Backward compatibility

Initial rollout is non-breaking:

existing MCQ top-level fields may remain
core and analysis are added

A later /v2 endpoint may remove legacy MCQ-shaped top-level fields after migration.

6) Visibility & Privacy Policy (single enforcement gate)

6.1 Policy levels (org default, evaluation override)

Level 0 (default): no raw response content in analytics (counts/hashes/clusters only)
Level 1: sanitised/normalised strings may be shown (truncated; still protected)
Level 2: raw exemplars/top raw values may be shown (explicit opt-in)

6.2 Single gate enforcement

Privacy filtering and eligibility checks MUST be enforced in the reporting service layer (one gate), and reused by:

UI endpoints
CSV/export endpoints
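A sketch of the single gate, assuming panels are plain dictionaries and each panel field is classified with a minimum policy level. Field names, the truncation length, and the class and function names are hypothetical. Unclassified fields are dropped, which keeps the gate privacy-by-default when a new panel field ships without a classification.

```python
from enum import IntEnum
from typing import Any, Dict, List, Optional, Tuple


class VisibilityLevel(IntEnum):
    COUNTS_ONLY = 0    # Level 0: counts / hashes / clusters only (default)
    SANITISED = 1      # Level 1: truncated normalised strings
    RAW_EXEMPLARS = 2  # Level 2: raw exemplars, explicit opt-in


def effective_level(org_default: VisibilityLevel,
                    evaluation_override: Optional[VisibilityLevel]) -> VisibilityLevel:
    """Evaluation override wins when set; otherwise the org default applies."""
    return org_default if evaluation_override is None else evaluation_override


class PrivacyGate:
    """The single filter every reporting output path goes through."""

    # Minimum level per panel field (hypothetical names). Unlisted fields are dropped.
    FIELD_MIN_LEVEL: Dict[str, VisibilityLevel] = {
        "optionCounts": VisibilityLevel.COUNTS_ONLY,
        "hashCounts": VisibilityLevel.COUNTS_ONLY,
        "normalizedTopValues": VisibilityLevel.SANITISED,
        "rawExemplars": VisibilityLevel.RAW_EXEMPLARS,
    }
    SANITISED_MAX_CHARS = 40

    def apply(self, panel: Dict[str, Any], level: VisibilityLevel) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for field, value in panel.items():
            required = self.FIELD_MIN_LEVEL.get(field)
            if required is None or level < required:
                continue  # unclassified, or above the allowed level
            if field == "normalizedTopValues" and level == VisibilityLevel.SANITISED:
                value = [s[: self.SANITISED_MAX_CHARS] for s in value]
            out[field] = value
        return out


def health_endpoint(panel: Dict[str, Any], level: VisibilityLevel, gate: PrivacyGate) -> Dict[str, Any]:
    return gate.apply(panel, level)


def csv_export(panel: Dict[str, Any], level: VisibilityLevel, gate: PrivacyGate) -> List[Tuple[str, Any]]:
    # Same gate as the UI path, so an export can never show more than the screen.
    return sorted(gate.apply(panel, level).items())
```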

6.3 Small-number suppression

By default, sensitive breakdowns require:

attempts >= N
per-entry count >= k

Configurable thresholds may exist, but the enforcement path is centralised.
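The thresholds could be applied inside the same gate with a helper like the one below. The function name is hypothetical, and the defaults N = 10 and k = 5 are placeholders, not values decided by this ADR.

```python
from typing import Dict, Mapping, Optional


def suppress_small_numbers(breakdown: Mapping[str, int], attempts: int,
                           min_attempts: int = 10, min_entry: int = 5) -> Optional[Dict[str, int]]:
    """Small-number suppression for a sensitive breakdown.

    Returns None when attempts < N (min_attempts): the breakdown is withheld.
    Otherwise entries with count < k (min_entry) are dropped.
    Defaults are placeholders; real thresholds come from configuration.
    """
    if attempts < min_attempts:
        return None
    return {key: count for key, count in breakdown.items() if count >= min_entry}
```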

7) Remediation integration and idempotency

7.1 Recompute scope

Remediation-driven recomputation MUST update:

core attempt facts (score/max/outcome/status/method)
typed facts that depend on scoring (e.g., full-credit/partial-credit flags)

7.2 Idempotency key

Projection updates must be idempotent keyed by:

tenant_id + submission_item_id (preferred)

The recomputation job must be safe to replay.
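A sketch of an idempotent projection write keyed by tenant_id + submission_item_id. It uses an in-memory SQLite database so it runs standalone; PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE ... WHERE has the same shape. The scoring_version column stands in for the existing remediation/scoring version fields, and the sketch assumes every remediation apply or revert produces a higher version. Typed fact tables would follow the same pattern.

```python
import sqlite3

DDL = """
CREATE TABLE report_item_attempts_flat (
    tenant_id          TEXT    NOT NULL,
    submission_item_id TEXT    NOT NULL,
    score_awarded      REAL,
    max_score          REAL,
    score_status       TEXT    NOT NULL,
    scoring_version    INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, submission_item_id)
)
"""

# Upsert on the idempotency key; rows carrying an older version are ignored.
UPSERT = """
INSERT INTO report_item_attempts_flat
    (tenant_id, submission_item_id, score_awarded, max_score, score_status, scoring_version)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (tenant_id, submission_item_id) DO UPDATE SET
    score_awarded   = excluded.score_awarded,
    max_score       = excluded.max_score,
    score_status    = excluded.score_status,
    scoring_version = excluded.scoring_version
WHERE excluded.scoring_version >= report_item_attempts_flat.scoring_version
"""


def project(conn: sqlite3.Connection, row: tuple) -> None:
    """Write one projected row; safe to replay any number of times."""
    with conn:  # commit on success, roll back on error
        conn.execute(UPSERT, row)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

Replaying the same row is a no-op in effect, and a stale replay of an older version cannot overwrite a newer recomputation.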

8) Indexing plan (minimum expectations)

All typed fact tables MUST include indexes supporting the existing access patterns:

by tenant_id + org_unit_id + question_version_id + completed_at (time windows)
by tenant_id + submission_item_id (joins to core facts)
additional indexes as required for high-cardinality aggregates (e.g., option id arrays may use GIN where appropriate)

Index selection must not violate reporting read-model isolation rules.

9) First-Class Question Type Definition of Done (DoD)

A new qtype cannot ship unless it provides:

Authoring + validation (payload schema enforced)
Answer schema + validation
Scoring semantics:
- produces canonical score/max/outcome
- sets score_status and score_method
Core Health support:
- core attempt facts emitted
Type Analysis support:
- at least one meaningful analysis metric/panel emitted
Privacy classification + enforcement:
- policy applied consistently to UI and CSV/export
Tests:
- projector smoke test for the qtype (seed → submissions → projections → health endpoints)
- regression tests ensuring MCQ health remains stable

Consequences

Positive

Eliminates the “MCQ-shaped health” trap: all qtypes get meaningful Core Health.
Type-specific analysis scales by adding typed fact tables + plugins, not switch statements.
Subjective items (essay/evidence) gain strong operational analytics (pipeline + rubric/integrity).
Privacy-by-default is enforced centrally and consistently across exports.

Tradeoffs

Additional typed tables and projector work per qtype.
Requires disciplined DoD enforcement; partial implementations are not allowed to ship.

Rollout Plan

Refactor MCQ scoring to emit score_status/score_method and handle unknown qtypes deterministically.
Evolve report_item_attempts_flat to meet Core Attempt Facts contract (additive columns only).
Introduce report_item_attempt_choice typed table and migrate option analysis to read from it.
Update Question Health endpoints to return core + analysis.choice while preserving current MCQ fields.
Add backfill/replay logic for typed tables (projection replay job), keyed by tenant_id + submission_item_id.
Implement MRQ + short answer + essay + evidence plugins, each shipping with:
- core facts emission
- typed facts emission
- analysis panel metrics
- privacy gate enforcement coverage
After clients migrate, introduce /v2 endpoints and retire legacy MCQ-shaped top-level fields.

Notes

This ADR defines the canonical contracts. Concrete SQL names and minor field naming may vary, but the invariants MUST hold:
- report_item_attempts_flat is the Core Attempt Facts source
- typed fact tables are FK-free reporting read-models
- health API is core + analysis with a single privacy gate
- remediation recomputes core + dependent typed facts idempotently
