ADR-0013: First-Class Question Types via Plugins + Core Attempt Facts + Typed Fact Panels

Status

Proposed

Date

2025-12-31

Decision Drivers

  • Support many question types (QTI-style interactions) without “second-class citizens.”
  • Preserve current invariants:
    • snapshot-canonical, deterministic reporting (no live authoring joins)
    • remediation-safe recomputation (apply/revert) with auditability
    • reporting read-models remain stable and high-performance at scale
  • Question Health must remain meaningful for:
    • autoscorable items (MCQ/MRQ/numeric/match/order/etc.)
    • subjective items (essay/evidence) with workflow + rubric/assessor integrity signals
  • Privacy-by-default, with explicit opt-in for additional response visibility.

Context

Current Question Health is MCQ-shaped:

  • reporting extracts MCQ-specific fields (e.g., a single selected option)
  • “facility/top option” becomes the implied universal health model
  • scoring logic assumes MCQ payload structure

As we add MRQ, numeric, short answer, essay, and evidence types, this coupling will cause:

  • growing switch-case logic across scoring and reporting
  • missing or “empty” analytics for non-MCQ types
  • subjective question types (a key differentiator for Evalium) appearing analytically inferior

Decision

We will standardise question-type support around three contracts:

  1. Question Type Plugin Contract for authoring, delivery validation, scoring, and reporting facts emission.
  2. Core Attempt Facts + Typed Fact Tables to power reporting/health without MCQ assumptions.
  3. Question Health Contract that always returns Core Health plus a type-specific Analysis Panel, with a single privacy gate.

This approach is the canonical path for all new question types.


1) Core Attempt Facts (reporting foundation)

1.1 Adopt the existing core facts table

The existing reporting table reporting.report_item_attempts_flat becomes (conceptually) the Core Attempt Facts table. We will evolve it as needed to represent the universal contract.

Key point: We will not introduce a second “core facts” table. This avoids migration churn and dual-source risk.

1.2 Core facts are derived from submissions only

Core facts MUST be generated solely from:

  • submissions
  • submission_items (or equivalent immutable per-item attempt records)

Session data is not required to compute health/analytics.

1.3 Core Attempt Facts Contract (minimum)

Core Attempt Facts must support all qtypes by capturing these fields (names are aligned to existing schema):

  • Identifiers: tenant_id, org_unit_id, evaluation_version_id, question_version_id, submission_id, submission_item_id, qtype
  • Timing: time_on_item_ms, completed_at
  • Omission: is_omitted (+ omit_reason if available)
  • Scoring:
    • score_awarded, max_score (existing)
    • outcome_code (existing; remains supported)
    • score_pct (derived or stored)
    • score_status enum: SCORED | PENDING | INVALID | EXEMPT (new where needed)
    • score_method enum: AUTO | HUMAN | AI_ASSISTED | MODERATED (new where needed; initially AUTO/HUMAN is acceptable)
  • Lineage / versioning:
    • remediation/scoring version fields as already implemented
    • computed_at / projected_at timestamp

Meaning:

  • “Correct/incorrect” is not universal. “Facility” becomes a derived metric for relevant types (e.g., MCQ full-credit rate).
  • For MCQ-derived facility, use respondent facility: correct / (attempts - omitted).
  • PENDING is first-class and must appear in health and reporting.
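
The respondent facility rule above can be sketched as follows. This is a minimal illustration, not the projector's actual code: the CoreAttemptFact class is a hypothetical, trimmed-down stand-in for a row of the core facts table, and treating "correct" as "SCORED with full credit" is an assumption drawn from the "MCQ full-credit rate" wording.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class ScoreStatus(Enum):
    SCORED = "SCORED"
    PENDING = "PENDING"
    INVALID = "INVALID"
    EXEMPT = "EXEMPT"


@dataclass(frozen=True)
class CoreAttemptFact:
    # Hypothetical subset of the contract fields; most identifiers omitted.
    tenant_id: str
    submission_item_id: str
    qtype: str
    is_omitted: bool
    score_awarded: Optional[float]
    max_score: float
    score_status: ScoreStatus


def respondent_facility(facts: Iterable[CoreAttemptFact]) -> Optional[float]:
    """Return correct / (attempts - omitted), or None with no respondents."""
    attempts = omitted = correct = 0
    for fact in facts:
        attempts += 1
        if fact.is_omitted:
            omitted += 1
        elif (
            fact.score_status is ScoreStatus.SCORED
            and fact.score_awarded == fact.max_score
        ):
            # Assumption: "correct" means full credit on a SCORED attempt.
            correct += 1
    respondents = attempts - omitted
    return correct / respondents if respondents > 0 else None
```

Returning None rather than 0 when every attempt is omitted keeps "no data" distinguishable from "nobody got it right".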

1.4 Qtype capture is durable

qtype MUST be stored in the core facts at projection time (derived from the submission snapshot), so analytics never require live authoring joins.


2) Typed Fact Tables (per-type analysis without MCQ assumptions)

2.1 Purpose

Typed fact tables support per-type “Analysis Panels” (option analysis, MRQ patterns, rubric distributions, evidence integrity, etc.) without forcing those fields into the universal core contract.

2.2 Design rules

  • Additive schema only: typed tables are added without breaking existing reporting.
  • FK-free read-models by design: reporting tables MUST NOT add FKs to OLTP tables.
    • Existing reporting FKs will be removed as part of the rollout to keep lock-order consistent.
  • Keying and join strategy:
    • typed facts are joined to core facts by tenant_id + submission_item_id (preferred) or an equivalent immutable attempt key.
    • typed facts may duplicate lightweight identifiers (tenant_id, org_unit_id, question_version_id, completed_at) to enable index-only access.

2.3 Typed fact families (initial set)

Names are illustrative; final names must follow repo conventions.

Choice family (MCQ single + MRQ)

  • report_item_attempt_choice
    • tenant_id, submission_item_id, question_version_id, completed_at
    • selected_option_ids (array)
    • response_pattern_hash (optional, supports MRQ pattern counts without storing raw patterns)
    • is_full_credit (derived)
    • is_partial_credit (derived)
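
One possible way to derive response_pattern_hash is to hash a canonical form of the selected option ids, so that the same set of options always maps to the same value regardless of selection order. The function below is a sketch under that assumption; the ADR does not prescribe the hash algorithm or the canonicalisation, and SHA-256 with a unit-separator delimiter is chosen here only for illustration.

```python
import hashlib
from typing import Iterable


def response_pattern_hash(selected_option_ids: Iterable[str]) -> str:
    """Hash an MRQ selection so identical option sets share one value."""
    # Sort and de-duplicate so click order does not change the hash.
    # The unit separator avoids collisions such as ["a,b"] vs ["a", "b"].
    canonical = "\x1f".join(sorted(set(selected_option_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Pattern counts can then be computed with a plain GROUP BY on the hash column.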

Numeric family

  • report_item_attempt_numeric
    • numeric_value (decimal)
    • unit (optional)
    • within_tolerance (derived)
    • tolerance_used (optional structured json)

Short answer / text entry family

  • report_item_attempt_text
    • normalized_text (nullable; visibility policy dependent)
    • raw_text_hash (enables counting without disclosure)
    • format_error_code (optional categorisation)
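
As an illustration of how normalized_text and raw_text_hash might be produced, the sketch below normalises with Unicode NFKC, case folding, and whitespace collapsing, and hashes the raw text with a keyed HMAC. Both choices are assumptions: the ADR does not specify the normalisation rules or whether the hash is keyed, and tenant_key is a hypothetical per-tenant secret.

```python
import hashlib
import hmac
import unicodedata


def normalize_text(raw: str) -> str:
    """Fold case, unify compatibility characters, and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", raw).casefold()
    return " ".join(folded.split())


def raw_text_hash(raw: str, tenant_key: bytes) -> str:
    """Keyed hash of the raw answer, usable for counting without disclosure."""
    # Assumption: a per-tenant key makes short answers harder to guess
    # by hashing a dictionary of likely values.
    return hmac.new(tenant_key, raw.encode("utf-8"), hashlib.sha256).hexdigest()
```

Identical raw answers within a tenant produce the same hash, so "top wrong answers" can be counted at policy Level 0 without storing the text itself.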

Rubric / constructed response family

  • report_item_attempt_rubric
    • grade_status (PENDING|IN_PROGRESS|FINAL)
    • grader_id (nullable)
    • dimension_scores (jsonb; strictly versioned shape)
    • moderated (optional)

Evidence family

  • report_item_attempt_evidence
    • attachment_count
    • attachment_types (array)
    • missing_required_metadata (bool)
    • integrity_flags (optional structured json)

3) Question Type Plugin Contract

3.1 Responsibilities per qtype

Each question type is implemented as a plugin that defines:

  • Authoring schema + validation
    • validate question payload and constraints
    • validate scoring key / rubric definition where applicable
  • Answer schema + validation
    • validate candidate answer payload (shape + constraints)
  • Scoring
    • produce scoring results aligned to existing submission fields, plus minimal additions (status/method)
  • Typed facts emission
    • emit the typed fact payload(s) needed for Analysis Panels
  • Privacy classification
    • declare which emitted facts are sensitive (e.g., raw text, exemplars)

3.2 Registry and determinism

  • Plugins are wired via a constructor-injected registry (no global map, no reflection).
  • Unknown qtype handling is deterministic:
    • scoring yields score_status = INVALID with explicit error codes
    • typed facts emission is skipped (or emits a minimal invalid marker)
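
A minimal sketch of a constructor-injected registry with deterministic unknown-qtype handling is shown below. All names here (ScoringResult, QuestionTypePlugin, McqPlugin, PluginRegistry, the UNKNOWN_QTYPE error code, and the payload keys) are hypothetical and exist only to illustrate the shape of the contract.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Mapping, Optional, Protocol, Tuple


@dataclass(frozen=True)
class ScoringResult:
    score_awarded: Optional[float]
    max_score: float
    score_status: str  # SCORED | PENDING | INVALID | EXEMPT
    score_method: str  # AUTO | HUMAN
    error_codes: Tuple[str, ...] = ()


class QuestionTypePlugin(Protocol):
    qtype: str

    def score(
        self, question: Mapping[str, Any], answer: Mapping[str, Any]
    ) -> ScoringResult: ...


class McqPlugin:
    qtype = "MCQ"

    def score(self, question, answer):
        # Hypothetical payload shape: question["key"], answer["selected"].
        correct = answer.get("selected") == question["key"]
        return ScoringResult(1.0 if correct else 0.0, 1.0, "SCORED", "AUTO")


class PluginRegistry:
    """Plugins are passed in explicitly: no global map, no reflection."""

    def __init__(self, plugins: Iterable[QuestionTypePlugin]) -> None:
        self._plugins: Dict[str, QuestionTypePlugin] = {}
        for plugin in plugins:
            if plugin.qtype in self._plugins:
                raise ValueError(f"duplicate plugin for qtype {plugin.qtype!r}")
            self._plugins[plugin.qtype] = plugin

    def score(self, qtype: str, question, answer) -> ScoringResult:
        plugin = self._plugins.get(qtype)
        if plugin is None:
            # Unknown qtype: always the same INVALID result, never an exception.
            max_score = float(question.get("max_score", 0.0))
            return ScoringResult(None, max_score, "INVALID", "AUTO", ("UNKNOWN_QTYPE",))
        return plugin.score(question, answer)
```

Because the registry is built once from an explicit list, the set of supported qtypes is visible at the composition root and easy to substitute in tests.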

4) ScoringResult Semantics (align with existing storage)

We standardise scoring semantics without duplicating existing stored fields.

4.1 Stored fields alignment

Evalium already stores score, max_score, outcome_code, and remediation versioning. We will:

  • keep these as the canonical stored score outputs
  • add score_status and score_method where needed to support first-class subjective workflows and reporting parity

Defaults:

  • autoscorable types: score_status = SCORED, score_method = AUTO
  • human-graded types: score_status = PENDING, score_method = HUMAN until finalised
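
The defaults above can be expressed as a small function. This is a sketch only: the function name is hypothetical, and the transition to SCORED once a human grade is finalised is an assumption inferred from "until finalised".

```python
from typing import Tuple


def default_score_state(human_graded: bool, finalised: bool = False) -> Tuple[str, str]:
    """Return (score_status, score_method) for a newly projected attempt."""
    if not human_graded:
        return ("SCORED", "AUTO")
    # Assumption: a finalised human grade moves the attempt to SCORED.
    return ("SCORED", "HUMAN") if finalised else ("PENDING", "HUMAN")
```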

4.2 Required semantics

  • score_status distinguishes SCORED vs PENDING vs INVALID vs EXEMPT
  • score_method indicates AUTO vs HUMAN (other values are optional and reserved for future use)
  • Type-specific scoring audit details remain in structured details (where needed) but MUST NOT require live joins.

5) Question Health Contract (API)

5.1 Core Health is universal

Question Health endpoints MUST return a core block derived from Core Attempt Facts:

Minimum core fields:

  • attempts, omitted, omitRate
  • timing stats (avg + p50/p90 where available)
  • scoring stats: meanScore, meanScorePct, distribution/histogram (incremental)
  • statusCounts: SCORED/PENDING/INVALID/EXEMPT
  • trends: scorePct 7d/30d
  • coverage: evaluationsCount, recentRuns (where applicable)
  • health badge: status, confidence, topReasons
  • lastComputedAt
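
To make the core block concrete, here is a minimal sketch that derives a few of the fields above from a list of core attempt facts. It assumes each fact is a plain mapping using the column names from section 1.3, that mean score percentage is computed over SCORED attempts only, and that p50TimeMs is a hypothetical name for the median timing field.

```python
from collections import Counter
from statistics import mean, median
from typing import Any, Dict, List, Mapping

STATUSES = ("SCORED", "PENDING", "INVALID", "EXEMPT")


def core_health(facts: List[Mapping[str, Any]]) -> Dict[str, Any]:
    """Aggregate a subset of the Core Health block from core attempt facts."""
    attempts = len(facts)
    omitted = sum(1 for f in facts if f["is_omitted"])
    status_counts = Counter(f["score_status"] for f in facts)
    # Assumption: PENDING/INVALID/EXEMPT attempts do not contribute to the mean.
    scored_pcts = [
        f["score_awarded"] / f["max_score"] * 100.0
        for f in facts
        if f["score_status"] == "SCORED" and f["max_score"] > 0
    ]
    times = [f["time_on_item_ms"] for f in facts if f.get("time_on_item_ms") is not None]
    return {
        "attempts": attempts,
        "omitted": omitted,
        "omitRate": omitted / attempts if attempts else None,
        "meanScorePct": mean(scored_pcts) if scored_pcts else None,
        "p50TimeMs": median(times) if times else None,
        # Every status is always present, so PENDING stays visible even at zero.
        "statusCounts": {s: status_counts.get(s, 0) for s in STATUSES},
    }
```

Nothing in this aggregation depends on qtype, which is what allows the same block to be returned for MCQ and essay items alike.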

5.2 Analysis Panels are typed and additive

Question Health also returns analysis with a type panel when typed facts exist:

  • MCQ single: optionCounts/optionPct/topOption + facilityPct (full-credit rate)
  • MRQ: option inclusion + top response patterns + partial credit histogram
  • Numeric: answer distribution + within tolerance %
  • Short answer: top wrong clusters + format error rates (visibility policy applies)
  • Essay (rubric): grading pipeline + rubric dimension distributions + drift indicators
  • Evidence: integrity + review pipeline signals

5.3 Backward compatibility

Initial rollout is non-breaking:

  • existing MCQ top-level fields may remain
  • core and analysis are added

A later /v2 endpoint may remove legacy MCQ-shaped top-level fields after migration.
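
The non-breaking response shape could be assembled roughly as follows. The function name and the legacy field names in the test data are hypothetical; only the core and analysis keys come from this ADR.

```python
from typing import Any, Dict, Mapping, Optional


def build_health_response(
    legacy_mcq_fields: Mapping[str, Any],
    core: Mapping[str, Any],
    analysis_panel: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Add core and analysis alongside the existing top-level fields."""
    response: Dict[str, Any] = dict(legacy_mcq_fields)  # legacy fields untouched
    response["core"] = dict(core)
    # The panel is present only when typed facts exist for the question.
    response["analysis"] = dict(analysis_panel) if analysis_panel else {}
    return response
```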

6) Visibility & Privacy Policy (single enforcement gate)

6.1 Policy levels (org default, evaluation override)

  • Level 0 (default): no raw response content in analytics (counts/hashes/clusters only)
  • Level 1: sanitised/normalised strings may be shown (truncated; still protected)
  • Level 2: raw exemplars/top raw values may be shown (explicit opt-in)

6.2 Single gate enforcement

Privacy filtering and eligibility checks MUST be enforced in the reporting service layer (one gate), and reused by:

  • UI endpoints
  • CSV/export endpoints

6.3 Small-number suppression

By default, sensitive breakdowns require:

  • attempts >= N
  • per-entry count >= k

Configurable thresholds may exist, but the enforcement path is centralised.
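
A single gate combining the policy levels from 6.1 with small-number suppression might look like the sketch below. The function name, the entry shape (hash, count, normalized, raw), and the truncation length are all hypothetical; N and k are deliberately required parameters because this ADR does not fix their values.

```python
from typing import Any, Dict, List, Mapping


def gate_breakdown(
    entries: List[Mapping[str, Any]],
    *,
    attempts: int,
    policy_level: int,
    min_attempts: int,      # N
    min_entry_count: int,   # k
    max_chars: int = 40,    # hypothetical truncation length for Level 1
) -> List[Dict[str, Any]]:
    """Apply small-number suppression, then the visibility policy level."""
    if attempts < min_attempts:
        return []  # whole breakdown suppressed
    visible: List[Dict[str, Any]] = []
    for entry in entries:
        if entry["count"] < min_entry_count:
            continue  # rare entries could identify individuals
        row: Dict[str, Any] = {"hash": entry["hash"], "count": entry["count"]}
        if policy_level >= 1 and entry.get("normalized") is not None:
            row["normalized"] = entry["normalized"][:max_chars]
        if policy_level >= 2 and entry.get("raw") is not None:
            row["raw"] = entry["raw"]
        visible.append(row)
    return visible
```

Both the UI and CSV/export handlers would call this one function, so a policy change cannot drift between the two paths.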

7) Remediation integration and idempotency

7.1 Recompute scope

Remediation-driven recomputation MUST update:

  • core attempt facts (score/max/outcome/status/method)
  • typed facts that depend on scoring (e.g., full-credit/partial-credit flags)

7.2 Idempotency key

Projection updates must be idempotent keyed by:

  • tenant_id + submission_item_id (preferred)

The recomputation job must be safe to replay.
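
The replay-safety requirement can be illustrated with an upsert keyed by tenant_id + submission_item_id. The sketch uses an in-memory SQLite database purely as a stand-in for the reporting store (the upsert syntax needs SQLite 3.24 or newer); the trimmed column list and the scoring_version column are assumptions standing in for the real schema and its remediation/scoring version fields.

```python
import sqlite3

DDL = """
CREATE TABLE report_item_attempt_choice (
    tenant_id          TEXT    NOT NULL,
    submission_item_id TEXT    NOT NULL,
    is_full_credit     INTEGER NOT NULL,
    is_partial_credit  INTEGER NOT NULL,
    scoring_version    INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, submission_item_id)  -- idempotency key, no FKs
)
"""

UPSERT = """
INSERT INTO report_item_attempt_choice
    (tenant_id, submission_item_id, is_full_credit, is_partial_credit, scoring_version)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (tenant_id, submission_item_id) DO UPDATE SET
    is_full_credit    = excluded.is_full_credit,
    is_partial_credit = excluded.is_partial_credit,
    scoring_version   = excluded.scoring_version
"""


def project(conn: sqlite3.Connection, row: tuple) -> None:
    """Write one typed fact; replaying the same row leaves state unchanged."""
    with conn:  # commit on success, roll back on error
        conn.execute(UPSERT, row)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
project(conn, ("t1", "si-1", 0, 1, 1))
project(conn, ("t1", "si-1", 0, 1, 1))  # replay: still one row
project(conn, ("t1", "si-1", 1, 0, 2))  # remediation apply: row is overwritten
```

A remediation revert is then just another projection of the earlier scoring result through the same path.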


8) Indexing plan (minimum expectations)

All typed fact tables MUST include indexes supporting the existing access patterns:

  • by tenant_id + org_unit_id + question_version_id + completed_at (time windows)
  • by tenant_id + submission_item_id (joins to core facts)
  • additional indexes as required for high-cardinality aggregates (e.g., option id arrays may use GIN where appropriate)

Index selection must not violate reporting read-model isolation rules.


9) First-Class Question Type Definition of Done (DoD)

A new qtype cannot ship unless it provides:

  1. Authoring + validation (payload schema enforced)
  2. Answer schema + validation
  3. Scoring semantics:
    • produces canonical score/max/outcome
    • sets score_status and score_method
  4. Core Health support:
    • core attempt facts emitted
  5. Type Analysis support:
    • at least one meaningful analysis metric/panel emitted
  6. Privacy classification + enforcement:
    • policy applied consistently to UI and CSV/export
  7. Tests:
    • projector smoke test for the qtype (seed → submissions → projections → health endpoints)
    • regression tests ensuring MCQ health remains stable

Consequences

Positive

  • Eliminates the “MCQ-shaped health” trap: all qtypes get meaningful Core Health.
  • Type-specific analysis scales by adding typed fact tables + plugins, not switch statements.
  • Subjective items (essay/evidence) gain strong operational analytics (pipeline + rubric/integrity).
  • Privacy-by-default is enforced centrally and consistently across exports.

Tradeoffs

  • Additional typed tables and projector work per qtype.
  • Requires disciplined DoD enforcement; partial implementations are not allowed to ship.

Rollout Plan

  1. Refactor MCQ scoring to emit score_status/score_method and handle unknown qtypes deterministically (score_status = INVALID with explicit error codes).
  2. Evolve report_item_attempts_flat to meet Core Attempt Facts contract (additive columns only).
  3. Introduce report_item_attempt_choice typed table and migrate option analysis to read from it.
  4. Update Question Health endpoints to return core + analysis.choice while preserving current MCQ fields.
  5. Add backfill/replay logic for typed tables (projection replay job), keyed by tenant_id + submission_item_id.
  6. Implement MRQ + short answer + essay + evidence plugins, each shipping with:
    • core facts emission
    • typed facts emission
    • analysis panel metrics
    • privacy gate enforcement coverage
  7. After clients migrate, introduce /v2 endpoints and retire legacy MCQ-shaped top-level fields.

Notes

  • This ADR defines the canonical contracts. Concrete SQL names and minor field naming may vary, but the invariants MUST hold:
    • report_item_attempts_flat is the Core Attempt Facts source
    • typed fact tables are FK-free reporting read-models
    • health API is core + analysis with a single privacy gate
    • remediation recomputes core + dependent typed facts idempotently