Evaluation Library Roadmap for Evalium (Updated)

This roadmap outlines the phased plan for the Evaluation Library—the hub for creating, organizing, scheduling, and managing assessments. It extends the Question Library’s inline patterns so authors feel at home, while introducing polished UX models for sections, smart buckets, and passages.


MVP (Weeks 1–4): Core Productivity + Randomization

1) Inline Evaluation Editor with Live Preview

  • Function: Expand a row to edit the title, sections, buckets, timers, and navigation. A side-by-side Candidate Preview renders exactly what candidates will see, including shuffled items and choices.
  • Extras:
    • Seeded preview with “re-roll.”
      • Example seed: seed = hash(evalId, sectionId, candidateId, attemptNo, bucketId)
    • Edit focus → preview focus: preview automatically shows the section you’re editing.
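
The seed example above could be derived along these lines. This is a minimal sketch, not the planned implementation: the choice of SHA-256, the function name derive_seed, and the reroll counter behind the "re-roll" button are all assumptions for illustration.

```python
import hashlib

def derive_seed(eval_id, section_id, candidate_id, attempt_no, bucket_id, reroll=0):
    """Derive a stable 64-bit seed from the identifying fields.

    SHA-256 and the `reroll` counter are illustrative assumptions;
    the roadmap only states seed = hash(...) over the first five fields.
    """
    parts = [eval_id, section_id, candidate_id, attempt_no, bucket_id, reroll]
    # Join with a unit separator so ("ab", "c") and ("a", "bc") hash differently.
    key = "\x1f".join(str(p) for p in parts).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 8 bytes as an unsigned integer seed.
    return int.from_bytes(digest[:8], "big")
```

A cryptographic digest is used here instead of Python's built-in hash() because string hashing in Python is salted per process by default, so built-in hashes would not reproduce the same preview across restarts.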

2) Smart Buckets (Foundational)

  • Function: Authors define rules for dynamic question selection:
    • Selection modes: FIXED_SET, RANDOM_K, ALL.
    • Tag filters: pick questions by topic, difficulty, or metadata.
    • Weights or target counts control contribution per bucket.
  • UI: Live counts show how many items will be drawn per bucket; sliders or % inputs; auto-recalc across buckets; inline warnings if a bucket is short.
  • Shortage policy (MVP): per-evaluation option — borrow from other buckets or allow repeats. Default: borrow, no repeats.
  • Determinism: same seed → same draw; different seed or candidate → different draw.
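
A RANDOM_K draw with the default shortage policy (borrow, no repeats) might look like the sketch below. The function name draw_bucket and the borrow_pool parameter are hypothetical, and tag filtering is assumed to have already produced the pools.

```python
import random

def draw_bucket(pool, k, seed, borrow_pool=()):
    """Draw k distinct question ids from `pool`; borrow from `borrow_pool` if short.

    Hypothetical sketch of RANDOM_K with the "borrow, no repeats" default.
    """
    rng = random.Random(seed)
    # Sort first so the draw depends only on the seed, not on input order.
    own = sorted(set(pool))
    picked = rng.sample(own, min(k, len(own)))
    shortfall = k - len(picked)
    if shortfall > 0:
        # Borrow only questions that were not already picked.
        extra = sorted(set(borrow_pool) - set(picked))
        picked += rng.sample(extra, min(shortfall, len(extra)))
    return picked
```

If the borrow pool is also too small, this sketch returns fewer than k items rather than repeating any, which is the point where an inline shortage warning would apply.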

3) Basic Section Settings

  • Sections as ordered containers: pos defines order; each section may contain questions, buckets, or passages.
  • Navigation policy: LINEAR (must complete in order) or FREE_ROAM (jump freely).
  • Optional section timer.
  • Autosave + resume across sections.
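
The two navigation policies can be sketched as a single gate check. The helper name can_enter and the section id field are assumptions; pos, LINEAR, and FREE_ROAM come from the settings above.

```python
def can_enter(sections, target_id, completed_ids, policy):
    """Return True if the candidate may open the section `target_id`.

    `sections` is a list of dicts with "id" (assumed field) and "pos".
    """
    ordered = sorted(sections, key=lambda s: s["pos"])
    ids = [s["id"] for s in ordered]
    if target_id not in ids:
        raise KeyError(target_id)
    if policy == "FREE_ROAM":
        return True
    if policy == "LINEAR":
        # Every section before the target must already be completed.
        earlier = ids[: ids.index(target_id)]
        return all(sid in completed_ids for sid in earlier)
    raise ValueError(f"unknown navigation policy: {policy}")
```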

4) Passages as Content Hubs (MVP baseline)

  • Definition: A passage is a rich content block (text, tables, images) with its own list of associated questions.
  • Authoring in context: Authors create/edit questions directly within the passage editor.
  • Atomic unit: Adding a passage to an evaluation automatically includes its questions as one block.
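
Treating a passage as an atomic unit could be sketched as follows, using the passage_id and pos_in_passage fields from the data model. The function name expand_passage and the shape of the returned block are assumptions.

```python
def expand_passage(passage_id, questions):
    """Return the passage and its linked questions as one ordered block."""
    # Standalone questions have passage_id = None and are ignored here.
    linked = [q for q in questions if q.get("passage_id") == passage_id]
    linked.sort(key=lambda q: q["pos_in_passage"])
    return {"passage_id": passage_id, "question_ids": [q["id"] for q in linked]}
```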

5) Validation Preflight (MVP scope)

  • Runs on Save/Publish: checks pool sufficiency against bucket math, missing media, whether weights sum to 100%, and whether section timers are consistent.
  • Clear, inline warnings with jump-to-fix.
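
Two of the preflight checks, weight totals and pool sufficiency, can be sketched as below. The function name preflight, the warning tuple shape, and having both weight and target_count on every bucket are assumptions made to keep the example short.

```python
def preflight(buckets, pool_sizes):
    """Return a list of (target, message) warnings; an empty list means clean.

    `buckets`: dicts with "id", "weight" (percent), and "target_count".
    `pool_sizes`: eligible question count per bucket id.
    """
    warnings = []
    total = sum(b["weight"] for b in buckets)
    if abs(total - 100) > 1e-9:
        warnings.append(("weights", f"Bucket weights sum to {total}%, expected 100%"))
    for b in buckets:
        available = pool_sizes.get(b["id"], 0)
        if available < b["target_count"]:
            warnings.append(
                (b["id"], f"Needs {b['target_count']} questions, only {available} eligible")
            )
    return warnings
```

The first element of each warning names what to jump to, which is one way the jump-to-fix links could be wired.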

Phase 2 (Months 2–3): Scoring, Outcomes & Smarter Randomization

1) Scoring Designer

  • Overall score with optional per-tag subtotals; weights at both levels.
  • Partial credit toggle (per question type or per tag).
  • Normalization rules (e.g., scale to 0–100).
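
One possible reading of the scoring rules above: each tag gets a subtotal scaled to 0–100, and the overall score is a weighted mean of those subtotals. The function name score, the response tuple shape, and the default weight of 1 per tag are assumptions.

```python
from collections import defaultdict

def score(responses, tag_weights=None):
    """Return (overall, per_tag_subtotals), both on a 0-100 scale.

    `responses`: iterable of (tag, points_earned, points_possible).
    Fractional points_earned models partial credit.
    """
    earned = defaultdict(float)
    possible = defaultdict(float)
    for tag, got, out_of in responses:
        earned[tag] += got
        possible[tag] += out_of
    subtotals = {t: 100.0 * earned[t] / possible[t] for t in possible if possible[t] > 0}
    # Tags without an explicit weight count equally (assumed default).
    weights = {t: (tag_weights or {}).get(t, 1.0) for t in subtotals}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0, subtotals
    overall = sum(subtotals[t] * weights[t] for t in subtotals) / total_weight
    return overall, subtotals
```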

2) Outcome Rules & Feedback Blocks

  • Visual rule builder:
    • IF (overall ≥ 70%) OR (TagX ≥ 60% AND TagY ≥ 80%) → Outcome “Pass Advanced”.
  • Feedback blocks: messages/resources tied to tags; end-screen composer chooses which to show.
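
The rule shown above can be represented as a small expression tree, which is roughly what a visual rule builder would produce. The tuple encoding and the names evaluate and PASS_ADVANCED are assumptions for illustration.

```python
def evaluate(rule, scores):
    """Evaluate a nested rule against a dict of percentage scores."""
    op = rule[0]
    if op == "gte":
        _, key, threshold = rule
        # A missing score is treated as 0 (assumed behavior).
        return scores.get(key, 0.0) >= threshold
    if op == "and":
        return all(evaluate(r, scores) for r in rule[1:])
    if op == "or":
        return any(evaluate(r, scores) for r in rule[1:])
    raise ValueError(f"unknown operator: {op}")

# IF (overall >= 70%) OR (TagX >= 60% AND TagY >= 80%) -> "Pass Advanced"
PASS_ADVANCED = (
    "or",
    ("gte", "overall", 70),
    ("and", ("gte", "TagX", 60), ("gte", "TagY", 80)),
)
```

For example, evaluate(PASS_ADVANCED, {"overall": 65, "TagX": 60, "TagY": 85}) holds through the tag branch even though the overall score is below 70.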

3) Progressive Novelty (Attempt Policy)

  • Repeat strategy: Off / Soft (prefer unseen, allow repeats if needed) / Strict (no repeats until coverage target).
  • Coverage target and reset policy.
  • Least-seen fallback selector and borrowing rules.
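
The three repeat strategies could behave roughly as sketched here. Coverage targets and reset policy are deliberately left out, and the function name, the strategy strings, and the seen_counts mapping are assumptions.

```python
import random

def pick_with_novelty(pool, k, seen_counts, strategy, seed):
    """Pick up to k questions, preferring ones this candidate has not seen.

    strategy: "OFF", "SOFT", or "STRICT" (assumed spellings).
    seen_counts: question id -> times already shown to this candidate.
    """
    rng = random.Random(seed)
    items = sorted(pool)
    if strategy == "OFF":
        return rng.sample(items, min(k, len(items)))
    unseen = [q for q in items if seen_counts.get(q, 0) == 0]
    picked = rng.sample(unseen, min(k, len(unseen)))
    if len(picked) < k and strategy == "SOFT":
        # Least-seen fallback: shuffle for random tie-breaks, then stable-sort by count.
        rest = [q for q in items if q not in picked]
        rng.shuffle(rest)
        rest.sort(key=lambda q: seen_counts.get(q, 0))
        picked += rest[: k - len(picked)]
    # Under STRICT the result may be shorter than k; no repeats are allowed.
    return picked
```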

4) Conflict Groups (Basic)

  • Tag or group questions that must not co-occur; randomizer respects this at draw time.
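
A simple way for the randomizer to respect conflict groups is a greedy pass over a seeded shuffle, sketched below. The function name and the conflict_group mapping (question id to group name) are assumptions; a greedy pass is one option among several, not necessarily the planned approach.

```python
import random

def draw_without_conflicts(pool, k, conflict_group, seed):
    """Draw up to k questions with at most one question per conflict group."""
    rng = random.Random(seed)
    candidates = sorted(pool)
    rng.shuffle(candidates)
    picked, used_groups = [], set()
    for q in candidates:
        if len(picked) == k:
            break
        group = conflict_group.get(q)  # None means the question has no conflicts
        if group is not None and group in used_groups:
            continue
        picked.append(q)
        if group is not None:
            used_groups.add(group)
    return picked
```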

5) Passage Flexibility

  • Random passage buckets: “Pick K passages with tags …”.
  • Per-passage rules: include all, fixed questions, or random N questions from within a passage.
  • Warnings: if a passage lacks enough eligible questions, UI shows immediate notice.

Phase 3 (Months 4–5): Power-User & Blueprint Control

1) Smart Feedback Loop

  • Live counts: real-time dry-run API shows how many questions each rule contributes.
  • Auto-normalization: authors can set total target (e.g., 50 Qs); system rebalances sections and rules automatically.
  • Availability warnings: non-blocking, with “shortfall vs. rule” hints.
  • Switch between a precise table, a visual gallery, and a Kanban board (by lifecycle).
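
Auto-normalization to a total target can be sketched with the largest-remainder method, which keeps per-rule counts proportional to their weights while always summing to the target. The method choice and the function name rebalance are assumptions; the roadmap only states that the system rebalances automatically.

```python
def rebalance(weights, total):
    """Split `total` questions across rules in proportion to `weights`."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total - sum(counts.values())
    # Hand out the remaining questions by largest fractional part, ties by name.
    order = sorted(exact, key=lambda name: (-(exact[name] - counts[name]), name))
    for name in order[:leftover]:
        counts[name] += 1
    return counts
```

With a target of 50 and weights of 50/30/20, this yields 25, 15, and 10 questions; when shares do not divide evenly, the leftover questions go to the rules with the largest fractional remainders.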

3) Full-Screen Composition Builder

  • Drag-and-drop playlist model with passages, buckets, and single questions in order.
  • One-click pop-out for complex builds (branching, rich media), with shareable draft links.

4) Advanced Bulk & Keyboard-First Controls

  • Command palette (Ctrl/Cmd+K) for fuzzy actions.
  • Slash commands (/passage, /bucket, /timer) to insert blocks inline.

Phase 4 (Months 6+): Collaboration, Governance & Advanced Analytics

1) In-Context Comments & Review Threads

  • Threaded comments on sections/settings; @mentions; resolve/escalate.

2) Versioning & Branching

  • Auto history; draft branches (A/B, translations); visual diff & merge.

3) Modular Section & Passage Library

  • Save and reuse sections or passage blocks across multiple evaluations.

4) Advanced Passage Analytics

  • Per-passage reporting (difficulty, discrimination, most-drawn questions).
  • Question pick-rate within passage when RANDOM_N is used.

5) Adaptive Randomization (Stretch)

  • Smart draws based on difficulty/IRT-lite.
  • Exposure balancing across candidates.

Accessibility & Author UX (Continuous)

  • Passage editor: drag/drop ordering of questions; inline creation in context.
  • Preview service: candidate preview always shows passage + its linked questions side by side.
  • Keyboard navigation & ARIA labels: sections, timers, outcomes, and preview are all keyboard-navigable and labeled.
  • Non-blocking validation messages with jump-to-fix.

Minimal Data Model Additions

  • evaluations (id, title, status, tags, …, navigation_policy, settings)
  • evaluation_sections (eval_id, title, pos, timer, layout)
  • evaluation_buckets (section_id, selection_mode, weight/target_count, tag_filter, shortage_policy)
  • passages (id, title, content, tags, …)
  • questions (id, stem, …, passage_id nullable, pos_in_passage)
  • section_items (pos, question_id? bucket_id? passage_id? passage_rule_id?)
  • passage_rules (id, selection_mode, tag_filter, target_count, sub_question_rule)

API Additions

Preview / Compile

Request

POST /api/evaluations/{evaluationId}/preview