Skip to main content

Smart Paste — Feasibility & Updates Addendum v1

Purpose: Capture changes/updates after feasibility review (sandboxed DOM extraction + Ohm) and fold them into the Smart Paste & Import plan without rewriting the whole spec.

Companion doc: “Smart Paste & Import — Evalium Spec v1”


1) What changed (Executive Summary)

  • Feasibility: GREEN. We will adopt a Sandboxed DOM Extraction approach for HTML pastes (Excel/Word) and a compact Ohm grammar for structured text (MCQ v1), with safe fallbacks. This reframes Smart Paste as an integration task rather than research.
  • Terminology: authoring feature is now “Paste → Table Block (HTML‑rendered)” to reflect Block JSON storage and semantic HTML rendering.
  • Scope clarifications: A single pipeline powers Tables, Users/Groups/Assignments, and Bulk Question Import (MCQ); each reuses the same detection → mapping → validation → dry‑run → commit pattern.

2) Architectural updates

2.1 HTML Paste (Excel/Word) — SandboxDOMExtractor

  • Default path: Parse using DOMParser into a detached Document (preferred, simplest).
  • Fallback path: Use an <iframe sandbox srcdoc="..."> to let the browser repair malformed Office HTML in isolation.
  • Never inject raw pasted HTML. Extract a grid (cells with value + rowspan/colspan), then map to a Table Block (Block JSON).
  • A11y enrichments: Prompt for caption and header semantics; inject <th scope> metadata in the Block (renderer emits semantic HTML).
  • Brand & responsiveness: No inline styles; tokens for color/spacing; horizontal scroll + sticky header; optional stacked mode for narrow screens.

2.2 Text/CSV/Markdown & MCQ — Ohm parsers

  • CSV/TSV/Markdown: Keep delimiter detection; prefer stable CSV/TSV path.
  • MCQ v1 Ohm grammar: Lightweight grammar for rows with stem, option_1..n, correct (index or label). If confidence < threshold, fall back to manual Column Mapping.
  • Versioning discipline: Bulk MCQ creates new question_versions (draft) and never mutates published versions; optional assembly into a new evaluation draft.

2.3 Normalizers & Mappers

  • GridNormalizer: flattens spans, trims empty rows/cols, infers header hints.
  • RowNormalizer: trims cells; soft type inference (email/date/number/enum/boolean).
  • Mappers:
    • Grid → Table Block (Block JSON).
    • Rows → User/Group/Assignment DTOs or DraftQuestion DTOs (MCQ).

3) Guardrails (must‑haves)

  • Security: Strict allow‑list sanitisation; no event handlers, no classes/styles; CSP enforced; iframe sandbox when used.
  • Limits: Size caps (cells/rows/chars); above thresholds, route to file import (same mapping UI).
  • Privacy: Telemetry logs shapes/counts only (no raw PII or stems/options text).
  • RLS/Capabilities: FE never authorises; backend enforces with transactions; emit partial‑failure reports.

4) Updated Acceptance Criteria (add to base spec)

4.1 Paste → Table Block (authoring)

  • Pasting an Excel/Word table with merged cells produces a Table Block with correct structure, caption, and header scopes; rendered output is semantic HTML with no inline styles and passes the a11y lint.
  • Oversized tables are blocked with a clear message and a CTA to use file import.
  • Preview shows Original vs Cleaned; Undo works.

4.2 Data Importer (Users/Groups/Assignments)

  • Paste → auto‑detect → mapping (with remembered presets) → validation → dry‑run shows Create/Update/Conflict counts and row‑level reasons; commit applies in one transaction.
  • Duplicates and policy errors (e.g., invalid attempt_limit, bad UTC date) are flagged pre‑commit with actionable fixes.

4.3 Bulk Question Import (MCQ v1)

  • Paste of rows with stem, option_1..n, correct validates; dry‑run displays per‑row issues (missing stem, duplicate correct, too few options).
  • Commit creates draft question_versions only; no mutation of published content. Optional “assemble into evaluation draft” succeeds and returns the draft ID.

5) Test Matrix (expand)

Paste sources: Excel (Win/Mac), Google Sheets, Word/Google Docs tables.
Shapes: single‑row header; two‑row headers; row/col spans; internal blanks; wide tables; Unicode and RTL samples.
MCQ variants: correct as index vs label; options A–D vs 1–4; extra ignored columns; quotes/commas in stems/options.
Browsers: Chrome, Edge, Firefox, Safari (latest 2 versions) on Win/Mac.


6) Telemetry & Observability (detail)

  • Client events: paste_detected (format, rows, cols), auto_map_confidence, validation_failed (counts), dry_run_applied.
  • Server metrics: batch size, duration, success/fail counts, top error reasons; CSP report ingestion.
  • Dashboards: adoption (% tenants using Smart Paste), time‑to‑publish with tables, importer error funnels.

7) Go/No‑Go Checklist (engineering ready)

  • SandboxDOMExtractor handles “warzone” samples; DOMParser preferred path passes fixtures.
  • OhmMCQ parses reference sets and falls back cleanly to manual mapping when confidence < threshold.
  • Renderer emits semantic HTML; a11y lint passes; sticky headers work; mobile overflow/stacked mode verified.
  • Transactions + RLS enforced for all imports; partial failures returned with row refs; telemetry is PII‑safe.

8) Open Decisions (for ADRs)

  • Exact caps (cells/rows/chars) per plan tier; defaults for free vs paid.
  • Confidence thresholds for OhmMCQ before fallback to manual mapping.
  • Whether to allow per‑tenant shared presets for column mappings (default yes).

9) Roll‑in Plan

  • Merge this addendum into the base spec as Sections: “Feasibility & Legacy Reuse,” “Architectural Updates,” “Updated Acceptance Criteria,” and “Test Matrix.”
  • Create fixtures repo: /fixtures/paste/html/, /fixtures/paste/csv/, /fixtures/paste/mcq/ with golden outputs for CI.

10) Definition of Done (delta)

  • All new acceptance criteria in §4 pass in CI with fixtures.
  • CSP report-only trial shows zero violations in target paths; then enforced.
  • Paste telemetry appears in dashboards; no PII present.