Smoke Tests: Coverage, Proof, and Rationale

This document lists the smoke tests executed via backend/tests/test_all_smoke.sh, what each test proves, how it proves it, and why the proof matters.

How to run

All suites: backend/tests/test_all_smoke.sh
Individual suites:
- backend/tests/test_admin_all.sh
- backend/tests/test_authoring_all.sh
- backend/tests/test_engagements_all.sh
- backend/tests/test_delivery_all.sh
- backend/tests/test_remediation_all.sh
- backend/tests/test_compliance_all.sh
- backend/tests/test_reporting_all.sh
- backend/tests/test_defensibility_all.sh
- backend/tests/test_glassbox_all.sh
- backend/tests/test_hashing_all.sh
- backend/tests/test_localisation_all.sh
- backend/tests/test_observations_all.sh
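
A minimal sketch of what an aggregate runner in the style of test_all_smoke.sh could look like. It assumes each suite script exits non-zero on failure. The function name `run_smoke_suites` is hypothetical. The runner keeps going after a failing suite, so one run reports every broken area instead of stopping at the first.

```bash
#!/usr/bin/env bash
# Hypothetical aggregate runner (not the real test_all_smoke.sh).
# Runs each suite in order, continues past failures, and returns
# non-zero if any suite failed.
run_smoke_suites() {
  local dir="$1"; shift
  local failed=()
  local suite
  for suite in "$@"; do
    echo "==> ${suite}"
    if bash "${dir}/${suite}"; then
      echo "PASS ${suite}"
    else
      echo "FAIL ${suite}"
      failed+=("${suite}")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    echo "Failed suites: ${failed[*]}"
    return 1
  fi
  echo "All $# suites passed"
}

# Example: run two of the suites listed above.
# run_smoke_suites backend/tests test_admin_all.sh test_authoring_all.sh
```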

Environment assumptions:

BASE_URL points at the API (default https://localhost:8443/api/v1).
Auth uses backend/tests/auth_login_helper.sh (cookie + CSRF headers).
When DB access is needed, DATABASE_URL must be set.
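
A sketch of how a suite might apply these assumptions before making any request. The default below is the BASE_URL stated above. `require_database_url` is a hypothetical helper name. It makes a DB-backed smoke fail immediately with a clear message, rather than partway through a check.

```bash
# Use the documented default API address unless BASE_URL is already set.
: "${BASE_URL:=https://localhost:8443/api/v1}"
export BASE_URL

# Hypothetical guard for suites that read or seed the database directly.
require_database_url() {
  if [[ -z "${DATABASE_URL:-}" ]]; then
    echo "DATABASE_URL must be set for this smoke test" >&2
    return 1
  fi
}

# Example: call the guard at the top of a DB-backed smoke.
# require_database_url || exit 1
```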

Admin + Security (`test_admin_all.sh`)

users_crud.sh — Proves: user lifecycle endpoints work; How: create/update/list/delete users; Why: core admin control plane must be stable.
users_import_idempotency.sh — Proves: user import is safe for retry; How: import same users twice with same key; Why: prevents duplicate user creation on network failure.
assignments_import_idempotency.sh — Proves: assignment import is safe for retry; How: import same assignments twice with same key; Why: prevents duplicate assignments.
bulk_import_content_types.sh — Proves: system handles various content types in bulk; How: import mixed question/passage types; Why: verifies bulk ingestion pipeline.
admin_idempotency.sh — Proves: general admin mutations are idempotent; How: repeat POST/PATCH with same key; Why: retried admin requests must not create duplicate records or apply a change twice.
programmes_idempotency.sh — Proves: programme creation is idempotent; How: create programme twice with same key; Why: prevents duplicate programmes.
roles_crud.sh — Proves: roles + capability wiring functions end‑to‑end; How: create/update/list roles; Why: RBAC correctness is foundational for RLS and security.
auth_magic_links.sh — Proves: login/verify flow works with CSRF; How: request token, verify, assert session/CSRF; Why: auth path is a hard dependency for all other tests.
authz_middleware.sh — Proves: capability enforcement denies/permits as expected; How: call endpoints with limited capabilities; Why: prevents privilege escalation.
org_silos.sh — Proves: org‑scoped visibility is enforced; How: seed two orgs and compare list results; Why: tenant/org isolation is a core platform invariant.
user_rbac_schema.sh — Proves: RBAC schema integrity; How: verify capability mappings; Why: ensures security policy matches implementation.
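
The *_idempotency.sh smokes share one pattern: send the same mutation twice with the same key, then expect the same resource back. Below is a minimal sketch, assuming the API returns the existing resource id when a key is replayed. The names `assert_idempotent` and `Idempotency-Key`, and the `/users` path in the comment, are illustrative assumptions and not the platform's confirmed API.

```bash
# Hypothetical helper: "send_fn" performs one mutation using the key it is
# given and prints the id of the created (or replayed) resource.
assert_idempotent() {
  local send_fn="$1"
  local key first second
  key="smoke-$(date +%s)-${RANDOM}"   # one key, reused for both calls
  first="$("$send_fn" "$key")" || return 1
  second="$("$send_fn" "$key")" || return 1
  if [[ -z "$first" || "$first" != "$second" ]]; then
    echo "idempotency violated: '${first}' vs '${second}'" >&2
    return 1
  fi
  echo "ok: ${key} -> ${first}"
}

# Illustrative real sender (endpoint, header name and jq path are assumptions):
# create_user() {
#   curl --silent --show-error --fail \
#     -H "Idempotency-Key: $1" -H 'Content-Type: application/json' \
#     -d '{"email":"smoke@example.com"}' "${BASE_URL}/users" | jq -r '.id'
# }
# assert_idempotent create_user
```

Passing the sender as a function name keeps the comparison logic the same across users, assignments, programmes and imports. Only the request changes.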

Authoring (`test_authoring_all.sh`)

authoring_idempotency.sh — Proves: question/evaluation creation is idempotent; How: repeat creation calls; Why: retried authoring calls must not create duplicate questions or evaluations.
questions_seed.sh — Proves: bulk seeding of questions works; How: seed large batch of questions; Why: supports performance testing and demo data.
questions_import_idempotency.sh — Proves: question import is idempotent; How: import same questions twice; Why: prevents content duplication.
questions_crud.sh — Proves: question + version lifecycle and validation; How: create, version, publish, and invalid payload checks; Why: authoring is the base of all delivery and reporting.
questions_delete_usage_smoke.sh — Proves: deletion is blocked if content is in use; How: attempt to delete question attached to evaluation; Why: prevents data corruption and broken delivery.
passages_crud.sh — Proves: passage entities behave correctly; How: create/list/update/delete; Why: content building blocks must be stable.
passages_import_idempotency.sh — Proves: passage import is idempotent; How: import same passages twice; Why: content integrity.
evaluations_crud.sh — Proves: evaluation lifecycle; How: create/update/list; Why: core delivery container.
evaluations_import_idempotency.sh — Proves: evaluation import is idempotent; How: import same evaluation twice; Why: prevents duplicate evaluations.
evaluations_buckets.sh — Proves: bucket creation + preview logic + cascade delete; How: create bucket, preview, attach, delete; Why: bucket selection drives delivery content.
evaluations_sections.sh — Proves: section and item ordering is correct; How: create sections/items and assert ordering; Why: delivery structure must match author intent.
evaluations_preview.sh — Proves: evaluation preview composition; How: generate preview and validate payload; Why: authoring preview must match snapshot content.
evaluations_validate.sh — Proves: validation errors surface correctly; How: submit invalid configs and assert a 422 response; Why: validation is the primary guardrail before publish.
inventory_views_smoke.sh — Proves: inventory listing views work; How: query various inventory filters; Why: authoring UX depends on fast, correct listing.
authoring_bulk_actions.sh — Proves: bulk tag/move/delete operations; How: apply action to multiple items; Why: efficiency for power users.
dependency_usage_smoke.sh — Proves: dependency tracking is correct; How: check usage counts for questions/passages; Why: critical for safe deletion and versioning.
taxonomy_smoke.sh — Proves: taxonomy/tagging system works; How: assign and filter by tags; Why: content organisation.
content_packs.sh — Proves: content pack bundling and export; How: create and export pack; Why: content portability.
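
Negative-path checks such as evaluations_validate.sh come down to comparing an HTTP status against an expectation. Here is a small sketch. `assert_status` is a hypothetical helper, and the endpoint path and payload in the commented curl call are assumptions. Only the 422 expectation comes from the list above.

```bash
# Hypothetical helper: compare an observed HTTP status with the expected one.
assert_status() {
  local expected="$1" actual="$2" label="$3"
  if [[ "$actual" != "$expected" ]]; then
    echo "FAIL ${label}: expected HTTP ${expected}, got ${actual}" >&2
    return 1
  fi
  echo "PASS ${label}: HTTP ${actual}"
}

# In a real smoke the status would come from curl's --write-out; the path,
# payload and EVAL_ID below are illustrative assumptions.
# code="$(curl --silent --output /dev/null --write-out '%{http_code}' \
#   -H 'Content-Type: application/json' -d '{"sections":[]}' \
#   "${BASE_URL}/evaluations/${EVAL_ID}/validate")"
# assert_status 422 "$code" "invalid evaluation config"
```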

Engagements (`test_engagements_all.sh`)

engagements_crud.sh — Proves: engagement container lifecycle + timeline; How: create engagement, append event, read timeline, link assignment; Why: engagement scaffolding is the Phase‑3 foundation for client narrative and later hashing.
engagements_idempotency.sh — Proves: engagement mutations are idempotent; How: repeat engagement creation; Why: retries must not create duplicate engagements.
engagements_programme_linkage.sh — Proves: engagements can be linked to programmes; How: link engagement to programme and verify; Why: supports structured assessment programmes.
engagements_glass_box.sh — Proves: glass box is ledger‑only; How: seed submission + ledger event and fetch /glass-box; Why: client visibility must be read‑only and ledger‑derived.
engagements_ratification.sh — Proves: ratification appends ledger event; How: ratify engagement with step‑up proof and assert glass‑box ledger entry; Why: client sign‑off must be append‑only and defensible.
engagements_hashing.sh — Proves: engagement state hashing appends ledger event; How: hash engagement and assert engagement.state.hashed in glass‑box; Why: hashing is the precursor to milestone proof.
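
The ratification and hashing smokes assert that an action appends to the ledger without rewriting it. The sketch below expresses that check. It assumes the glass-box event types can be listed one per line, oldest first, both before and after the action. The helper name and that extraction step are assumptions. `engagement.state.hashed` is the event named above.

```bash
# Hypothetical append-only check: the history before the action must be an
# unchanged prefix of the history after it, followed by exactly one event of
# the expected type.
assert_appended_event() {
  local before="$1" after="$2" expected="$3"
  local want
  if [[ -z "$before" ]]; then
    want="$expected"
  else
    want="${before}"$'\n'"${expected}"
  fi
  if [[ "$after" != "$want" ]]; then
    echo "ledger was not a clean append of ${expected}" >&2
    return 1
  fi
}

# Example:
# assert_appended_event "$types_before" "$types_after" engagement.state.hashed
```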

Delivery + Submissions (`test_delivery_all.sh`)

assignments.sh — Proves: assignment creation + listing; How: create assignment and verify retrieval; Why: assignment is the top‑level delivery entry point.
assignments_org_scope.sh — Proves: assignment visibility is org‑scoped; How: list from different orgs; Why: prevents cross‑org leakage.
assignments_redeem.sh — Proves: invitation redeem works with token hashing; How: mint invite, redeem, assert session; Why: onboarding and delivery access control.
assignments_idempotency.sh — Proves: assignment creation is idempotent; How: create same assignment twice; Why: prevents duplicate delivery invitations.
delivery_session_seed.sh — Proves: session seeding for large-scale tests; How: seed multiple sessions; Why: supports scale testing.
delivery_session.sh — Proves: session lifecycle and event recording; How: create session, answer, submit; Why: delivery pipeline correctness.
delivery_session_idempotency.sh — Proves: session events are idempotent; How: submit same answer event twice; Why: prevents double-recording and score corruption.
submission_approval_idempotency.sh — Proves: submission approval is idempotent; How: approve same submission twice; Why: prevents duplicate approval events in ledger.
verification_context_required.sh — Proves: verification context must be provided when configured; How: attempt verify without context; Why: ensures required metadata is captured.
evidence_ledger_events.sh — Proves: evidence attach + approve/reject ledger trail; How: submit evidence, call review endpoints, verify ledger rows; Why: an auditable evidence trail is what makes review decisions legally defensible.
evidence_standalone.sh — Proves: standalone evidence upload works; How: upload file and verify storage; Why: supports ad-hoc evidence collection.
evidence_idempotency.sh — Proves: evidence upload is idempotent; How: upload same file with same key; Why: prevents storage bloat and duplicate entries.
evidence_inline.sh — Proves: inline evidence capture works; How: attach evidence during session; Why: seamless candidate experience.
evidence_mixed.sh — Proves: system handles mixed evidence types; How: upload various file types; Why: verifies media handling.
evidence_storage_tier_worker.sh — Proves: evidence archival works; How: trigger archival and verify path change; Why: ensures long-term storage efficiency.
submissions_org_scope.sh — Proves: submissions respect org scope; How: list submissions under different orgs; Why: reporting and access integrity.
delivery_resilience.sh — Proves: reaper/expiry and resilience behaviours; How: simulate timeouts + events; Why: delivery integrity and failure handling.
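
The org-scope smokes (org_silos.sh, assignments_org_scope.sh, submissions_org_scope.sh) compare what two orgs can see. Below is a sketch of the comparison step. It assumes the ids from each org's listing have already been extracted, one per line. `assert_no_cross_org_ids` is a hypothetical name.

```bash
# Hypothetical check: fail if any id appears in both orgs' listings.
assert_no_cross_org_ids() {
  local org_a_ids="$1" org_b_ids="$2"
  local leaked
  # comm -12 prints only lines common to both sorted inputs.
  leaked="$(comm -12 <(sort -u <<<"$org_a_ids") <(sort -u <<<"$org_b_ids"))"
  if [[ -n "$leaked" ]]; then
    echo "ids visible to both orgs:" >&2
    echo "$leaked" >&2
    return 1
  fi
}
```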

Remediation (`test_remediation_all.sh`)

results_remediation_drop_item.sh — Proves: dropping an item re‑scores correctly; How: seed submission, apply remediation, verify new score; Why: remediation is audit‑critical.
results_remediation_idempotent.sh — Proves: reapplying same remediation is safe; How: apply twice, assert no drift; Why: deterministic replay safety.
results_remediation_idempotency_key.sh — Proves: remediation respects idempotency keys; How: repeat call with key; Why: avoids duplicate score updates.
results_remediation_multi_attempt.sh — Proves: remediation applies across attempts correctly; How: multiple attempts seeded, verify scopes; Why: multi‑attempt exams are common.
results_remediation_multi_item_mark_correct.sh — Proves: multi‑item corrections update scores; How: mark several items, verify totals; Why: batch corrections must be safe.
results_remediation_passmark_seed.sh — Proves: pass/fail recompute after corrections; How: seed passmark, re‑score; Why: outcome correctness.
results_remediation_replace_key.sh — Proves: answer key replacement is applied; How: replace key, re‑score; Why: authoring updates must be defensible.
results_remediation_seed.sh — Proves: baseline remediation flow; How: create batch and apply; Why: sanity check for pipeline.
results_remediation_targeted_subset.sh — Proves: remediation can target subset; How: apply to subset and verify unaffected rows; Why: selective remediation is common.
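
Two of these smokes can be expressed as one comparison over before/after score snapshots. results_remediation_targeted_subset.sh checks that rows outside the target are unaffected, and results_remediation_idempotent.sh checks for no drift on replay. The sketch below assumes snapshots of `submission_id score` lines (for example, from a psql query) plus a list of targeted ids. The helper name is hypothetical. Passing an empty target list turns it into a pure no-drift check.

```bash
# Hypothetical drift check: report any non-targeted row whose score changed.
assert_untargeted_unchanged() {
  local before="$1" after="$2" targets="$3"
  local drift
  drift="$(awk '
    FILENAME == ARGV[1] { target[$1] = 1; next }   # targeted ids
    FILENAME == ARGV[2] { old[$1] = $2; next }     # scores before
    !($1 in target) && ($1 in old) && old[$1] != $2 {
      print $1 ": " old[$1] " -> " $2
    }
  ' <(printf '%s\n' "$targets") <(printf '%s\n' "$before") <(printf '%s\n' "$after"))"
  if [[ -n "$drift" ]]; then
    echo "untargeted rows drifted:" >&2
    echo "$drift" >&2
    return 1
  fi
}
```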

Compliance Centre (`test_compliance_all.sh`)

compliance_jobs_list.sh — Proves: job enqueue + list; How: create job and list pending; Why: basic queue visibility and control.
compliance_idempotency.sh — Proves: compliance jobs are idempotent; How: repeat job trigger; Why: avoids duplicate exports/deletions.
compliance_dsar_export.sh — Proves: DSAR export completes and includes required sections; How: wait for ledger, download zip, validate data.json sections; Why: legal compliance and data portability.
compliance_dsar_sections.sh — Proves: DSAR output is scoped + complete; How: seed orgs + submissions, verify membership + submission scope; Why: prevents cross‑user or cross‑org leakage.
compliance_privacy_pack.sh — Proves: privacy pack zip contains DSAR and manifest; How: wait for ledger, verify pack + nested DSAR; Why: packaged compliance artifacts must be defensible.
dsar_minio_e2e.sh — Proves: DSAR storage integration with MinIO; How: run export and verify file in S3; Why: verifies storage layer connectivity.
compliance_forget.sh — Proves: PII is redacted, submissions retained; How: seed submission with text, run forget, verify user + answer scrub; Why: “forget” must be irreversible but non‑destructive for analytics.
compliance_restrict.sh — Proves: legal hold blocks destructive actions; How: place hold, verify forget blocked, lift and forget succeeds; Why: regulatory holds must be enforceable.
compliance_unlink.sh — Proves: org unlink removes role bindings and logs evidence; How: seed role, unlink, verify ledger + DB; Why: org‑scoped separation without user deletion.
compliance_ledger_list.sh — Proves: ledger listing returns events with sanitized metadata; How: insert ledger row and list; Why: audit evidence must be retrievable and safe.
compliance_retention_incidents.sh — Proves: retention incidents surface correctly; How: insert incident ledger row and list; Why: retention failures must be visible.
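
compliance_forget.sh seeds identifiable text and checks that it is scrubbed. Below is a sketch of the final assertion. It assumes the test seeds unique marker strings and can re-read the affected records as text (an API response or a DB dump). `assert_markers_absent` is a hypothetical helper.

```bash
# Hypothetical scrub check: fail if any seeded marker survives "forget".
assert_markers_absent() {
  local data="$1"; shift
  local marker found=0
  for marker in "$@"; do
    if grep -qF -- "$marker" <<<"$data"; then
      echo "PII marker still present: ${marker}" >&2
      found=1
    fi
  done
  return "$found"
}

# Example: seed with a unique marker so matches cannot be coincidental.
# marker="smoke-pii-$(date +%s)-${RANDOM}"
```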

Reporting (`test_reporting_all.sh`)

reporting_idempotency.sh — Proves: reporting generation is idempotent; How: repeat export request; Why: prevents duplicate files.
reporting_submissions.sh — Proves: submissions reporting output shape and scope; How: seed submissions and query; Why: reporting base for multiple surfaces.
reporting_question_health.sh — Proves: question health metrics are correct; How: seed mixed attempts and verify health scores; Why: ensures data quality feedback.
reporting_question_health_export.sh — Proves: health report export works; How: trigger export and download; Why: offline analysis.
reporting_question_health_all.sh — Proves: all question health metrics; How: run comprehensive suite; Why: aggregate health visibility.
reporting_question_health_mcq.sh — Proves: MCQ health core + analysis panel; How: seed attempts with correct/incorrect mix; Why: health signal integrity.
reporting_question_health_mcq_export.sh — Proves: MCQ export bundle output; How: export and validate CSV preview; Why: customer‑facing exports.
reporting_question_health_mrq.sh — Proves: MRQ health core + choiceMulti analysis; How: seed MRQ attempts with partial/invalid/exempt; Why: first‑class non‑MCQ support.
reporting_question_health_mrq_export.sh — Proves: MRQ export bundle output; How: export and validate file set; Why: BI‑friendly multi‑file exports.
reporting_projection_range_mcq.sh — Proves: range projection logic for MCQ; How: seed range and verify window stats; Why: time‑window correctness.
reporting_projection_range_mrq.sh — Proves: range projection logic for MRQ; How: seed range and verify scored‑only metrics; Why: consistent denominators for new types.
reporting_evaluation_summary.sh — Proves: evaluation summary core blocks; How: seed completions + scores; Why: primary admin dashboard accuracy.
reporting_evaluation_summary_all_versions.sh — Proves: all‑versions rollup; How: seed multiple versions and aggregate; Why: cross‑version reporting.
reporting_evaluation_summary_groups.sh — Proves: group breakdowns; How: seed group assignments and compare; Why: cohort analytics baseline.
reporting_evaluation_summary_groups_export.sh — Proves: group export output; How: export and validate CSV; Why: downstream analytics.
reporting_evaluation_summary_cohorts.sh — Proves: cohort breakdown logic; How: seed cohort labels and compare; Why: segmentation reporting.
reporting_evaluation_summary_cohorts_export.sh — Proves: cohort export output; How: export and validate CSV; Why: analytics portability.
reporting_evaluation_summary_export.sh — Proves: summary export bundle; How: export and validate; Why: compliance and ops reporting.
reporting_evaluation_summary_scores.sh — Proves: score denominators + provisional logic; How: seed graded + ungraded; Why: truthfulness of KPIs.
reporting_evaluation_summary_time.sh — Proves: timing statistics + source; How: seed sessions with time; Why: timing UX and integrity.
reporting_evaluation_summary_filters.sh — Proves: runLabel/from/to filters; How: seed runs and filter; Why: scoped reporting correctness.
reporting_session_attempt.sh — Proves: session attempt report shape + aggregates; How: seed session, verify summary/integrity/sections/items; Why: defendable attempt reports.
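
Several summary smokes depend on denominators, including reporting_evaluation_summary_scores.sh and the scored-only metrics in reporting_projection_range_mrq.sh. The sketch below computes expected values from seeded rows so they can be compared with what the endpoint returns. The `NA` marker for ungraded rows and the output format are assumptions. The platform's exact provisional rules are not described here; this only illustrates keeping ungraded rows out of the denominator.

```bash
# Hypothetical expected-value helper: input is "submission_id score" lines,
# with "NA" marking an ungraded submission. Ungraded rows are counted but
# kept out of the mean's denominator.
expected_score_stats() {
  awk '
    $2 == "NA" { ungraded++; next }
    NF >= 2    { graded++; total += $2 }
    END {
      mean = graded ? total / graded : 0
      printf "graded=%d ungraded=%d mean=%.2f\n", graded, ungraded, mean
    }
  ' <<<"$1"
}
```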

Defensibility + Audit Primitives (`test_defensibility_all.sh`)

defensibility_authority_provenance.sh — Proves: system tracks authority of changes; How: create content and verify ledger metadata; Why: establishes chain of custody.
visibility_*_viewed.sh (evidence, subjects, assignment, programmes, engagements) — Proves: viewing sensitive data appends to ledger; How: access entity and verify "viewed" event; Why: regulatory compliance and anti-leakage audit.
defensibility_retention_defaults.sh — Proves: default retention policies are applied; How: check retention date on new entities; Why: ensures data lifecycle compliance.
defensibility_enforcement_modes.sh — Proves: system respects enforcement strictness; How: toggle enforcement and verify blocked actions; Why: allows flexible security postures.
evidence_integrity_check.sh — Proves: evidence file hashes match ledger; How: re-hash file and compare; Why: prevents tampering with evidence.
submission_hashing.sh — Proves: submissions are hashed on completion; How: submit and check state.hashed; Why: ensures submission integrity.
submission_ratification.sh — Proves: submission ratification works; How: ratify and check ledger; Why: finalises result for legal purposes.
submission_void.sh — Proves: submissions can be voided; How: void submission and check status; Why: handling of invalid/cheated attempts.
evidence_void.sh — Proves: evidence can be voided; How: void evidence and check ledger; Why: removal of incorrect evidence while keeping audit trail.
claims_* & disputes_* — Proves: claims/disputes workflow; How: create claim, escalate to dispute, ratify; Why: platform defensibility in contested results.
evaluation_publish_stepup_context.sh — Proves: publishing evaluations requires step-up; How: attempt to publish without MFA; Why: prevents unauthorised content changes.
proof_readiness_dual_time_smoke.sh — Proves: dual-time (system vs ledger) consistency; How: check event timestamps; Why: prevents time-tampering.
defensibility_exceptions_smoke.sh — Proves: exception logging works; How: trigger failure and check defensibility log; Why: audit trail for system errors.
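
evidence_integrity_check.sh re-hashes a stored file and compares the result with the ledger. Below is a sketch. It assumes the ledger stores a lowercase hex SHA-256 digest (both the algorithm and the encoding are assumptions) and that GNU coreutils `sha256sum` is available. The helper names are hypothetical.

```bash
# Hypothetical helpers for a re-hash check against a ledger digest.
file_sha256() {
  sha256sum -- "$1" | cut -d' ' -f1
}

assert_file_matches_ledger() {
  local path="$1" ledger_hash="$2"
  local actual
  # A missing file yields an empty digest, which then fails the comparison.
  actual="$(file_sha256 "$path")"
  if [[ "$actual" != "$ledger_hash" ]]; then
    echo "hash mismatch for ${path}: ledger=${ledger_hash} actual=${actual}" >&2
    return 1
  fi
}
```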

Glass Box (`test_glassbox_all.sh`)

glassbox_*.sh (engagement, assignment, submission, programme) — Proves: entity history is fully ledger-backed; How: query /glassbox for each entity; Why: provides "single source of truth" for client audits.

Observations (`test_observations_all.sh`)

subjects_crud.sh — Proves: observation subject lifecycle; How: create/list/delete subjects; Why: base for observational assessments.
observation_subjects_import.sh — Proves: bulk import of subjects; How: upload CSV; Why: efficiency for large cohorts.
observation_subjects_filtering.sh — Proves: subject filtering and search; How: query subjects by various attributes; Why: UX performance.
submission_subjects.sh — Proves: linking submissions to subjects; How: verify subject-submission relationship; Why: core for subject-centric reporting.
observation_assignment_subject_auto.sh — Proves: auto-assignment of subjects; How: verify subjects assigned based on rules; Why: reduces manual overhead.
observation_subject_visibility_assignment.sh — Proves: subjects only visible to assigned observers; How: check visibility from different accounts; Why: privacy and RLS integrity.
observation_findings_* — Proves: findings validation and requirements; How: attempt to save finding with missing comment/evidence; Why: ensures data quality in observations.
observation_verification_context_required.sh — Proves: observations require context; How: check verification rules; Why: auditability.
observation_submission_* (approval, four-eyes, auto-approve) — Proves: observation submission workflows; How: verify multi-stage approval; Why: quality control in high-stakes assessments.
observation_batch_assignments.sh — Proves: batch processing of assignments; How: assign multiple subjects at once; Why: operational efficiency.
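
observation_findings_* exercise required-field validation by trying to save incomplete findings. Below is a table-driven sketch of that idea, in which every payload in the list is expected to be rejected. The sender function, the payload shapes, and treating any 4xx as a rejection are all assumptions, since the exact status code is not stated here.

```bash
# Hypothetical table-driven negative test: "send_fn" posts one payload and
# prints the HTTP status; every payload must be rejected with a 4xx.
run_rejection_table() {
  local send_fn="$1"; shift
  local payload code failures=0
  for payload in "$@"; do
    code="$("$send_fn" "$payload")"
    if [[ "$code" =~ ^4[0-9][0-9]$ ]]; then
      echo "rejected (${code}): ${payload}"
    else
      echo "ACCEPTED (${code}) but should fail: ${payload}" >&2
      failures=$((failures + 1))
    fi
  done
  (( failures == 0 ))
}
```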

Localisation (`test_localisation_all.sh`)

localisation_projection.sh — Proves: content is correctly projected into requested locales; How: create content in multiple languages and fetch with Accept-Language; Why: ensures internationalisation works at the API level.
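
Below is a sketch of the per-locale comparison. It assumes a fetch function that returns the projected text for a given locale; in a real smoke, that would be curl sending an Accept-Language header. The helper name, the `locale=expected` argument format, and the endpoint and jq path in the comment are all hypothetical.

```bash
# Hypothetical check: each "locale=expected" pair must round-trip.
assert_locale_projection() {
  local fetch_fn="$1"; shift
  local pair locale expected actual rc=0
  for pair in "$@"; do
    locale="${pair%%=*}"
    expected="${pair#*=}"
    actual="$("$fetch_fn" "$locale")"
    if [[ "$actual" != "$expected" ]]; then
      echo "locale ${locale}: expected '${expected}', got '${actual}'" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Illustrative fetch (endpoint, QUESTION_ID and jq path are assumptions):
# fetch_title() {
#   curl --silent --show-error -H "Accept-Language: $1" \
#     "${BASE_URL}/questions/${QUESTION_ID}" | jq -r '.title'
# }
# assert_locale_projection fetch_title en=Hello fr=Bonjour
```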

Why this matters

These smokes are the “fast truth” layer: they prove the minimum viable correctness for security, authoring, delivery, remediation, reporting, and compliance without requiring full integration environments. They also encode the platform invariants (RLS scope, deterministic snapshots, auditability, and privacy‑safe evidence), so regressions show up quickly and in a way that is easy to diagnose.
