Evalium: Database Locking & Concurrency Policy v1.0
Status: Implemented and enforced via CI and lock helpers. Any deviation is a bug.
1. Guiding Principles
This document defines the official policy for preventing database deadlocks and managing concurrency.
-
Principle #1: Avoid Locking. The best lock is no lock. Prefer optimistic concurrency control (
updated_atorlock_versionchecks) and append-only data models. Multi-table row-level locking is a tool of last resort for a few, truly complex transactions. -
Principle #2: Enforce a Canonical Order. For the rare cases where a transaction must lock rows from multiple tables, it MUST do so in the canonical order defined in this document (alphabetical within a single cluster).
-
Principle #3: Isolate Lock Logic. All multi-table locking logic MUST be encapsulated in dedicated "lock helper" functions within the
internal/db/lockspackage. Business logic must call these helpers and is forbidden from implementingFOR UPDATEqueries directly. -
Principle #4: This is a Living Document. Any pull request that introduces a new
SELECT ... FOR UPDATEquery or changes locking behavior MUST update this document to reflect the new reality.
2. Canonical Lock Order
Cross-Cluster Locking Rule
Acquiring row-level locks across different clusters in a single transaction is strictly FORBIDDEN.
If an operation needs to effect change across clusters, it must be refactored into multiple, separate transactions. Communication between these transactions must be handled explicitly through application logic (e.g., an event-driven or outbox pattern).
Lock Order Within Clusters
Within a single cluster, locks MUST be acquired in strict alphabetical order of the table names.
Cluster A: Delivery & Results
Scope: The candidate delivery and scoring lifecycle. This is the highest-contention cluster. Tables (Alphabetical Order):
assignmentsassignment_overridesassignment_schedulesdelivery_sessionsdelivery_session_eventsdelivery_session_section_statesresult_correction_batchesresult_correctionssubmission_itemssubmission_score_versionssubmissions
Cluster B1: authoring
Scope: Content authoring. Tables (Alphabetical Order):
evaluation_bucketsevaluation_glossary_overridesevaluation_sectionsevaluation_versionsevaluationsglossary_term_aliasesglossary_termspassagesquestion_versionsquestions
Cluster B2: Programmes & Groups
Scope: Programme management and user groups. Tables (Alphabetical Order):
group_membersgroupsprogram_enrolmentsprogram_progressprogramme_assignment_outboxprogramme_requirementsprogrammes
Cluster D: Core Identity & Auditing
Scope: Foundational, system-level entities with restrictive locking rules. Admin flows may only lock Cluster D tables alone, never in combination with Cluster A, B1, or B2 tables. Tables & Rules:
audit_logs- [NEVER LOCK] (Append-only).tenants- [NEVER LOCK IN APP] (No row locks in application code).users- [ADMIN FLOWS ONLY] and never with any table outside Cluster D.org_units- [ADMIN FLOWS ONLY] and never with any table outside Cluster D.roles- [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
3. Transaction Types and Cluster Mapping
This section documents high-value write operations and maps them to a single cluster.
| Operation | Service Function(s) | Cluster | Locked Tables (in order) | Cross-Cluster Status |
|---|---|---|---|---|
| CreateAssignment | assignments.Service.CreateAssignment | A | assignments, assignment_schedules | OK |
| StartDeliverySession | results.Service.CreateDeliverySession | A | assignments, submissions (via helper), delivery_sessions | OK |
| ApplyRemediationBatch | remediation.Service.ApplyCorrectionBatch | A | result_correction_batches, result_corrections, submission_score_versions, submissions (via LockRemediationBatchEntities + LockSubmissionForRemediation) | OK |
| ProgrammeEnrolment | programme_enrolment.Service.Create | B2 | program_enrolments, program_progress, programme_assignment_outbox | Refactored |
| SessionReaper | results.Service.CloseExpiredSessions | A | delivery_sessions, submissions (via LockExpiredSessionsAndSubmissions) | OK |
| UpdateProgrammeProgress | programme_progress.Listener.OnSubmissionCreatedTx | B2 | program_enrolments, program_progress | OK |
| SetUserRole | users.Service.SetUserRole | D | roles, users | OK |
| RollbackRemediation | TBD | TBD | TBD | TBD |
| BulkUserActions | TBD | TBD | TBD | TBD |
Refactor Pattern: ProgrammeEnrolment
The ProgrammeEnrolment operation is refactored into a two-phase pattern to prevent cross-cluster transactions.
Phase 1 (Synchronous, Cluster B2 transaction):
- The
programme_enrolment.Service.Createfunction creates theprogram_enrolmentsand initialprogram_progressrows. - Within the same transaction, it appends an outbox event to the
programme_assignment_outboxtable with apendingstatus. - It does not call any services that touch Cluster A tables.
Phase 2 (Asynchronous Worker, Cluster A transaction):
- A new
ProgrammeAssignmentsWorkerruns periodically. - It fetches pending events from
programme_assignment_outbox. - For each event, it starts a new Cluster A transaction via TxManager.
- Within that transaction, it calls
programme_orchestrator.CreateAssignmentsForEnrolmentto create theassignments, using existing idempotence logic. - It marks the outbox event as
completedorfailed.
Refactor Pattern: PrivacyJobsWorker
Privacy jobs that touch Cluster D (users, privacy holds, compliance ledger/outbox) and Cluster A (submission items or telemetry events) are split into separate transactions.
Phase 1 (Cluster D, precheck):
- Validate holds and ensure the subject is eligible (e.g., no active hold for
forget_user/retention actions). - If blocked, emit the compliance outbox event and mark the job
completed(no Cluster A work).
Phase 2 (Cluster A, evidence/telemetry):
- Scrub
submission_itemsforevidence_delete/forget_useror deletedelivery_session_eventsfortelemetry_delete.
Phase 3 (Cluster D, finalize):
- Write receipts/outbox events and mark the job
completed.
No single transaction locks Cluster D and Cluster A tables together.
4. Enforcement
- Centralized Lock Helpers: All
SELECT ... FOR UPDATEqueries MUST reside in theinternal/db/lockspackage. - One Helper Per Transaction: Within a database transaction, business logic may call at most one lock helper function. Composition of multiple lock helpers is forbidden.
- CI Check: The CI pipeline runs
scripts/check_for_update.sh, which fails the build ifFOR UPDATEis found outside theinternal/db/lockspackage. - SQL Semantics: All
FOR UPDATEqueries must lock rows from exactly one physical table. Joins that lock multiple tables in a single statement are forbidden. - Reporting Read Models: Reporting tables store denormalized IDs only and intentionally omit FKs to OLTP tables to avoid cross-cluster locks.
5. Concurrency Test Coverage
This policy is validated by dedicated concurrency tests for critical flows.
remediation_service_concurrency_test.go(TestApplyCorrectionBatch_Concurrent_NoDeadlock) — coversApplyCorrectionBatch.results_service_reaper_concurrency_test.go(TestCloseExpiredSessions_Concurrent_NoDeadlock) — coversCloseExpiredSessions/LockExpiredSessionsAndSubmissions.results_service_concurrency_test.go(TestCreateDeliverySession_Concurrent_NoDeadlock) — covers concurrent session creation on the same assignment.programme_assignments_worker_concurrency_test.go(TestProgrammeAssignmentsWorker_Concurrent_NoDuplicateAssignments) — covers the outbox→assignments worker under concurrent runs.
Race checks: go test -race is executed in CI on an x86_64 runner (arm64 local hosts may not have a working race runtime). At minimum, we run go test -race ./internal/services/... in CI; failures block merges.