Evalium: Database Locking & Concurrency Policy v1.0

Status: Implemented and enforced via CI and lock helpers. Any deviation is a bug.

1. Guiding Principles

This document defines the official policy for preventing database deadlocks and managing concurrency.

  1. Principle #1: Avoid Locking. The best lock is no lock. Prefer optimistic concurrency control (updated_at or lock_version checks) and append-only data models. Multi-table row-level locking is a tool of last resort for a few, truly complex transactions.

  2. Principle #2: Enforce a Canonical Order. For the rare cases where a transaction must lock rows from multiple tables, it MUST do so in the canonical order defined in this document (alphabetical within a single cluster).

  3. Principle #3: Isolate Lock Logic. All multi-table locking logic MUST be encapsulated in dedicated "lock helper" functions within the internal/db/locks package. Business logic must call these helpers and is forbidden from implementing FOR UPDATE queries directly.

  4. Principle #4: This is a Living Document. Any pull request that introduces a new SELECT ... FOR UPDATE query or changes locking behavior MUST update this document to reflect the new reality.

2. Canonical Lock Order

Cross-Cluster Locking Rule

Acquiring row-level locks across different clusters in a single transaction is strictly FORBIDDEN.

If an operation needs to effect change across clusters, it must be refactored into multiple, separate transactions. Communication between these transactions must be handled explicitly through application logic (e.g., an event-driven or outbox pattern).

Lock Order Within Clusters

Within a single cluster, locks MUST be acquired in strict alphabetical order of the table names (plain byte/ASCII order, in which the underscore sorts before letters, so e.g. assignment_schedules precedes assignments).


Cluster A: Delivery & Results

Scope: The candidate delivery and scoring lifecycle. This is the highest-contention cluster. Tables (Alphabetical Order):

  1. assignment_overrides
  2. assignment_schedules
  3. assignments
  4. delivery_session_events
  5. delivery_session_section_states
  6. delivery_sessions
  7. result_correction_batches
  8. result_corrections
  9. submission_items
  10. submission_score_versions
  11. submissions

Cluster B1: Authoring

Scope: Content authoring. Tables (Alphabetical Order):

  1. evaluation_buckets
  2. evaluation_glossary_overrides
  3. evaluation_sections
  4. evaluation_versions
  5. evaluations
  6. glossary_term_aliases
  7. glossary_terms
  8. passages
  9. question_versions
  10. questions

Cluster B2: Programmes & Groups

Scope: Programme management and user groups. Tables (Alphabetical Order):

  1. group_members
  2. groups
  3. program_enrolments
  4. program_progress
  5. programme_assignment_outbox
  6. programme_requirements
  7. programmes

Cluster D: Core Identity & Auditing

Scope: Foundational, system-level entities with restrictive locking rules. Admin flows may only lock Cluster D tables alone, never in combination with Cluster A, B1, or B2 tables. Tables & Rules:

  1. audit_logs - [NEVER LOCK] (Append-only).
  2. org_units - [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
  3. roles - [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
  4. tenants - [NEVER LOCK IN APP] (No row locks in application code).
  5. users - [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
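The cluster and ordering rules can be checked mechanically. The sketch below is a hypothetical guard (not an existing helper; the map is abridged to Clusters A and D, and never-lock tables are deliberately absent) that a lock helper's unit test could run over its table list:

```go
package main

import (
	"fmt"
	"sort"
)

// cluster maps each lockable table to its cluster; audit_logs and tenants
// are omitted on purpose because they must never be row-locked in app code.
// (Abridged: Clusters B1 and B2 would be listed the same way.)
var cluster = map[string]string{
	// Cluster A: Delivery & Results
	"assignment_overrides": "A", "assignment_schedules": "A", "assignments": "A",
	"delivery_session_events": "A", "delivery_session_section_states": "A",
	"delivery_sessions": "A", "result_correction_batches": "A",
	"result_corrections": "A", "submission_items": "A",
	"submission_score_versions": "A", "submissions": "A",
	// Cluster D: admin-flow-only identity tables
	"org_units": "D", "roles": "D", "users": "D",
}

// ValidateLockSequence rejects sequences that lock an unknown or never-lock
// table, span more than one cluster, or break alphabetical order.
func ValidateLockSequence(tables []string) error {
	if len(tables) == 0 {
		return nil
	}
	first, ok := cluster[tables[0]]
	if !ok {
		return fmt.Errorf("table %q may not be row-locked", tables[0])
	}
	for _, t := range tables[1:] {
		c, ok := cluster[t]
		if !ok {
			return fmt.Errorf("table %q may not be row-locked", t)
		}
		if c != first {
			return fmt.Errorf("cross-cluster lock: %s vs %s", first, c)
		}
	}
	if !sort.StringsAreSorted(tables) {
		return fmt.Errorf("lock order %v is not alphabetical", tables)
	}
	return nil
}

func main() {
	fmt.Println(ValidateLockSequence([]string{"delivery_sessions", "submissions"})) // <nil>
	fmt.Println(ValidateLockSequence([]string{"submissions", "users"}))             // cross-cluster lock: A vs D
}
```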

3. Transaction Types and Cluster Mapping

This section documents high-value write operations and maps them to a single cluster.

Each entry lists: Operation: service function(s); cluster; locked tables (in canonical order); cross-cluster status.

  • CreateAssignment: assignments.Service.CreateAssignment; Cluster A; locks assignment_schedules, assignments; OK.
  • StartDeliverySession: results.Service.CreateDeliverySession; Cluster A; locks assignments, delivery_sessions, submissions (via helper); OK.
  • ApplyRemediationBatch: remediation.Service.ApplyCorrectionBatch; Cluster A; locks result_correction_batches, result_corrections, submission_score_versions, submissions (via LockRemediationBatchEntities + LockSubmissionForRemediation); OK.
  • ProgrammeEnrolment: programme_enrolment.Service.Create; Cluster B2; locks program_enrolments, program_progress, programme_assignment_outbox; Refactored.
  • SessionReaper: results.Service.CloseExpiredSessions; Cluster A; locks delivery_sessions, submissions (via LockExpiredSessionsAndSubmissions); OK.
  • UpdateProgrammeProgress: programme_progress.Listener.OnSubmissionCreatedTx; Cluster B2; locks program_enrolments, program_progress; OK.
  • SetUserRole: users.Service.SetUserRole; Cluster D; locks roles, users; OK.
  • RollbackRemediation: TBD.
  • BulkUserActions: TBD.

Refactor Pattern: ProgrammeEnrolment

The ProgrammeEnrolment operation is refactored into a two-phase pattern to prevent cross-cluster transactions.

Phase 1 (Synchronous, Cluster B2 transaction):

  • The programme_enrolment.Service.Create function creates the program_enrolments and initial program_progress rows.
  • Within the same transaction, it appends an outbox event to the programme_assignment_outbox table with a pending status.
  • It does not call any services that touch Cluster A tables.

Phase 2 (Asynchronous Worker, Cluster A transaction):

  • A new ProgrammeAssignmentsWorker runs periodically.
  • It fetches pending events from programme_assignment_outbox.
  • For each event, it starts a new Cluster A transaction via TxManager.
  • Within that transaction, it calls programme_orchestrator.CreateAssignmentsForEnrolment to create the assignments, using existing idempotence logic.
  • It marks the outbox event as completed or failed.

Refactor Pattern: PrivacyJobsWorker

Privacy jobs that touch Cluster D (users, privacy holds, compliance ledger/outbox) and Cluster A (submission items or telemetry events) are split into separate transactions.

Phase 1 (Cluster D, precheck):

  • Validate holds and ensure the subject is eligible (e.g., no active hold for forget_user/retention actions).
  • If blocked, emit the compliance outbox event and mark the job completed (no Cluster A work).

Phase 2 (Cluster A, evidence/telemetry):

  • Scrub submission_items for evidence_delete/forget_user or delete delivery_session_events for telemetry_delete.

Phase 3 (Cluster D, finalize):

  • Write receipts/outbox events and mark the job completed.

No single transaction locks Cluster D and Cluster A tables together.

4. Enforcement

  1. Centralized Lock Helpers: All SELECT ... FOR UPDATE queries MUST reside in the internal/db/locks package.
  2. One Helper Per Transaction: Within a database transaction, business logic may call at most one lock helper function. Composition of multiple lock helpers is forbidden.
  3. CI Check: The CI pipeline runs scripts/check_for_update.sh, which fails the build if FOR UPDATE is found outside the internal/db/locks package.
  4. SQL Semantics: All FOR UPDATE queries must lock rows from exactly one physical table. Joins that lock multiple tables in a single statement are forbidden.
  5. Reporting Read Models: Reporting tables store denormalized IDs only and intentionally omit FKs to OLTP tables to avoid cross-cluster locks.
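Enforcement rule #3 boils down to "no FOR UPDATE outside internal/db/locks". The real check is a shell script; the Go sketch below mirrors what it presumably greps for, applied to an in-memory file set rather than the repository:

```go
package main

import (
	"fmt"
	"strings"
)

// findForUpdateViolations returns the paths of files that contain
// FOR UPDATE but do not live under the allowed locks package directory.
func findForUpdateViolations(files map[string]string, allowedDir string) []string {
	var bad []string
	for path, src := range files {
		if strings.HasPrefix(path, allowedDir) {
			continue // the locks package is the one sanctioned home
		}
		if strings.Contains(strings.ToUpper(src), "FOR UPDATE") {
			bad = append(bad, path)
		}
	}
	return bad
}

func main() {
	files := map[string]string{
		"internal/db/locks/remediation.go": "SELECT id FROM submissions WHERE id = $1 FOR UPDATE",
		"internal/services/results.go":     "select * from delivery_sessions for update", // violation
	}
	fmt.Println(findForUpdateViolations(files, "internal/db/locks/")) // [internal/services/results.go]
}
```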

5. Concurrency Test Coverage

This policy is validated by dedicated concurrency tests for critical flows.

  • remediation_service_concurrency_test.go (TestApplyCorrectionBatch_Concurrent_NoDeadlock) — covers ApplyCorrectionBatch.
  • results_service_reaper_concurrency_test.go (TestCloseExpiredSessions_Concurrent_NoDeadlock) — covers CloseExpiredSessions/LockExpiredSessionsAndSubmissions.
  • results_service_concurrency_test.go (TestCreateDeliverySession_Concurrent_NoDeadlock) — covers concurrent session creation on the same assignment.
  • programme_assignments_worker_concurrency_test.go (TestProgrammeAssignmentsWorker_Concurrent_NoDuplicateAssignments) — covers the outbox→assignments worker under concurrent runs.

Race checks: go test -race is executed in CI on an x86_64 runner (arm64 local hosts may not have a working race runtime). At minimum, we run go test -race ./internal/services/... in CI; failures block merges.