Evalium: Database Locking & Concurrency Policy v1.0

Status: Implemented and enforced via CI and lock helpers. Any deviation is a bug.

1. Guiding Principles

This document defines the official policy for preventing database deadlocks and managing concurrency.

  1. Principle #1: Avoid Locking. The best lock is no lock. Prefer optimistic concurrency control (updated_at or lock_version checks) and append-only data models. Multi-table row-level locking is a tool of last resort for a few, truly complex transactions.

  2. Principle #2: Enforce a Canonical Order. For the rare cases where a transaction must lock rows from multiple tables, it MUST do so in the canonical order defined in this document (alphabetical within a single cluster).

  3. Principle #3: Isolate Lock Logic. All multi-table locking logic MUST be encapsulated in dedicated "lock helper" functions within the internal/db/locks package. Business logic must call these helpers and is forbidden from implementing FOR UPDATE queries directly.

  4. Principle #4: This is a Living Document. Any pull request that introduces a new SELECT ... FOR UPDATE query or changes locking behavior MUST update this document to reflect the new reality.

2. Canonical Lock Order

Cross-Cluster Locking Rule

Acquiring row-level locks across different clusters in a single transaction is strictly FORBIDDEN.

If an operation needs to effect change across clusters, it must be refactored into multiple, separate transactions. Communication between these transactions must be handled explicitly through application logic (e.g., an event-driven or outbox pattern).

Lock Order Within Clusters

Within a single cluster, locks MUST be acquired in strict alphabetical order of the table names (plain byte/ASCII order, in which the underscore sorts before letters, so e.g. assignment_schedules precedes assignments).


Cluster A: Delivery & Results

Scope: The candidate delivery and scoring lifecycle. This is the highest-contention cluster. Tables (Alphabetical Order):

  1. assignment_overrides
  2. assignment_schedules
  3. assignments
  4. delivery_session_events
  5. delivery_session_section_states
  6. delivery_sessions
  7. result_correction_batches
  8. result_corrections
  9. submission_items
  10. submission_score_versions
  11. submissions

Cluster B1: Authoring

Scope: Content authoring. Tables (Alphabetical Order):

  1. evaluation_buckets
  2. evaluation_glossary_overrides
  3. evaluation_sections
  4. evaluation_versions
  5. evaluations
  6. glossary_term_aliases
  7. glossary_terms
  8. passages
  9. question_versions
  10. questions

Cluster B2: Programmes & Groups

Scope: Programme management and user groups. Tables (Alphabetical Order):

  1. group_members
  2. groups
  3. program_enrolments
  4. program_progress
  5. programme_assignment_outbox
  6. programme_requirements
  7. programmes

Cluster D: Core Identity & Auditing

Scope: Foundational, system-level entities with restrictive locking rules. Admin flows may only lock Cluster D tables alone, never in combination with Cluster A, B1, or B2 tables. Tables & Rules:

  1. audit_logs - [NEVER LOCK] (Append-only).
  2. org_units - [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
  3. roles - [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
  4. tenants - [NEVER LOCK IN APP] (No row locks in application code).
  5. users - [ADMIN FLOWS ONLY] and never with any table outside Cluster D.
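The cluster and ordering rules can be checked mechanically. The sketch below is a hypothetical guard (not an existing helper; the map is abridged to Clusters A and D, and never-lock tables are deliberately absent) that a lock helper's unit test could run over its table list:

```go
package main

import (
	"fmt"
	"sort"
)

// cluster maps each lockable table to its cluster; audit_logs and tenants
// are omitted on purpose because they must never be row-locked in app code.
// (Abridged: Clusters B1 and B2 would be listed the same way.)
var cluster = map[string]string{
	// Cluster A: Delivery & Results
	"assignment_overrides": "A", "assignment_schedules": "A", "assignments": "A",
	"delivery_session_events": "A", "delivery_session_section_states": "A",
	"delivery_sessions": "A", "result_correction_batches": "A",
	"result_corrections": "A", "submission_items": "A",
	"submission_score_versions": "A", "submissions": "A",
	// Cluster D: admin-flow-only identity tables
	"org_units": "D", "roles": "D", "users": "D",
}

// ValidateLockSequence rejects sequences that lock an unknown or never-lock
// table, span more than one cluster, or break alphabetical order.
func ValidateLockSequence(tables []string) error {
	if len(tables) == 0 {
		return nil
	}
	first, ok := cluster[tables[0]]
	if !ok {
		return fmt.Errorf("table %q may not be row-locked", tables[0])
	}
	for _, t := range tables[1:] {
		c, ok := cluster[t]
		if !ok {
			return fmt.Errorf("table %q may not be row-locked", t)
		}
		if c != first {
			return fmt.Errorf("cross-cluster lock: %s vs %s", first, c)
		}
	}
	if !sort.StringsAreSorted(tables) {
		return fmt.Errorf("lock order %v is not alphabetical", tables)
	}
	return nil
}

func main() {
	fmt.Println(ValidateLockSequence([]string{"delivery_sessions", "submissions"})) // <nil>
	fmt.Println(ValidateLockSequence([]string{"submissions", "users"}))             // cross-cluster lock: A vs D
}
```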

3. Transaction Types and Cluster Mapping

This section documents high-value write operations and maps them to a single cluster.

Each entry lists: Operation: service function(s); cluster; locked tables (in canonical order); cross-cluster status.

  • CreateAssignment: assignments.Service.CreateAssignment; Cluster A; locks assignment_schedules, assignments; OK.
  • StartDeliverySession: results.Service.CreateDeliverySession; Cluster A; locks assignments, delivery_sessions, submissions (via helper); OK.
  • ApplyRemediationBatch: remediation.Service.ApplyCorrectionBatch; Cluster A; locks result_correction_batches, result_corrections, submission_score_versions, submissions (via LockRemediationBatchEntities + LockSubmissionForRemediation); OK.
  • ProgrammeEnrolment: programme_enrolment.Service.Create; Cluster B2; locks program_enrolments, program_progress, programme_assignment_outbox; Refactored.
  • SessionReaper: results.Service.CloseExpiredSessions; Cluster A; locks delivery_sessions, submissions (via LockExpiredSessionsAndSubmissions); OK.
  • UpdateProgrammeProgress: programme_progress.Listener.OnSubmissionCreatedTx; Cluster B2; locks program_enrolments, program_progress; OK.
  • SetUserRole: users.Service.SetUserRole; Cluster D; locks roles, users; OK.
  • RollbackRemediation: TBD.
  • BulkUserActions: TBD.

Refactor Pattern: ProgrammeEnrolment

The ProgrammeEnrolment operation is refactored into a two-phase pattern to prevent cross-cluster transactions.

Phase 1 (Synchronous, Cluster B2 transaction):

  • The programme_enrolment.Service.Create function creates the program_enrolments and initial program_progress rows.
  • Within the same transaction, it appends an outbox event to the programme_assignment_outbox table with a pending status.
  • It does not call any services that touch Cluster A tables.

Phase 2 (Asynchronous Worker, Cluster A transaction):

  • A new ProgrammeAssignmentsWorker runs periodically.
  • It fetches pending events from programme_assignment_outbox.
  • For each event, it starts a new Cluster A transaction via TxManager.
  • Within that transaction, it calls programme_orchestrator.CreateAssignmentsForEnrolment to create the assignments, using existing idempotence logic.
  • It marks the outbox event as completed or failed.

Refactor Pattern: PrivacyJobsWorker

Privacy jobs that touch Cluster D (users, privacy holds, compliance ledger/outbox) and Cluster A (submission items or telemetry events) are split into separate transactions.

Phase 1 (Cluster D, precheck):

  • Validate holds and ensure the subject is eligible (e.g., no active hold for forget_user/retention actions).
  • If blocked, emit the compliance outbox event and mark the job completed (no Cluster A work).

Phase 2 (Cluster A, evidence/telemetry):

  • Scrub submission_items for evidence_delete/forget_user or delete delivery_session_events for telemetry_delete.

Phase 3 (Cluster D, finalize):

  • Write receipts/outbox events and mark the job completed.

No single transaction locks Cluster D and Cluster A tables together.

4. Enforcement

  1. Centralized Lock Helpers: All SELECT ... FOR UPDATE queries MUST reside in the internal/db/locks package.
  2. One Helper Per Transaction: Within a database transaction, business logic may call at most one lock helper function. Composition of multiple lock helpers is forbidden.
  3. CI Check: The CI pipeline runs scripts/check_for_update.sh, which fails the build if FOR UPDATE is found outside the internal/db/locks package.
  4. SQL Semantics: All FOR UPDATE queries must lock rows from exactly one physical table. Joins that lock multiple tables in a single statement are forbidden.
  5. Reporting Read Models: Reporting tables store denormalized IDs only and intentionally omit FKs to OLTP tables to avoid cross-cluster locks.
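Enforcement rule #3 boils down to "no FOR UPDATE outside internal/db/locks". The real check is a shell script; the Go sketch below mirrors what it presumably greps for, applied to an in-memory file set rather than the repository:

```go
package main

import (
	"fmt"
	"strings"
)

// findForUpdateViolations returns the paths of files that contain
// FOR UPDATE but do not live under the allowed locks package directory.
func findForUpdateViolations(files map[string]string, allowedDir string) []string {
	var bad []string
	for path, src := range files {
		if strings.HasPrefix(path, allowedDir) {
			continue // the locks package is the one sanctioned home
		}
		if strings.Contains(strings.ToUpper(src), "FOR UPDATE") {
			bad = append(bad, path)
		}
	}
	return bad
}

func main() {
	files := map[string]string{
		"internal/db/locks/remediation.go": "SELECT id FROM submissions WHERE id = $1 FOR UPDATE",
		"internal/services/results.go":     "select * from delivery_sessions for update", // violation
	}
	fmt.Println(findForUpdateViolations(files, "internal/db/locks/")) // [internal/services/results.go]
}
```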

5. Concurrency Test Coverage

This policy is validated by dedicated concurrency tests for critical flows.

  • remediation_service_concurrency_test.go (TestApplyCorrectionBatch_Concurrent_NoDeadlock) — covers ApplyCorrectionBatch.
  • results_service_reaper_concurrency_test.go (TestCloseExpiredSessions_Concurrent_NoDeadlock) — covers CloseExpiredSessions/LockExpiredSessionsAndSubmissions.
  • results_service_concurrency_test.go (TestCreateDeliverySession_Concurrent_NoDeadlock) — covers concurrent session creation on the same assignment.
  • programme_assignments_worker_concurrency_test.go (TestProgrammeAssignmentsWorker_Concurrent_NoDuplicateAssignments) — covers the outbox→assignments worker under concurrent runs.

Race checks: go test -race is executed in CI on an x86_64 runner (arm64 local hosts may not have a working race runtime). At minimum, we run go test -race ./internal/services/... in CI; failures block merges.