
ADR-0011: Compliance Centre Hybrid Scrub + Compliance Ledger

Status

Accepted (Implemented)

Context

Evalium is designed for high-stakes assessment use cases where outcomes and evidence may be relied upon for safety, competence, employment decisions, and auditability.

Evalium’s architecture enforces:

  • Multi-tenant + multi-org silo isolation via PostgreSQL RLS and TxManager scope injection.
  • Immutable versioning for authoring and immutable submissions with frozen version_snapshot for reporting integrity.
  • Append-only / compensating ledger patterns for “high-stakes edits” (e.g., remediation apply/revert) instead of mutating historic records.
  • Capability-based authorization (no hard-coded role checks).

This creates a direct conflict with a naïve interpretation of “Right to Erasure” (“Delete the user and all their data”):

  • In high-stakes assessment, deleting submissions can destroy an organisation’s proof of competence/incompetence and can break long-term reporting integrity.
  • Evalium’s KOE (Knowledge, Observation, Evidence) model introduces a “toxic data” risk:
    • Knowledge (structured answers/scores) can often be retained once identity is removed.
    • Observation/Evidence may contain unstructured PII (faces, names, voices) that cannot be safely anonymised at scale.

We need a Compliance Centre that supports DSAR export and GDPR-aligned “forgetting” while preserving immutable integrity guarantees and preventing cross-org administrative harm.

Decision

1) “Forget User” uses a Hybrid Scrub strategy, not a hard-delete of the user entity

Evalium MUST NOT provide a general “hard delete user” operation that removes the user entity and all dependent records.

Instead, the Compliance Centre MUST implement Forget User (Hybrid Scrub), which is data-type aware:

  1. Identity data (Users table + identifiers): REDACT

    • Overwrite direct identifiers (name, email, external identifiers) with redacted placeholders (e.g., redacted_[uuid]).
    • Disable/rotate identifiers such that the account cannot be used for login again unless explicitly re-provisioned.
  2. Structured assessment data (submissions, answers, scores, outcomes, timestamps, structured logs): PSEUDONYMIZE & RETAIN

    • Retain submission rows and scoring/outcome history for auditability and stable analytics.
    • Retain linkage to the same internal user_id whose identity fields are now redacted.
    • Do not rewrite immutable snapshots; do not mutate historic scoring records beyond existing ledger mechanics.
  3. Unstructured data (KOE evidence blobs, free-text observations, notes): HARD DELETE / REDACT

    • Evidence blobs (e.g., S3 media) MUST be physically deleted.
    • Free-text fields that may contain PII MUST be overwritten with [Redacted] (or removed if schema permits).
    • Any derived artefacts that embed unstructured PII MUST be deleted or regenerated without PII.

This approach preserves Evalium’s immutable/audit-proof integrity guarantees while meeting privacy goals for the highest-risk PII classes.
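The data-type-aware behaviour above can be sketched as a per-field policy lookup. This is a minimal illustration, not Evalium's implementation: the field names, the `Action` enum, and `scrub_value` are all hypothetical, and a real system would derive the classification from its own schema catalogue.

```python
import uuid
from enum import Enum


class Action(Enum):
    REDACT = "redact"   # overwrite with a placeholder
    RETAIN = "retain"   # keep as-is, linked to the redacted user_id
    DELETE = "delete"   # physically remove


# Hypothetical field classification; names are illustrative only.
FIELD_POLICY = {
    "users.name": Action.REDACT,
    "users.email": Action.REDACT,
    "submissions.score": Action.RETAIN,
    "submissions.submitted_at": Action.RETAIN,
    "observations.note": Action.REDACT,
    "evidence.blob_key": Action.DELETE,
}


def scrub_value(field: str, value):
    """Return the post-scrub value for one field.

    Unclassified fields raise KeyError, so nothing is retained by default.
    """
    action = FIELD_POLICY[field]
    if action is Action.RETAIN:
        return value
    if action is Action.DELETE:
        return None  # the caller would also delete the underlying blob
    if field.startswith("users."):
        return f"redacted_{uuid.uuid4()}"
    return "[Redacted]"
```

Failing on unclassified fields is a deliberate choice in this sketch: a newly added column has to be classified before a scrub can complete, rather than being silently retained.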

1.1) PII Boundary and Denormalisation Rule (applies to all features)

To keep anonymisation effective and avoid “PII copies” spreading across the system:

  • Direct identifiers MUST live only in the users domain (and in explicitly classified unstructured evidence stores where unavoidable).
    • Direct identifiers include: name, email, external identifiers, and any other values intended to identify a person.
  • All other domains (e.g., reporting read-models, skills/competency facts, programme progress, monitoring summaries) MUST store only user_id references and resolve display identity just-in-time via the Users read path (DB join / lookup).
  • Derived/projection tables MUST NOT copy users.email, users.name, or users.metadata into their own storage.
  • Exceptions are allowed only with an explicit documented decision (new ADR or feature-spec section) defining:
    • exactly what data is duplicated,
    • how it will be scrubbed/anonymised/reprojected on Forget User,
    • how DSAR export accounts for it.
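One way to keep this rule from eroding is a schema check that runs in CI against every derived table definition. The sketch below assumes a hypothetical list of forbidden column names and a hypothetical allowlist for documented exceptions; neither is defined by this ADR.

```python
# Hypothetical column names that suggest a copied direct identifier.
FORBIDDEN_COLUMNS = frozenset(
    {"email", "user_email", "user_name", "display_name", "external_id"}
)


def find_pii_columns(
    table: str,
    columns: list[str],
    allowlist: frozenset[tuple[str, str]] = frozenset(),
) -> list[str]:
    """Return columns in a derived table that look like copied identifiers.

    `allowlist` holds (table, column) pairs covered by a documented exception.
    """
    return [
        column
        for column in columns
        if column in FORBIDDEN_COLUMNS and (table, column) not in allowlist
    ]
```

A name-based check is only a heuristic: it will not catch an identifier stored under an innocuous column name, so it complements review rather than replacing it.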

2) Scope rules: Tenant-wide “Forget User” requires tenant-wide authority

Because Evalium supports multi-org silos, Compliance actions must respect scope boundaries:

  • Tenant Owner / Global Admin (tenant-wide scope) MAY initiate “Forget User (Hybrid Scrub)” for a subject user.
  • Org Admin MUST NOT be able to initiate tenant-wide Forget User.
  • Org Admin MAY perform only Unlink From Org actions, which:
    • revoke org-specific role bindings / access
    • remove the user from org-specific group memberships
    • leave the user entity and other org data intact

This prevents an Org Admin from affecting records that belong to other org units.
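The scope rules can be expressed as a capability check combined with a scope check, with no role names in the decision. The capability strings and the `Actor` type below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    user_id: str
    scope: str                    # "tenant" or "org"
    capabilities: frozenset[str]  # hypothetical capability names


# action -> (required capability, scopes from which it may be initiated)
REQUIREMENTS = {
    "FORGET": ("compliance.forget_user", frozenset({"tenant"})),
    "UNLINK": ("compliance.unlink_from_org", frozenset({"tenant", "org"})),
}


def is_allowed(actor: Actor, action: str) -> bool:
    """Allow an action only if both the capability and the scope match."""
    capability, scopes = REQUIREMENTS[action]
    return capability in actor.capabilities and actor.scope in scopes
```

Checking scope separately from capability means that an org-scoped actor cannot initiate Forget User even if the capability is granted to them by mistake.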

3) Compliance operations are asynchronous and durable

DSAR export and Forget User may touch many tables and artefacts and MUST be processed asynchronously.

  • The API MUST create a privacy_jobs record describing the request (type, subject, actor, scope, legal basis text, status).
  • A background worker MUST pick up the job and execute it.
  • The job execution MUST be idempotent and safe to retry.
  • The implementation SHOULD use a Transactional Outbox pattern for durability and to avoid “job created but not processed” failure modes.
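The job-creation and retry behaviour above can be sketched with an in-memory SQLite database standing in for PostgreSQL. The column set is reduced, and the `outbox` table layout and function names are assumptions; the points illustrated are that the job row and the outbox row commit in a single transaction, and that re-running a completed job does nothing.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE privacy_jobs (
            job_id TEXT PRIMARY KEY,
            job_type TEXT NOT NULL,
            subject_user_id TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending'
        );
        CREATE TABLE outbox (
            job_id TEXT PRIMARY KEY REFERENCES privacy_jobs(job_id),
            dispatched INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def enqueue_job(conn, job_id: str, job_type: str, subject_user_id: str) -> None:
    # The job row and the outbox row commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO privacy_jobs (job_id, job_type, subject_user_id) "
            "VALUES (?, ?, ?)",
            (job_id, job_type, subject_user_id),
        )
        conn.execute("INSERT INTO outbox (job_id) VALUES (?)", (job_id,))


def run_job(conn, job_id: str, handler) -> bool:
    """Run a job; return False without calling the handler if already completed."""
    row = conn.execute(
        "SELECT status, subject_user_id FROM privacy_jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown job {job_id}")
    if row[0] == "completed":
        return False
    handler(row[1])  # the handler must itself be safe to repeat after a crash
    with conn:
        conn.execute(
            "UPDATE privacy_jobs SET status = 'completed' WHERE job_id = ?",
            (job_id,),
        )
        conn.execute("UPDATE outbox SET dispatched = 1 WHERE job_id = ?", (job_id,))
    return True
```

If the worker crashes after the handler runs but before the status update commits, the job is picked up again. That is why the scrub steps themselves need to be overwrite-style operations rather than appends.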

4) Completed compliance actions are recorded in a dedicated Compliance Ledger

Operational audit logs are necessary but not sufficient for compliance proof.

Evalium MUST introduce a dedicated compliance_ledger table to record completed compliance actions:

  • job_id, tenant_id, scope (tenant/org), actor_user_id, subject_user_id
  • action_type (EXPORT / FORGET / RESTRICT / UNLINK)
  • legal_basis_text (user-provided rationale)
  • artifact_hash (hash of generated export/receipt artefacts)
  • created_at

The Compliance Ledger MUST be the canonical source for:

  • “What compliance action was performed?”
  • “By whom, when, for whom, under what stated basis?”
  • “Which artefacts were produced, and can we prove they weren’t altered?”
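A ledger entry and the tamper check it enables might look like the following sketch. The field names follow the list above; the choice of SHA-256 and the helper names are assumptions, since this ADR does not name a hash algorithm.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: entries are never mutated after creation
class LedgerEntry:
    job_id: str
    tenant_id: str
    scope: str              # "tenant" or "org"
    actor_user_id: str
    subject_user_id: str
    action_type: str        # EXPORT / FORGET / RESTRICT / UNLINK
    legal_basis_text: str
    artifact_hash: str
    created_at: datetime


def hash_artifact(data: bytes) -> str:
    """Hex digest of an artefact's bytes (SHA-256 assumed here)."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(entry: LedgerEntry, data: bytes) -> bool:
    """Check that an artefact presented later matches the recorded hash."""
    return hmac.compare_digest(entry.artifact_hash, hash_artifact(data))
```

Verification only proves that the artefact matches what the ledger recorded. The ledger table itself still needs append-only protection at the database level.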

5) Compliance Receipt artefacts are first-class outputs

Upon completion of Forget User (and optionally Export), the system MUST generate a “Compliance Receipt” artefact (PDF or HTML) that summarises:

  • date/time completed
  • actor
  • subject reference (redacted if applicable)
  • scope (tenant/org)
  • a domain-level summary of what was redacted, pseudonymized/retained, and deleted
  • artefact hashes + job id

This receipt is intended to reduce customer anxiety and support audits without involving Evalium support.
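For the receipt's hash to be reproducible, its content needs a deterministic serialisation before it is rendered to PDF or HTML. The sketch below assumes a JSON payload as an intermediate form; the function name and payload keys are hypothetical and mirror the summary items listed above.

```python
import json


def render_receipt_payload(
    job_id: str,
    completed_at: str,
    actor_user_id: str,
    subject_ref: str,
    scope: str,
    summary: dict,
    artifact_hashes: list[str],
) -> str:
    """Serialise receipt data so that equal inputs always give equal text."""
    payload = {
        "job_id": job_id,
        "completed_at": completed_at,
        "actor_user_id": actor_user_id,
        "subject_ref": subject_ref,  # already redacted where applicable
        "scope": scope,
        "summary": summary,          # e.g. domains redacted / retained / deleted
        "artifact_hashes": sorted(artifact_hashes),
    }
    # Sorted keys and fixed separators remove ordering and whitespace variance.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))
```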

6) Derived Read-Models / Projections (Reporting, Skills, Search Indexes)

Evalium will introduce derived storage (read-models / projections) for performance and UX (e.g., reporting flats, question health rollups, skills evidence facts).

These stores MUST comply with the Hybrid Scrub model:

  • No direct identifiers may be stored in derived tables (see Denormalisation Rule).
  • Derived tables MAY store user_id references only.
  • Derived tables that contain free-text fields or unstructured blobs/artefacts MUST be treated as unstructured data:
    • delete artefacts, and/or scrub fields to [Redacted] as required.
  • Aggregate rollups should avoid identifiers entirely wherever feasible.
  • If a derived store contains any additional subject-linked fields beyond user_id (e.g., demographic snapshots, external IDs, cached display names), it MUST be treated as “evidence-like” and MUST either:
    • be scrubbed/anonymised in the Forget User workflow, or
    • be deleted and reprojected without the subject’s data.

If legal hold blocks deletion/scrub, derived store operations MUST also be blocked and recorded as such in receipts/ledger entries.
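The rules for derived stores reduce to a small decision per table. The sketch below assumes each column carries a hypothetical `kind` classification; the kind names and result strings are illustrative and are not part of any Evalium schema.

```python
from dataclasses import dataclass

# Column kinds that make a derived table "evidence-like" or unstructured.
SCRUB_KINDS = frozenset({"free_text", "blob", "subject_linked"})


@dataclass(frozen=True)
class Column:
    name: str
    kind: str  # "user_ref", "structured", "free_text", "blob", "subject_linked"


def plan_for_derived_table(columns: list[Column], legal_hold: bool) -> str:
    """Decide what Forget User does to one derived table."""
    needs_scrub = any(column.kind in SCRUB_KINDS for column in columns)
    if not needs_scrub:
        return "retain"              # only user_id references and structured data
    if legal_hold:
        return "blocked"             # to be recorded in the receipt and ledger
    return "scrub_or_reproject"
```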

Consequences

Benefits

  • Preserves immutable, audit-proof reporting while handling privacy requests safely.
  • Minimises unstructured PII risk by deleting/redacting the most dangerous data categories.
  • Prevents cross-org leakage and destructive actions by restricting Forget User to tenant-wide authority.
  • Creates strong, exportable evidence for customers (Compliance Receipt + Compliance Ledger).
  • Establishes a clear PII boundary that reduces future compliance drift as new features (reporting/skills) add projections.
  • Fits Evalium’s existing discipline: scoped TxManager transactions, RLS enforcement, and background worker patterns.

Trade-offs / Costs

  • Requires additional schema: privacy_jobs, compliance_ledger, and (if used) outbox tables.
  • Requires a worker loop and operational monitoring for job processing.
  • Requires careful domain classification so fields are correctly treated as structured vs unstructured PII-bearing data.
  • The “no denormalised identifiers” rule may increase reliance on joins/lookups at read time; very large exports that resolve identity this way should run asynchronously rather than inline.

Implementation Notes (Non-Normative)

  • Add a new privacy service module responsible for:

    • job creation/validation
    • job execution orchestration
    • artefact generation (export zip, receipt)
    • writing compliance ledger entries
  • Job execution MUST follow TxManager + RLS scope semantics and MUST NOT bypass RLS by using direct DB queries.

  • Multi-table flows MUST follow canonical lock ordering rules and may require new lock helpers if jobs touch multiple clusters.

  • “Export” should default to excluding raw evidence blobs unless explicitly enabled, to prevent accidental third-party disclosure in DSAR workflows.

  • The Compliance Centre UI should use progressive disclosure:

    • “Unlink from org” for Org Admins
    • “Forget user (Hybrid Scrub)” for tenant-wide admins only, with explicit warnings and a generated receipt.