ADR-0011: Compliance Centre Hybrid Scrub + Compliance Ledger
Status
Accepted (Implemented)
Context
Evalium is designed for high-stakes assessment use cases where outcomes and evidence may be relied upon for safety, competence, employment decisions, and auditability.
Evalium’s architecture enforces:
- Multi-tenant + multi-org silo isolation via PostgreSQL RLS and TxManager scope injection.
- Immutable versioning for authoring and immutable submissions with frozen version_snapshot for reporting integrity.
- Append-only / compensating ledger patterns for “high-stakes edits” (e.g., remediation apply/revert) instead of mutating historic records.
- Capability-based authorization (no hard-coded role checks).
This creates a direct conflict with a naïve interpretation of “Right to Erasure” (“Delete the user and all their data”):
- In high-stakes assessment, deleting submissions can destroy an organisation’s proof of competence/incompetence and can break long-term reporting integrity.
- Evalium’s KOE model introduces “toxic data” risk:
  - Knowledge (structured answers/scores) can often be retained once identity is removed.
  - Observation/Evidence may contain unstructured PII (faces, names, voices) that cannot be safely anonymised at scale.
We need a Compliance Centre that supports DSAR export and GDPR-aligned “forgetting” while preserving immutable integrity guarantees and preventing cross-org administrative harm.
Decision
1) “Forget User” uses a Hybrid Scrub strategy, not a hard-delete of the user entity
Evalium MUST NOT provide a general “hard delete user” operation that removes the user entity and all dependent records.
Instead, the Compliance Centre MUST implement Forget User (Hybrid Scrub), which is data-type aware:
- Identity data (Users table + identifiers): REDACT
  - Overwrite direct identifiers (name, email, external identifiers) with redacted placeholders (e.g., redacted_[uuid]).
  - Disable/rotate identifiers such that the account cannot be used for login again unless explicitly re-provisioned.
- Structured assessment data (submissions, answers, scores, outcomes, timestamps, structured logs): PSEUDONYMIZE & RETAIN
  - Retain submission rows and scoring/outcome history for auditability and stable analytics.
  - Retain linkage to the same internal user_id whose identity fields are now redacted.
  - Do not rewrite immutable snapshots; do not mutate historic scoring records beyond existing ledger mechanics.
- Unstructured data (KOE evidence blobs, free-text observations, notes): HARD DELETE / REDACT
  - Evidence blobs (e.g., S3 media) MUST be physically deleted.
  - Free-text fields that may contain PII MUST be overwritten with [Redacted] (or removed if schema permits).
  - Any derived artefacts that embed unstructured PII MUST be deleted or regenerated without PII.
This approach preserves Evalium’s immutable/audit-proof integrity guarantees while meeting privacy goals for the highest-risk PII classes.
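The three-way split above can be sketched as a small policy table plus two scrub helpers. This is a minimal illustration, not the implemented service: the column names (name, email, external_id), the login_enabled flag, and the helper names are assumptions made for the example.

```python
from enum import Enum


class DataClass(Enum):
    IDENTITY = "identity"          # users table + identifiers
    STRUCTURED = "structured"      # submissions, answers, scores, outcomes
    UNSTRUCTURED = "unstructured"  # evidence blobs, free-text observations, notes


class ScrubAction(Enum):
    REDACT = "redact"
    PSEUDONYMIZE_AND_RETAIN = "pseudonymize_and_retain"
    HARD_DELETE_OR_REDACT = "hard_delete_or_redact"


# One action per data category, as listed in the decision above.
SCRUB_POLICY = {
    DataClass.IDENTITY: ScrubAction.REDACT,
    DataClass.STRUCTURED: ScrubAction.PSEUDONYMIZE_AND_RETAIN,
    DataClass.UNSTRUCTURED: ScrubAction.HARD_DELETE_OR_REDACT,
}

# Hypothetical column names; the real users schema is not given in this ADR.
IDENTITY_FIELDS = ("name", "email", "external_id")


def redact_identity(user: dict, placeholder_id: str) -> dict:
    """Return a copy with direct identifiers overwritten; user_id is kept."""
    scrubbed = dict(user)
    for field_name in IDENTITY_FIELDS:
        if field_name in scrubbed:
            scrubbed[field_name] = f"redacted_{placeholder_id}"
    scrubbed["login_enabled"] = False  # assumed flag: block login until re-provisioned
    return scrubbed


def redact_free_text(record: dict, text_fields: tuple) -> dict:
    """Return a copy with the given free-text fields overwritten by [Redacted]."""
    scrubbed = dict(record)
    for field_name in text_fields:
        if scrubbed.get(field_name) is not None:
            scrubbed[field_name] = "[Redacted]"
    return scrubbed
```

Structured rows need no helper here: they keep their user_id and are left untouched, which is what makes the retained history pseudonymous once the identity record is redacted.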
1.1) PII Boundary and Denormalisation Rule (applies to all features)
To keep anonymisation effective and avoid “PII copies” spreading across the system:
- Direct identifiers MUST live only in the users domain (and in explicitly classified unstructured evidence stores where unavoidable).
  - Direct identifiers include: name, email, external identifiers, and any other values intended to identify a person.
- All other domains (e.g., reporting read-models, skills/competency facts, programme progress, monitoring summaries) MUST store only user_id references and resolve display identity just-in-time via the Users read path (DB join / lookup).
- Derived/projection tables MUST NOT copy users.email, users.name, or users.metadata into their own storage.
- Exceptions are allowed only with an explicit documented decision (new ADR or feature-spec section) defining:
  - exactly what data is duplicated,
  - how it will be scrubbed/anonymised/reprojected on Forget User,
  - how DSAR export accounts for it.
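One way to keep this rule from eroding is a schema lint that runs in CI. The sketch below is a naive, name-based check under stated assumptions: the forbidden column names and the table names in the usage note are hypothetical, and a real check would need a reviewed list rather than name matching alone.

```python
# Hypothetical column names that would indicate a copied direct identifier.
FORBIDDEN_COLUMNS = frozenset(
    {"email", "user_email", "user_name", "display_name", "user_metadata"}
)


def find_identifier_copies(schema, documented_exceptions=frozenset()):
    """Map each offending table to the sorted forbidden columns it contains.

    schema: dict of table name -> iterable of column names.
    The users table and documented exceptions are skipped.
    """
    violations = {}
    for table, columns in schema.items():
        if table == "users" or table in documented_exceptions:
            continue
        hits = sorted(FORBIDDEN_COLUMNS.intersection(columns))
        if hits:
            violations[table] = hits
    return violations
```

For example, a reporting table that carries user_id, score, and user_email would be reported with ["user_email"], while a table holding only user_id and skill_id would pass. Listing a table in documented_exceptions corresponds to the explicit documented decision required above.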
2) Scope rules: Tenant-wide “Forget User” requires tenant-wide authority
Because Evalium supports multi-org silos, Compliance actions must respect scope boundaries:
- Tenant Owner / Global Admin (tenant-wide scope) MAY initiate “Forget User (Hybrid Scrub)” for a subject user.
- Org Admin MUST NOT be able to initiate tenant-wide Forget User.
- Org Admin MAY only perform Unlink From Org actions:
  - revoke org-specific role bindings / access
  - remove the user from org-specific group memberships
  - leave the user entity and other org data intact
This prevents an Org Admin from affecting records that belong to other org units.
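The scope rule can be expressed as a capability check that also looks at the scope a grant was issued for. The following is a sketch only: the capability names are hypothetical, and allowing tenant-scoped grants to unlink is an assumption that this ADR does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    capability: str  # hypothetical capability name
    scope: str       # "tenant" or "org"


# Scopes at which each capability is honoured.
ALLOWED_SCOPES = {
    "privacy.forget_user": {"tenant"},
    "privacy.unlink_from_org": {"org", "tenant"},  # tenant-level unlink is an assumption
}


def is_authorized(grants, capability: str) -> bool:
    """True if any grant carries the capability at a scope that honours it."""
    allowed = ALLOWED_SCOPES.get(capability, set())
    return any(g.capability == capability and g.scope in allowed for g in grants)
```

With this shape, an org-scoped grant for privacy.forget_user is rejected even if it exists, so a misconfigured role binding cannot widen an Org Admin's reach.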
3) Compliance operations are asynchronous and durable
DSAR export and Forget User may touch many tables and artefacts and MUST be processed asynchronously.
- The API MUST create a privacy_jobs record describing the request (type, subject, actor, scope, legal basis text, status).
- A background worker MUST pick up the job and execute it.
- The job execution MUST be idempotent and safe to retry.
- The implementation SHOULD use a Transactional Outbox pattern for durability and to avoid “job created but not processed” failure modes.
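A minimal in-memory sketch of retry-safe execution follows. It assumes each job tracks which named steps have completed, so a retried job skips work that already succeeded. The field names beyond those listed above (job_id, completed_steps), the status values, and the run_job helper are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyJob:
    job_id: str
    job_type: str           # e.g. "EXPORT" or "FORGET"
    subject_user_id: int
    actor_user_id: int
    scope: str              # "tenant" or "org"
    legal_basis_text: str
    status: str = "pending"
    completed_steps: set = field(default_factory=set)


def run_job(job: PrivacyJob, steps) -> None:
    """Run ordered (name, callable) steps, skipping any already completed.

    If a step raises, the job stays "running" and can be passed to
    run_job again; earlier steps are not repeated.
    """
    if job.status == "completed":
        return
    job.status = "running"
    for name, step in steps:
        if name in job.completed_steps:
            continue
        step(job)
        job.completed_steps.add(name)
    job.status = "completed"
```

In a database-backed worker, the step's effect and the record of its completion would need to commit in the same transaction for this guarantee to hold; the sketch leaves that out.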
4) A dedicated Compliance Ledger is the legal evidence source of truth
Operational audit logs are necessary but not sufficient for compliance proof.
Evalium MUST introduce a dedicated compliance_ledger table to record completed compliance actions:
- job_id, tenant_id, scope (tenant/org), actor_user_id, subject_user_id
- action_type (EXPORT / FORGET / RESTRICT / UNLINK)
- legal_basis_text (user-provided rationale)
- artifact_hash (hash of generated export/receipt artefacts)
- created_at
The Compliance Ledger MUST be the canonical source for:
- “What compliance action was performed?”
- “By whom, when, for whom, under what stated basis?”
- “Which artefacts were produced, and can we prove they weren’t altered?”
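To illustrate the last question, the sketch below models a ledger entry and checks an artefact against its recorded hash. SHA-256 and the Python field types are assumptions; this ADR names the columns but not the hash algorithm or column types.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime

ACTION_TYPES = frozenset({"EXPORT", "FORGET", "RESTRICT", "UNLINK"})


def hash_artifact(data: bytes) -> str:
    """Hex SHA-256 digest of an artefact (algorithm choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)  # frozen: entries are written once, never edited in place
class LedgerEntry:
    job_id: str
    tenant_id: str
    scope: str
    actor_user_id: int
    subject_user_id: int
    action_type: str
    legal_basis_text: str
    artifact_hash: str
    created_at: datetime

    def __post_init__(self):
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action_type: {self.action_type}")
        if self.scope not in ("tenant", "org"):
            raise ValueError(f"unknown scope: {self.scope}")


def verify_artifact(entry: LedgerEntry, data: bytes) -> bool:
    """True if the artefact bytes still match the hash recorded in the ledger."""
    return hmac.compare_digest(entry.artifact_hash, hash_artifact(data))
```

A customer or auditor holding the receipt file can recompute the digest and compare it with the ledger row, which is how the ledger supports the claim that an artefact was not altered after the fact.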
5) Compliance Receipt artefacts are first-class outputs
Upon completion of Forget User (and optionally Export), the system MUST generate a “Compliance Receipt” artefact (PDF or HTML) that summarises:
- date/time completed
- actor
- subject reference (redacted if applicable)
- scope (tenant/org)
- a domain-level summary of what was redacted, pseudonymized/retained, and deleted
- artefact hashes + job id
This receipt is intended to reduce customer anxiety and support audits without involving Evalium support.
6) Derived Read-Models / Projections (Reporting, Skills, Search Indexes)
Evalium will introduce derived storage (read-models / projections) for performance and UX (e.g., reporting flats, question health rollups, skills evidence facts).
These stores MUST comply with the Hybrid Scrub model:
- No direct identifiers may be stored in derived tables (see Denormalisation Rule).
- Derived tables MAY store user_id references only.
- Derived tables that contain free-text fields or unstructured blobs/artefacts MUST be treated as unstructured data:
  - delete artefacts, and/or scrub fields to [Redacted] as required.
- Aggregate rollups should avoid identifiers entirely wherever feasible.
- If a derived store contains any additional subject-linked fields beyond user_id (e.g., demographic snapshots, external IDs, cached display names), it MUST be treated as “evidence-like” and MUST either:
  - be scrubbed/anonymised in the Forget User workflow, or
  - be deleted and reprojected without the subject’s data.
If legal hold blocks deletion/scrub, derived store operations MUST also be blocked and recorded as such in receipts/ledger entries.
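The rules for derived stores amount to a small decision procedure, sketched below. The store descriptor, the action labels, and the example field names are hypothetical; how legal hold is detected is outside the scope of the sketch and is passed in as a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedStore:
    name: str
    free_text_fields: tuple = ()      # free text or blob references
    extra_subject_fields: tuple = ()  # subject-linked fields other than user_id


def plan_forget(store: DerivedStore, legal_hold: bool = False) -> list:
    """Return the Forget User actions for one derived store, in order."""
    actions = []
    if store.free_text_fields:
        # Treated as unstructured data: delete artefacts or scrub to [Redacted].
        actions.append(("scrub_unstructured", store.free_text_fields))
    if store.extra_subject_fields:
        # "Evidence-like": scrub in the workflow, or delete and reproject.
        actions.append(("scrub_or_reproject", store.extra_subject_fields))
    if not actions:
        # Only a user_id reference: nothing to scrub in this store.
        return [("retain", ("user_id",))]
    if legal_hold:
        # Blocked operations are still returned so they can be recorded.
        return [("blocked_by_legal_hold", fields) for _, fields in actions]
    return actions
```

Returning the blocked actions, rather than an empty list, gives the receipt and ledger writers something concrete to record when a legal hold prevents the scrub.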
Consequences
Benefits
- Preserves immutable, audit-proof reporting while handling privacy requests safely.
- Minimises unstructured PII risk by deleting/redacting the most dangerous data categories.
- Prevents cross-org leakage and destructive actions by restricting Forget User to tenant-wide authority.
- Creates strong, exportable evidence for customers (Compliance Receipt + Compliance Ledger).
- Establishes a clear PII boundary that reduces future compliance drift as new features (reporting/skills) add projections.
- Fits Evalium’s existing discipline: scoped TxManager transactions, RLS enforcement, and background worker patterns.
Trade-offs / Costs
- Requires additional schema: privacy_jobs, compliance_ledger, and (if used) outbox tables.
- Requires a worker loop and operational monitoring for job processing.
- Requires careful domain classification so fields are correctly treated as structured vs unstructured PII-bearing data.
- The “no denormalised identifiers” rule may increase reliance on joins/lookups at read-time; massive exports should be asynchronous if needed.
Implementation Notes (Non-Normative)
- Add a new privacy service module responsible for:
  - job creation/validation
  - job execution orchestration
  - artefact generation (export zip, receipt)
  - writing compliance ledger entries
- Job execution MUST follow TxManager + RLS scope semantics and MUST NOT bypass RLS by using direct DB queries.
- Multi-table flows MUST follow canonical lock ordering rules and may require new lock helpers if jobs touch multiple clusters.
- “Export” should default to excluding raw evidence blobs unless explicitly enabled, to prevent accidental third-party disclosure in DSAR workflows.
- The Compliance Centre UI should use progressive disclosure:
  - “Unlink from org” for Org Admins
  - “Forget user (Hybrid Scrub)” for tenant-wide admins only, with explicit warnings and a generated receipt.