Taxonomy Engine Evolution Proposal

Purpose

Define a backend-first evolution path for a single taxonomy engine that supports:

domain-scoped term partitions (content, skills, subjects, identity)
optional parent/child hierarchy (one-to-many)
controlled cross-domain mapping where content taxonomy can contribute to user skill fulfilment

This proposal preserves Evalium invariants: RLS scoping, immutable execution truth, snapshot-based reconstruction, and append-only outcomes.

Direct answer on reuse:

Yes, we should repurpose the existing skills mapping model for the new cross-domain behavior rather than introduce a second mapping engine.

Why this approach fits Evalium best

One engine keeps the platform coherent.

Shared infrastructure (RLS, capabilities, API patterns, tooling) is implemented once.
Avoids duplicate taxonomy systems per module.

Domain partitions preserve clarity.

content terms label authoring assets.
skills terms label competency-model outcomes.
subjects and identity terms remain separate and safe.
This avoids semantic leakage while still allowing controlled mappings.

Relation-first hierarchy preserves flexibility.

A relation table supports strict trees now and polyhierarchy later without schema rewrites.
Enables future relation types (broader/narrower/equivalent) without changing term identity.

Cross-domain fulfilment is explicit and auditable.

Content terms do not become skills directly.
Versioned mapping rules define how content evidence contributes to skills.
Every computed skill result can show source term(s), rule version, and submission provenance.

Backend-first sequencing is maintained.

Hierarchy + mapping primitives can be hardened before frontend taxonomy UX is activated.
Frontend remains intentionally deferred until contracts stabilize.

Core model

1) Term identity (existing direction)

taxonomy_terms.id remains the stable identity.
label/slug are mutable metadata.
Terms are retired, not hard-deleted, when referenced.

2) Domain-scoped hierarchy (proposed)

Use a relation table instead of a fixed parent_id column:

taxonomy_term_relations
- tenant_id
- domain
- parent_term_id
- child_term_id
- relation_type (default: broader_narrower)
- sort_order
- timestamps / actor metadata

Required constraints:

same tenant for parent and child
same domain for parent and child
no self-reference
no duplicate pair for same relation type
cycle prevention (reject insert creating ancestor loop)
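
The required constraints above can be sketched as a single pre-insert check. This is a minimal in-memory illustration, not the production enforcement path: the Term and Relation classes and the validate_relation function are hypothetical names, and a real implementation would likely enforce most of this in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:  # hypothetical stand-in for a taxonomy_terms row
    id: str
    tenant_id: str
    domain: str


@dataclass(frozen=True)
class Relation:  # hypothetical stand-in for a taxonomy_term_relations row
    parent_term_id: str
    child_term_id: str
    relation_type: str = "broader_narrower"


def validate_relation(new: Relation, terms: dict[str, Term], existing: list[Relation]) -> None:
    """Raise ValueError if inserting `new` would break a required constraint."""
    parent, child = terms[new.parent_term_id], terms[new.child_term_id]
    if parent.id == child.id:
        raise ValueError("self-reference")
    if parent.tenant_id != child.tenant_id:
        raise ValueError("parent and child belong to different tenants")
    if parent.domain != child.domain:
        raise ValueError("parent and child belong to different domains")
    if new in existing:
        raise ValueError("duplicate pair for this relation type")

    # Cycle check: walk upward from the proposed parent. Reaching the proposed
    # child means the child is already an ancestor of the parent.
    parents_of: dict[str, list[str]] = {}
    for rel in existing:
        if rel.relation_type == new.relation_type:
            parents_of.setdefault(rel.child_term_id, []).append(rel.parent_term_id)
    stack, seen = [parent.id], set()
    while stack:
        current = stack.pop()
        if current == child.id:
            raise ValueError("insert would create an ancestor loop")
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents_of.get(current, []))
```

The upward walk keeps the check proportional to the depth of the hierarchy rather than its total size, which suits the adjacency-first storage policy described below.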

Operational mode:

default to a strict tree (at most one parent per child), enforceable per facet/domain
allow polyhierarchy later by relaxing uniqueness constraint on child
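
As an illustration of the two operational modes, the sketch below treats strict-tree mode as a flag on the uniqueness check. The function name can_add_parent and the strict_tree parameter are assumptions made for this example only.

```python
def can_add_parent(
    child_term_id: str,
    existing_edges: list[tuple[str, str]],
    strict_tree: bool = True,
) -> bool:
    """Return True if `child_term_id` may receive another parent.

    existing_edges holds (parent_term_id, child_term_id) pairs for one
    tenant, domain, and relation type.
    """
    if not strict_tree:
        return True  # polyhierarchy: any number of parents is allowed
    # Strict tree: the child must not already appear as a child in any edge.
    return all(child != child_term_id for _, child in existing_edges)
```

Relaxing the constraint later then amounts to switching the flag for a domain, with no change to term identity or to the relation rows already stored.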

Hierarchy storage and query policy:

taxonomy_term_relations is the canonical source of truth (adjacency model).
Add a closure projection only when descendant query load justifies it.
If enabled, the closure table is derived data and rebuilt/reconciled from adjacency truth.
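
A minimal sketch of how a closure projection could be derived from adjacency truth, assuming the edges are acyclic as the insert-time constraints require. The function name build_closure and the (ancestor, descendant, depth) row shape are illustrative assumptions, not a committed schema.

```python
from collections import defaultdict


def build_closure(edges: list[tuple[str, str]]) -> set[tuple[str, str, int]]:
    """Derive (ancestor, descendant, depth) rows from (parent, child) edges."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    closure = set()
    for ancestor in list(children):
        # Walk every path below this ancestor, tracking the distance travelled.
        frontier = [(child, 1) for child in children[ancestor]]
        while frontier:
            node, depth = frontier.pop()
            closure.add((ancestor, node, depth))
            frontier.extend((c, depth + 1) for c in children.get(node, []))
    return closure
```

Reconciliation can then be expressed as a set difference between the stored closure rows and a fresh build_closure result: rows present only in storage are stale, and rows present only in the fresh build are missing.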

3) Cross-domain fulfilment mapping (content -> skills)

Do not mix domains directly in assignment tables.

Canonical implementation path:

Repurpose existing skill_mapping_set_versions + skill_mapping_rules.
Extend selector contract so rules can match content taxonomy term IDs.
Keep skill projection and provenance in the existing skills engine.

Rule shape expectation (conceptual):

source_domain: content
source_term_ids: list of content term IDs (optional include-descendants flag)
target_skill_id: skill outcome target
contribution strategy/weight/threshold fields as already defined by the skills model
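
The conceptual rule shape above might look like the following. This is a sketch under stated assumptions: the class names, the include_descendants field name, the single weight field standing in for the existing contribution fields, and the descendants_of lookup are all hypothetical and are not taken from the existing skills model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentTermSelector:
    source_domain: str
    source_term_ids: frozenset[str]
    include_descendants: bool = False  # assumed name for the optional flag


@dataclass(frozen=True)
class SkillMappingRule:
    id: str
    mapping_set_version_id: str
    selector: ContentTermSelector
    target_skill_id: str
    weight: float = 1.0  # placeholder for the skills model's contribution fields


def rule_matches(
    rule: SkillMappingRule,
    evidence_term_ids: set[str],
    descendants_of: dict[str, set[str]],
) -> bool:
    """Return True if any frozen content term satisfies the rule's selector."""
    if rule.selector.source_domain != "content":
        return False
    accepted = set(rule.selector.source_term_ids)
    if rule.selector.include_descendants:
        for term_id in rule.selector.source_term_ids:
            accepted |= descendants_of.get(term_id, set())
    return bool(accepted & evidence_term_ids)
```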

Explicit non-goal for now:

Do not introduce a standalone taxonomy_domain_mappings engine in parallel.

Migration note (if legacy bridge rows exist):

Migrate legacy mapping rows into versioned skill mapping rules.
Keep old data read-only during migration, then deprecate.

Execution-time semantics (non-negotiable)

Authoring time

Questions/evaluations are tagged with content terms only.

Snapshot freeze

Submission snapshot freezes relevant taxonomy term IDs.
Optional display labels/slugs may be included for explain rendering, but IDs are canonical.
Skills fulfilment projections must persist mapping_set_version_id + mapping_rule_id provenance.

Projection / fulfilment

Worker resolves mapping-set version + content-term evidence.
Emits skill evidence facts for the user/subject.
Records mapping provenance and source submission references.
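
The projection step can be sketched as a pure function from a frozen submission snapshot and one mapping-set version to skill evidence facts. The Rule and SkillEvidenceFact classes, the project_submission function, and the snapshot dictionary keys are hypothetical; only mapping_set_version_id and mapping_rule_id come from the provenance requirement stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:  # hypothetical, simplified view of a skill mapping rule
    id: str
    source_term_ids: frozenset
    target_skill_id: str


@dataclass(frozen=True)
class SkillEvidenceFact:
    subject_id: str
    skill_id: str
    submission_id: str
    mapping_set_version_id: str
    mapping_rule_id: str
    source_term_ids: tuple


def project_submission(
    submission: dict, mapping_set_version_id: str, rules: list[Rule]
) -> list[SkillEvidenceFact]:
    """Emit one fact per rule matched by the submission's frozen term IDs."""
    frozen = set(submission["frozen_term_ids"])
    facts = []
    for rule in rules:
        matched = frozen & rule.source_term_ids
        if matched:
            facts.append(
                SkillEvidenceFact(
                    subject_id=submission["subject_id"],
                    skill_id=rule.target_skill_id,
                    submission_id=submission["id"],
                    mapping_set_version_id=mapping_set_version_id,
                    mapping_rule_id=rule.id,
                    source_term_ids=tuple(sorted(matched)),
                )
            )
    return facts
```

Because the function reads only the snapshot and a pinned mapping-set version, rerunning it later yields the same facts even if labels or hierarchy have since changed.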

Reconstruction

Historical skill outcomes remain explainable even if taxonomy labels/hierarchy later change.

Guardrails

Domain-gated assignment

Content resources can only attach content terms.
Skills resources can only attach skills terms.

Cross-domain mapping gate

Only explicit mapping rules may bridge content to skills.
No implicit inheritance between domains.

Retire, do not destroy

Terms used in snapshots/projections are never hard-deleted.

Policy + auth

Keep taxonomy.read and taxonomy.manage.
Add optional domain policy constraints later if needed (for delegated admin models).

Performance

For hierarchy queries, use adjacency + recursive reads initially.
Introduce closure/materialized path only when thresholds are exceeded.

Performance trigger thresholds (start point):

taxonomy descendant query p95 > 120ms for two consecutive releases, or
descendant queries account for > 20% of taxonomy read volume, or
average active hierarchy depth exceeds 4 with sustained growth.
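
The trigger thresholds can be expressed as a small check over per-release metrics. The ReleaseMetrics fields and the function name are hypothetical, and "sustained growth" is not defined in this proposal: the sketch assumes it means depth increased from the previous release to the latest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseMetrics:  # hypothetical per-release measurements
    descendant_query_p95_ms: float
    descendant_share_of_reads: float  # fraction between 0.0 and 1.0
    avg_active_depth: float


def closure_projection_warranted(history: list[ReleaseMetrics]) -> bool:
    """`history` is ordered oldest to newest, one entry per release."""
    if not history:
        return False
    latest = history[-1]
    last_two = history[-2:]
    have_two = len(last_two) == 2

    slow_p95 = have_two and all(m.descendant_query_p95_ms > 120 for m in last_two)
    heavy_share = latest.descendant_share_of_reads > 0.20
    # Assumption: "sustained growth" means depth rose release over release.
    deep_and_growing = (
        have_two
        and latest.avg_active_depth > 4
        and last_two[1].avg_active_depth > last_two[0].avg_active_depth
    )
    return slow_p95 or heavy_share or deep_and_growing
```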

Proposed phased implementation

Phase 1: Hierarchy primitives

add taxonomy_term_relations
enforce same-domain/same-tenant/no-cycle constraints
expose read APIs for parent/children lineage

Phase 2: Inventory integration

add term-hierarchy-aware filtering support (terms_any/terms_all, optional include descendants)
finalize snapshot taxonomy freeze contract
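
A sketch of hierarchy-aware filtering for Phase 2. The parameter names terms_any and terms_all come from this proposal; the function names, the children_of lookup, and the semantics chosen for terms_all with descendants (each required term is satisfied by itself or any of its descendants) are assumptions.

```python
def expand(term_ids: set[str], children_of: dict[str, list[str]]) -> set[str]:
    """Return term_ids plus all of their descendants (iterative walk)."""
    result, stack = set(), list(term_ids)
    while stack:
        term_id = stack.pop()
        if term_id in result:
            continue
        result.add(term_id)
        stack.extend(children_of.get(term_id, []))
    return result


def matches_filter(
    item_terms: set[str],
    terms_any=(),
    terms_all=(),
    children_of=None,
    include_descendants: bool = False,
) -> bool:
    """Check one inventory item's term IDs against the filter."""
    children_of = children_of or {}

    def accepted(term_id: str) -> set[str]:
        # With descendants enabled, a filter term also accepts anything below it.
        return expand({term_id}, children_of) if include_descendants else {term_id}

    if terms_any and not any(accepted(t) & item_terms for t in terms_any):
        return False
    return all(accepted(t) & item_terms for t in terms_all)
```

The result depends only on the filter, the item's term IDs, and the adjacency data, so the same inputs always resolve to the same match.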

Phase 3: Skill fulfilment bridge

support content-term selectors in skill mapping rules
write provenance-rich skill evidence facts (source terms + mapping version)
add explain endpoint fields for mapping trace

Phase 4: Frontend activation

taxonomy drawers/trees
mapping management UX
impact previews before publishing mapping changes

Phase acceptance criteria (non-negotiable)

Phase 1 done when

cycle prevention is enforced
same-tenant and same-domain relation checks are enforced
strict-tree mode is enforceable per facet/domain

Phase 2 done when

inventory filters can resolve term descendants deterministically
snapshot taxonomy freeze contract is documented and covered by tests

Phase 3 done when

content term selectors are supported in versioned skill mapping rules
every skill fulfilment fact can trace to submission + mapping version + mapping rule
no implicit cross-domain fulfilment path exists outside mapping rules

Phase 4 done when

frontend taxonomy UIs consume only backend-validated contracts
impact preview is backend-derived and version-aware

What this avoids

separate taxonomy engines per domain
hidden coupling where content tags silently mutate skill outcomes
brittle one-parent-only schema that blocks future use cases
frontend-driven taxonomy semantics before backend invariants are stable

Recommendation

Adopt this proposal as the canonical backend direction:

one taxonomy engine
domain partitions
relation-based hierarchy
explicit versioned cross-domain mapping (content -> skills) for user skill fulfilment

This gives Evalium flexibility for future product growth while preserving the defensibility and backend-first discipline already established.
