Taxonomy Engine Evolution Proposal
Purpose
Define a backend-first evolution path for a single taxonomy engine that supports:
- domain-scoped term partitions (content, skills, subjects, identity)
- optional parent/child hierarchy (one-to-many)
- controlled cross-domain mapping where content taxonomy can contribute to user skill fulfilment
This proposal preserves Evalium invariants: RLS scoping, immutable execution truth, snapshot-based reconstruction, and append-only outcomes.
Direct answer on reuse:
- Yes, we should repurpose the existing skills mapping model for the new cross-domain behavior rather than introduce a second mapping engine.
Why this approach fits Evalium best
- One engine keeps the platform coherent.
- Shared infrastructure (RLS, capabilities, API patterns, tooling) is implemented once.
- Avoids duplicate taxonomy systems per module.
- Domain partitions preserve clarity.
- The content domain labels authoring assets.
- The skills domain labels competency model outcomes.
- The subjects and identity domains remain separate and safe.
- This avoids semantic leakage while still allowing controlled mappings.
- Relation-first hierarchy preserves flexibility.
- A relation table supports strict trees now and polyhierarchy later without schema rewrites.
- Enables future relation types (broader/narrower/equivalent) without changing term identity.
- Cross-domain fulfilment is explicit and auditable.
- Content terms do not become skills directly.
- Versioned mapping rules define how content evidence contributes to skills.
- Every computed skill result can show source term(s), rule version, and submission provenance.
- Backend-first sequencing is maintained.
- Hierarchy + mapping primitives can be hardened before frontend taxonomy UX is activated.
- Frontend remains intentionally deferred until contracts stabilize.
Core model
1) Term identity (existing direction)
- taxonomy_terms.id remains the stable identity.
- label / slug are mutable metadata.
- Terms are retired, not hard-deleted, when referenced.
2) Domain-scoped hierarchy (proposed)
Use a relation table instead of a fixed parent_id column:
taxonomy_term_relations
- tenant_id
- domain
- parent_term_id
- child_term_id
- relation_type (default: broader_narrower)
- sort_order
- timestamps / actor metadata
Required constraints:
- same tenant for parent and child
- same domain for parent and child
- no self-reference
- no duplicate pair for same relation type
- cycle prevention (reject insert creating ancestor loop)
Operational mode:
- default to strict tree per facet/domain (one parent per child) if desired
- allow polyhierarchy later by relaxing uniqueness constraint on child
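The constraints and the strict-tree switch above can be sketched in application code. This is only an illustrative in-memory model, not the proposed implementation: the `Term`, `RelationStore`, and `RelationError` names are hypothetical, and in practice some of these checks would more likely live in database constraints or triggers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: str
    tenant_id: str
    domain: str


class RelationError(ValueError):
    """Raised when a proposed relation violates a constraint."""


class RelationStore:
    """In-memory stand-in for taxonomy_term_relations (hypothetical helper)."""

    def __init__(self, strict_tree: bool = True) -> None:
        self.strict_tree = strict_tree
        # Each entry is a (parent_id, child_id, relation_type) triple.
        self.relations: set[tuple[str, str, str]] = set()

    def _ancestors(self, term_id: str, relation_type: str) -> set[str]:
        """Collect every ancestor of term_id by walking parent edges."""
        seen: set[str] = set()
        stack = [term_id]
        while stack:
            current = stack.pop()
            for parent, child, rtype in self.relations:
                if child == current and rtype == relation_type and parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add(self, parent: Term, child: Term, relation_type: str = "broader_narrower") -> None:
        if parent.tenant_id != child.tenant_id:
            raise RelationError("parent and child must share a tenant")
        if parent.domain != child.domain:
            raise RelationError("parent and child must share a domain")
        if parent.id == child.id:
            raise RelationError("self-reference is not allowed")
        if (parent.id, child.id, relation_type) in self.relations:
            raise RelationError("duplicate pair for this relation type")
        # Cycle check: the child must not already be an ancestor of the parent.
        if child.id in self._ancestors(parent.id, relation_type):
            raise RelationError("relation would create an ancestor loop")
        # Strict-tree mode: a child may have at most one parent per relation type.
        if self.strict_tree and any(
            c == child.id and t == relation_type for _, c, t in self.relations
        ):
            raise RelationError("strict-tree mode allows one parent per child")
        self.relations.add((parent.id, child.id, relation_type))
```

Constructing the store with `strict_tree=False` corresponds to relaxing the uniqueness constraint on the child, so polyhierarchy becomes possible while the other checks stay in force.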
Hierarchy storage and query policy:
- taxonomy_term_relations is the canonical source of truth (adjacency model).
- Add a closure projection only when descendant query load justifies it.
- If enabled, the closure table is derived data and rebuilt/reconciled from adjacency truth.
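To illustrate what reading descendants straight from the adjacency table could look like, here is a minimal sketch using Python's built-in `sqlite3` module and a recursive common table expression. The column list follows the table described above, but the column types, the `open_demo_db` and `descendants` helpers, and the use of SQLite are assumptions made for the example. The explicit `tenant_id` filter stands in for whatever RLS scoping the real database applies.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like taxonomy_term_relations (types assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE taxonomy_term_relations (
            tenant_id      TEXT NOT NULL,
            domain         TEXT NOT NULL,
            parent_term_id TEXT NOT NULL,
            child_term_id  TEXT NOT NULL,
            relation_type  TEXT NOT NULL DEFAULT 'broader_narrower',
            sort_order     INTEGER NOT NULL DEFAULT 0,
            CHECK (parent_term_id <> child_term_id),
            UNIQUE (tenant_id, parent_term_id, child_term_id, relation_type)
        )
        """
    )
    return conn


# Recursive read over adjacency rows. It assumes the data is acyclic,
# which the cycle-prevention constraint is meant to guarantee.
DESCENDANTS_SQL = """
WITH RECURSIVE descendants(term_id, depth) AS (
    SELECT child_term_id, 1
    FROM taxonomy_term_relations
    WHERE tenant_id = :tenant_id AND domain = :domain AND parent_term_id = :root
    UNION
    SELECT r.child_term_id, d.depth + 1
    FROM taxonomy_term_relations AS r
    JOIN descendants AS d ON r.parent_term_id = d.term_id
    WHERE r.tenant_id = :tenant_id AND r.domain = :domain
)
SELECT term_id, MIN(depth) AS depth
FROM descendants
GROUP BY term_id
ORDER BY depth, term_id
"""


def descendants(conn: sqlite3.Connection, tenant_id: str, domain: str, root: str) -> list[tuple[str, int]]:
    """Return (term_id, depth) for every descendant of root, nearest first."""
    params = {"tenant_id": tenant_id, "domain": domain, "root": root}
    return [(row[0], row[1]) for row in conn.execute(DESCENDANTS_SQL, params)]
```

If a closure table is introduced later, the same query run per root is one way to rebuild or reconcile it from adjacency truth.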
3) Cross-domain fulfilment mapping (content -> skills)
Do not mix domains directly in assignment tables.
Canonical implementation path:
- Repurpose existing skill_mapping_set_versions + skill_mapping_rules.
- Extend selector contract so rules can match content taxonomy term IDs.
- Keep skill projection and provenance in the existing skills engine.
Rule shape expectation (conceptual):
- source_domain: content
- source_term_ids: list of content term IDs (optional include-descendants flag)
- target_skill_id: skill outcome target
- contribution strategy/weight/threshold fields as already defined by the skills model
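As a rough illustration of the rule shape, the sketch below models a content selector rule and the matching step. The class name `ContentSelectorRule`, the `include_descendants` and `weight` fields, and the `children` lookup are hypothetical; the real contribution fields are whatever the existing skills model already defines.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContentSelectorRule:
    """Conceptual rule shape; field names beyond the proposal's are assumptions."""

    mapping_rule_id: str
    source_domain: str
    source_term_ids: frozenset[str]
    target_skill_id: str
    include_descendants: bool = False
    weight: float = 1.0  # placeholder for the skills model's contribution fields


def expand_terms(term_ids: frozenset[str], children: dict[str, list[str]]) -> set[str]:
    """Return term_ids plus all of their descendants from a parent -> children map."""
    expanded = set(term_ids)
    stack = list(term_ids)
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def rule_matches(
    rule: ContentSelectorRule,
    evidence_term_ids: set[str],
    children: dict[str, list[str]],
) -> set[str]:
    """Return the evidence term IDs that satisfy the rule's selector."""
    if rule.source_domain != "content":
        return set()
    selector = (
        expand_terms(rule.source_term_ids, children)
        if rule.include_descendants
        else set(rule.source_term_ids)
    )
    return selector & evidence_term_ids
```

Returning the matched term IDs rather than a boolean keeps the source terms available for provenance when the skills engine records the result.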
Explicit non-goal for now:
- Do not introduce a standalone taxonomy_domain_mappings engine in parallel.
Migration note (if legacy bridge rows exist):
- Migrate legacy mapping rows into versioned skill mapping rules.
- Keep old data read-only during migration, then deprecate.
Execution-time semantics (non-negotiable)
- Authoring time
- Questions/evaluations are tagged with content terms only.
- Snapshot freeze
- Submission snapshot freezes relevant taxonomy term IDs.
- Optional display labels/slugs may be included for explain rendering, but IDs are canonical.
- Skills fulfilment projections must persist mapping_set_version_id + mapping_rule_id provenance.
- Projection / fulfilment
- Worker resolves mapping-set version + content-term evidence.
- Emits skill evidence facts for the user/subject.
- Records mapping provenance and source submission references.
- Reconstruction
- Historical skill outcomes remain explainable even if taxonomy labels/hierarchy later change.
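The freeze-then-project flow above might look roughly like the following. This is a sketch under stated assumptions, not the worker's actual design: `SubmissionSnapshot`, `MappingRule`, `SkillEvidenceFact`, and `project` are hypothetical names, and the fact record carries only the provenance fields the proposal names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionSnapshot:
    submission_id: str
    subject_id: str
    # Content term IDs frozen at submission time; IDs are canonical, labels are not.
    content_term_ids: tuple[str, ...]


@dataclass(frozen=True)
class MappingRule:
    mapping_rule_id: str
    source_term_ids: frozenset[str]
    target_skill_id: str


@dataclass(frozen=True)
class SkillEvidenceFact:
    subject_id: str
    skill_id: str
    submission_id: str
    mapping_set_version_id: str
    mapping_rule_id: str
    source_term_ids: tuple[str, ...]


def project(
    snapshot: SubmissionSnapshot,
    mapping_set_version_id: str,
    rules: list[MappingRule],
) -> list[SkillEvidenceFact]:
    """Emit one append-only evidence fact per rule matched by the frozen terms."""
    frozen_terms = set(snapshot.content_term_ids)
    facts: list[SkillEvidenceFact] = []
    for rule in rules:
        matched = sorted(frozen_terms & rule.source_term_ids)
        if matched:
            facts.append(
                SkillEvidenceFact(
                    subject_id=snapshot.subject_id,
                    skill_id=rule.target_skill_id,
                    submission_id=snapshot.submission_id,
                    mapping_set_version_id=mapping_set_version_id,
                    mapping_rule_id=rule.mapping_rule_id,
                    source_term_ids=tuple(matched),
                )
            )
    return facts
```

Because the function reads only the snapshot and a pinned mapping-set version, later changes to labels or hierarchy do not affect what it produces for a historical submission.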
Guardrails
- Domain-gated assignment
- Content resources can only attach content terms.
- Skills resources can only attach skills terms.
- Cross-domain mapping gate
- Only explicit mapping rules may bridge content to skills.
- No implicit inheritance between domains.
- Retire, do not destroy
- Terms used in snapshots/projections are never hard-deleted.
- Policy + auth
- Keep taxonomy.read and taxonomy.manage.
- Add optional domain policy constraints later if needed (for delegated admin models).
- Performance
- For hierarchy queries, use adjacency + recursive reads initially.
- Introduce closure/materialized path only when thresholds are exceeded.
Performance trigger thresholds (start point):
- taxonomy descendant query p95 > 120ms for two consecutive releases, or
- descendant queries account for > 20% of taxonomy read volume, or
- average active hierarchy depth exceeds 4 with sustained growth.
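The three triggers can be expressed as a small check over per-release metrics. The sketch below is illustrative only: the `ReleaseMetrics` record and `closure_triggers` function are hypothetical, and because the proposal does not define "sustained growth", the code assumes it means average depth strictly increasing across the last three releases.

```python
from __future__ import annotations

from dataclasses import dataclass

P95_LIMIT_MS = 120.0
READ_SHARE_LIMIT = 0.20
DEPTH_LIMIT = 4.0


@dataclass(frozen=True)
class ReleaseMetrics:
    descendant_p95_ms: float      # p95 latency of taxonomy descendant queries
    descendant_read_share: float  # fraction of taxonomy reads, 0.0 to 1.0
    avg_active_depth: float       # average depth of active hierarchies


def closure_triggers(history: list[ReleaseMetrics]) -> list[str]:
    """Return which thresholds fired; history is ordered oldest to newest."""
    reasons: list[str] = []
    if not history:
        return reasons
    latest = history[-1]
    # Trigger 1: p95 above the limit for two consecutive releases.
    if len(history) >= 2 and all(m.descendant_p95_ms > P95_LIMIT_MS for m in history[-2:]):
        reasons.append("p95")
    # Trigger 2: descendant queries exceed 20% of taxonomy read volume.
    if latest.descendant_read_share > READ_SHARE_LIMIT:
        reasons.append("read_share")
    # Trigger 3: depth above 4 with sustained growth (assumed: three rising releases).
    recent = history[-3:]
    growing = len(recent) == 3 and all(
        a.avg_active_depth < b.avg_active_depth for a, b in zip(recent, recent[1:])
    )
    if latest.avg_active_depth > DEPTH_LIMIT and growing:
        reasons.append("depth")
    return reasons
```

Any non-empty result would be the cue to evaluate a closure table or materialized path, since the thresholds are joined by "or".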
Proposed phased implementation
Phase 1: Hierarchy primitives
- add taxonomy_term_relations
- enforce same-domain/same-tenant/no-cycle constraints
- expose read APIs for parent/children lineage
Phase 2: Inventory integration
- add term-hierarchy-aware filtering support (terms_any / terms_all, optional include descendants)
- finalize snapshot taxonomy freeze contract
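One possible reading of the terms_any / terms_all filter with descendant inclusion is sketched below. The exact semantics are an assumption: here a requested term is satisfied when an item carries that term or, if descendants are included, any term beneath it. The `filter_items` and `with_descendants` helpers and the in-memory `items` and `children` maps are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def with_descendants(term_id: str, children: dict[str, list[str]]) -> set[str]:
    """Return term_id together with every term beneath it."""
    seen = {term_id}
    stack = [term_id]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def filter_items(
    items: dict[str, set[str]],
    children: dict[str, list[str]],
    terms_any: Iterable[str] = (),
    terms_all: Iterable[str] = (),
    include_descendants: bool = False,
) -> list[str]:
    """Return matching item IDs in sorted order so results are deterministic."""
    any_terms = list(terms_any)
    all_terms = list(terms_all)

    def satisfied(term_id: str, item_terms: set[str]) -> bool:
        accepted = with_descendants(term_id, children) if include_descendants else {term_id}
        return bool(accepted & item_terms)

    result: list[str] = []
    for item_id in sorted(items):
        item_terms = items[item_id]
        if any_terms and not any(satisfied(t, item_terms) for t in any_terms):
            continue
        if not all(satisfied(t, item_terms) for t in all_terms):
            continue
        result.append(item_id)
    return result
```

Sorting the item IDs before filtering is one simple way to keep the output order stable across runs, which is the sense of "deterministically" assumed here.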
Phase 3: Skill fulfilment bridge
- support content-term selectors in skill mapping rules
- write provenance-rich skill evidence facts (source terms + mapping version)
- add explain endpoint fields for mapping trace
Phase 4: Frontend activation
- taxonomy drawers/trees
- mapping management UX
- impact previews before publishing mapping changes
Phase acceptance criteria (non-negotiable)
Phase 1 done when
- cycle prevention is enforced
- same-tenant and same-domain relation checks are enforced
- strict-tree mode is enforceable per facet/domain
Phase 2 done when
- inventory filters can resolve term descendants deterministically
- snapshot taxonomy freeze contract is documented and covered by tests
Phase 3 done when
- content term selectors are supported in versioned skill mapping rules
- every skill fulfilment fact can trace to submission + mapping version + mapping rule
- no implicit cross-domain fulfilment path exists outside mapping rules
Phase 4 done when
- frontend taxonomy UIs consume only backend-validated contracts
- impact preview is backend-derived and version-aware
What this avoids
- separate taxonomy engines per domain
- hidden coupling where content tags silently mutate skill outcomes
- brittle one-parent-only schema that blocks future use cases
- front-end-driven taxonomy semantics before backend invariants are stable
Recommendation
Adopt this proposal as the canonical backend direction:
- one taxonomy engine
- domain partitions
- relation-based hierarchy
- explicit versioned cross-domain mapping (content -> skills) for user skill fulfilment
This gives you flexibility for future product growth while preserving the defensibility and backend-first discipline already established in Evalium.