
Taxonomy Engine Evolution Proposal

Purpose

Define a backend-first evolution path for a single taxonomy engine that supports:

  • domain-scoped term partitions (content, skills, subjects, identity)
  • optional parent/child hierarchy (one-to-many)
  • controlled cross-domain mapping where content taxonomy can contribute to user skill fulfilment

This proposal preserves Evalium invariants: RLS scoping, immutable execution truth, snapshot-based reconstruction, and append-only outcomes.

Direct answer on reuse:

  • Yes, we should repurpose the existing skills mapping model for the new cross-domain behavior rather than introduce a second mapping engine.

Why this approach fits Evalium best

  1. One engine keeps the platform coherent.
  • Shared infrastructure (RLS, capabilities, API patterns, tooling) is implemented once.
  • Avoids duplicate taxonomy systems per module.
  2. Domain partitions preserve clarity.
  • content labels authoring assets.
  • skills labels competency model outcomes.
  • subjects/identity remain separate and safe.
  • This avoids semantic leakage while still allowing controlled mappings.
  3. Relation-first hierarchy preserves flexibility.
  • A relation table supports strict trees now and polyhierarchy later without schema rewrites.
  • Enables future relation types (broader/narrower/equivalent) without changing term identity.
  4. Cross-domain fulfilment is explicit and auditable.
  • Content terms do not become skills directly.
  • Versioned mapping rules define how content evidence contributes to skills.
  • Every computed skill result can show source term(s), rule version, and submission provenance.
  5. Backend-first sequencing is maintained.
  • Hierarchy + mapping primitives can be hardened before frontend taxonomy UX is activated.
  • Frontend remains intentionally deferred until contracts stabilize.

Core model

1) Term identity (existing direction)

  • taxonomy_terms.id remains the stable identity.
  • label/slug are mutable metadata.
  • Terms are retired, not hard-deleted, when referenced.

2) Domain-scoped hierarchy (proposed)

Use a relation table instead of a fixed parent_id column:

  • taxonomy_term_relations
    • tenant_id
    • domain
    • parent_term_id
    • child_term_id
    • relation_type (default: broader_narrower)
    • sort_order
    • timestamps / actor metadata

Required constraints:

  • same tenant for parent and child
  • same domain for parent and child
  • no self-reference
  • no duplicate pair for same relation type
  • cycle prevention (reject insert creating ancestor loop)

Operational mode:

  • default to a strict tree per facet/domain (one parent per child), enforced by a uniqueness constraint on the child
  • allow polyhierarchy later by relaxing that uniqueness constraint
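
The constraints and strict-tree mode above can be illustrated with a small in-memory check. This is only a sketch under stated assumptions: the Term record, RelationError, and validate_relation are hypothetical names, relation_type is fixed to a single value, and a real implementation would more likely enforce most of this in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    # Hypothetical minimal view of a taxonomy_terms row.
    id: str
    tenant_id: str
    domain: str


class RelationError(ValueError):
    """Raised when a proposed relation violates a hierarchy constraint."""


def validate_relation(terms, edges, parent_id, child_id, strict_tree=True):
    """Check a proposed parent -> child edge against the required constraints.

    terms: dict mapping term id -> Term
    edges: set of existing (parent_id, child_id) pairs for one relation type
    Returns None when the edge is acceptable, raises RelationError otherwise.
    """
    parent, child = terms[parent_id], terms[child_id]
    if parent_id == child_id:
        raise RelationError("self-reference")
    if parent.tenant_id != child.tenant_id:
        raise RelationError("tenant mismatch")
    if parent.domain != child.domain:
        raise RelationError("domain mismatch")
    if (parent_id, child_id) in edges:
        raise RelationError("duplicate pair")
    if strict_tree and any(existing_child == child_id for _, existing_child in edges):
        raise RelationError("child already has a parent")

    # Cycle check: walk upward from the proposed parent. Reaching the
    # proposed child means the new edge would close an ancestor loop.
    parents_of = {}
    for p, c in edges:
        parents_of.setdefault(c, set()).add(p)
    stack, seen = [parent_id], set()
    while stack:
        node = stack.pop()
        if node == child_id:
            raise RelationError("cycle")
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents_of.get(node, ()))
```

Passing strict_tree=False corresponds to the relaxed polyhierarchy mode: the one-parent check is skipped while the remaining constraints, including cycle prevention, still apply.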

Hierarchy storage and query policy:

  • taxonomy_term_relations is the canonical source of truth (adjacency model).
  • Add a closure projection only when descendant query load justifies it.
  • If enabled, the closure table is derived data and rebuilt/reconciled from adjacency truth.
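
As a rough illustration of "derived data rebuilt from adjacency truth", the sketch below recomputes closure rows from parent/child pairs and diffs them against whatever is stored. The function names and the (ancestor, descendant, depth) row shape are assumptions for illustration, not an existing Evalium schema.

```python
from collections import defaultdict


def build_closure(edges):
    """Derive (ancestor, descendant, depth) rows from adjacency pairs.

    edges: iterable of (parent_id, child_id); assumed acyclic, which the
    relation constraints are meant to guarantee.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    closure = set()
    for root in list(children):
        # Depth-first walk below every term that has at least one child.
        stack = [(child, 1) for child in children[root]]
        while stack:
            node, depth = stack.pop()
            closure.add((root, node, depth))
            stack.extend((c, depth + 1) for c in children.get(node, ()))
    return closure


def reconcile(stored, edges):
    """Return (rows_to_add, rows_to_drop) so stored matches adjacency truth."""
    rebuilt = build_closure(edges)
    return rebuilt - stored, stored - rebuilt
```

In strict-tree mode each ancestor/descendant pair appears once. Under polyhierarchy the same pair can appear at several depths, one per distinct path, so a consumer would need to decide whether to keep all rows or only the shortest.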

3) Cross-domain fulfilment mapping (content -> skills)

Do not mix domains directly in assignment tables.
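
One way to picture this separation is a guard that runs before any term assignment is written. The resource kinds, the lookup table, and assert_assignable below are hypothetical names used only for this sketch.

```python
# Hypothetical mapping of resource kind -> term domains it may attach.
ALLOWED_TERM_DOMAINS = {
    "content_resource": {"content"},
    "skills_resource": {"skills"},
}


def assert_assignable(resource_kind, term_domain):
    """Raise PermissionError if a term of this domain may not be attached."""
    allowed = ALLOWED_TERM_DOMAINS.get(resource_kind, set())
    if term_domain not in allowed:
        raise PermissionError(
            f"{resource_kind} cannot attach terms from domain {term_domain!r}"
        )
```

Unknown resource kinds fall through to an empty allow-set, so the guard fails closed rather than open.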

Canonical implementation path:

  • Repurpose existing skill_mapping_set_versions + skill_mapping_rules.
  • Extend selector contract so rules can match content taxonomy term IDs.
  • Keep skill projection and provenance in the existing skills engine.

Rule shape expectation (conceptual):

  • source_domain: content
  • source_term_ids: list of content term IDs (optional include-descendants flag)
  • target_skill_id: skill outcome target
  • contribution strategy/weight/threshold fields as already defined by the skills model
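
The conceptual rule shape could be modelled roughly as follows. This is a sketch, not the existing skill_mapping_rules schema: the class names, the weight field, and the descendants lookup are assumptions, and the real contribution fields are whatever the skills model already defines.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class ContentTermSelector:
    source_domain: str                 # expected to be "content"
    source_term_ids: FrozenSet[str]
    include_descendants: bool = False


@dataclass(frozen=True)
class SkillMappingRule:
    rule_id: str
    mapping_set_version_id: str
    selector: ContentTermSelector
    target_skill_id: str
    weight: float = 1.0                # placeholder for the contribution fields


def rule_matches(
    rule: SkillMappingRule,
    evidence_term_ids: Set[str],
    descendants: Dict[str, Set[str]],
) -> bool:
    """True if any frozen content term on the evidence satisfies the selector.

    descendants maps a term id to all term ids below it in the hierarchy.
    """
    matchable = set(rule.selector.source_term_ids)
    if rule.selector.include_descendants:
        for term_id in rule.selector.source_term_ids:
            matchable |= descendants.get(term_id, set())
    return bool(matchable & evidence_term_ids)
```

For example, a rule selecting "algebra" with include_descendants set would match evidence tagged only with a child term such as "linear-equations", while the same rule without the flag would not.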

Explicit non-goal for now:

  • Do not introduce a standalone taxonomy_domain_mappings engine in parallel.

Migration note (if legacy bridge rows exist):

  • Migrate legacy mapping rows into versioned skill mapping rules.
  • Keep old data read-only during migration, then deprecate.

Execution-time semantics (non-negotiable)

  1. Authoring time
  • Questions/evaluations are tagged with content terms only.
  2. Snapshot freeze
  • Submission snapshot freezes relevant taxonomy term IDs.
  • Optional display labels/slugs may be included for explain rendering, but IDs are canonical.
  • Skills fulfilment projections must persist mapping_set_version_id + mapping_rule_id provenance.
  3. Projection / fulfilment
  • Worker resolves mapping-set version + content-term evidence.
  • Emits skill evidence facts for the user/subject.
  • Records mapping provenance and source submission references.
  4. Reconstruction
  • Historical skill outcomes remain explainable even if taxonomy labels/hierarchy later change.
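
The freeze-then-project flow above might look like the following in miniature. All names here (SubmissionSnapshot, SkillEvidenceFact, project_skill_evidence, the dict keys on rules) are hypothetical; the point is only that each emitted fact carries the IDs needed for later reconstruction.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubmissionSnapshot:
    submission_id: str
    subject_id: str
    content_term_ids: Tuple[str, ...]   # frozen at submission time; IDs are canonical


@dataclass(frozen=True)
class SkillEvidenceFact:
    subject_id: str
    skill_id: str
    submission_id: str
    mapping_set_version_id: str
    mapping_rule_id: str
    source_term_ids: Tuple[str, ...]


def project_skill_evidence(snapshot, mapping_set_version_id, rules):
    """Emit one fact per rule whose source terms overlap the frozen terms.

    rules: list of dicts with 'rule_id', 'source_term_ids', 'target_skill_id'.
    Only the snapshot and the pinned rule set are read, never live taxonomy.
    """
    frozen = set(snapshot.content_term_ids)
    facts = []
    for rule in rules:
        matched = sorted(frozen & set(rule["source_term_ids"]))
        if matched:
            facts.append(SkillEvidenceFact(
                subject_id=snapshot.subject_id,
                skill_id=rule["target_skill_id"],
                submission_id=snapshot.submission_id,
                mapping_set_version_id=mapping_set_version_id,
                mapping_rule_id=rule["rule_id"],
                source_term_ids=tuple(matched),
            ))
    return facts
```

Because the function depends only on frozen IDs and a pinned mapping-set version, rerunning it later yields the same facts even if labels or hierarchy have since changed.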

Guardrails

  1. Domain-gated assignment
  • Content resources can only attach content terms.
  • Skills resources can only attach skills terms.
  2. Cross-domain mapping gate
  • Only explicit mapping rules may bridge content to skills.
  • No implicit inheritance between domains.
  3. Retire, do not destroy
  • Terms used in snapshots/projections are never hard-deleted.
  4. Policy + auth
  • Keep taxonomy.read and taxonomy.manage.
  • Add optional domain policy constraints later if needed (for delegated admin models).
  5. Performance
  • For hierarchy queries, use adjacency + recursive reads initially.
  • Introduce closure/materialized path only when thresholds are exceeded.

Performance trigger thresholds (starting point):

  • taxonomy descendant query p95 > 120ms for two consecutive releases, or
  • descendant queries account for > 20% of taxonomy read volume, or
  • average active hierarchy depth exceeds 4 with sustained growth.
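
These triggers are simple enough to encode as a check over collected metrics. The sketch below assumes a hypothetical HierarchyMetrics record and treats "sustained growth" as a flag supplied by the caller, since the proposal does not define how growth is measured.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class HierarchyMetrics:
    descendant_p95_ms_by_release: Sequence[float]  # oldest release first
    descendant_read_share: float                   # fraction of taxonomy reads, 0..1
    avg_active_depth: float
    depth_growing: bool                            # assumption: computed elsewhere


def closure_projection_triggers(m: HierarchyMetrics) -> List[str]:
    """Return the names of every threshold that is currently exceeded."""
    triggers = []
    last_two = m.descendant_p95_ms_by_release[-2:]
    if len(last_two) == 2 and all(p95 > 120 for p95 in last_two):
        triggers.append("p95_latency")
    if m.descendant_read_share > 0.20:
        triggers.append("read_share")
    if m.avg_active_depth > 4 and m.depth_growing:
        triggers.append("depth")
    return triggers
```

Since the thresholds are joined by "or", any non-empty result would be enough to open the closure-projection discussion.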

Proposed phased implementation

Phase 1: Hierarchy primitives

  • add taxonomy_term_relations
  • enforce same-domain/same-tenant/no-cycle constraints
  • expose read APIs for parent/children lineage

Phase 2: Inventory integration

  • add term-hierarchy-aware filtering support (terms_any/terms_all, optional include descendants)
  • finalize snapshot taxonomy freeze contract
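
A possible reading of terms_any/terms_all with optional descendant expansion is sketched below. The function names and the children lookup are assumptions; the actual filter contract is still to be finalized in this phase.

```python
def expand(term_id, children, include_descendants):
    """Return the term itself plus, optionally, every term below it."""
    result = {term_id}
    if include_descendants:
        stack = [term_id]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in result:
                    result.add(child)
                    stack.append(child)
    return result


def matches(item_terms, children, terms_any=(), terms_all=(), include_descendants=False):
    """Apply terms_any / terms_all to one item's set of attached term IDs.

    children maps a term id to its direct child term ids.
    terms_any: at least one requested term (or a descendant) must be present.
    terms_all: every requested term (or one of its descendants) must be present.
    """
    def hit(term_id):
        return bool(item_terms & expand(term_id, children, include_descendants))

    any_ok = not terms_any or any(hit(t) for t in terms_any)
    all_ok = all(hit(t) for t in terms_all)
    return any_ok and all_ok
```

With include_descendants enabled, a filter on a parent term also selects items tagged only with terms further down that branch. Resolution is deterministic because it depends solely on the adjacency data passed in.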

Phase 3: Skill fulfilment bridge

  • support content-term selectors in skill mapping rules
  • write provenance-rich skill evidence facts (source terms + mapping version)
  • add explain endpoint fields for mapping trace

Phase 4: Frontend activation

  • taxonomy drawers/trees
  • mapping management UX
  • impact previews before publishing mapping changes

Phase acceptance criteria (non-negotiable)

Phase 1 done when

  • cycle prevention is enforced
  • same-tenant and same-domain relation checks are enforced
  • strict-tree mode is enforceable per facet/domain

Phase 2 done when

  • inventory filters can resolve term descendants deterministically
  • snapshot taxonomy freeze contract is documented and covered by tests

Phase 3 done when

  • content term selectors are supported in versioned skill mapping rules
  • every skill fulfilment fact can trace to submission + mapping version + mapping rule
  • no implicit cross-domain fulfilment path exists outside mapping rules

Phase 4 done when

  • frontend taxonomy UIs consume only backend-validated contracts
  • impact preview is backend-derived and version-aware

What this avoids

  • separate taxonomy engines per domain
  • hidden coupling where content tags silently mutate skill outcomes
  • brittle one-parent-only schema that blocks future use cases
  • front-end-driven taxonomy semantics before backend invariants are stable

Recommendation

Adopt this proposal as the canonical backend direction:

  • one taxonomy engine
  • domain partitions
  • relation-based hierarchy
  • explicit versioned cross-domain mapping (content -> skills) for user skill fulfilment

This keeps the model flexible for future product growth while preserving the defensibility and backend-first discipline already established in Evalium.