twentyOne2x 7390e1e843 Implement phase 1b agent routing

2026-05-29 14:00:13 +02:00

9.9 KiB

Phase 1b Child Spec: Agent Identity Router

Created: 2026-05-29
Status: active draft
Parent spec: docs/phase1b-agent-routing-spec.md

Product Outcome Contract

The router decides which Hermes agent identity should review a decision-engine KB PR. It must route by agent ownership, with file paths as strong evidence but not the only source of truth.

Goal

Implement a pure, deterministic, evidence-bearing route scorer that returns one or two required reviewer agents for a PR.

Non-Goals

Do not call paid LLMs for routing.
Do not post PR comments.
Do not mutate pipeline DB state.
Do not deploy to VPS.
Do not implement general user-input routing outside PR evaluation.

Current Implementation Audit

Current relevant code:

lib/domains.py contains DOMAIN_AGENT_MAP, agent_for_domain, detect_domain_from_diff, and detect_domain_from_branch.
lib/agent_routing.py now owns the Phase 1b identity-scored route contract.
The obsolete local DomainRoute folder-first draft and its draft tests were removed before this branch was committed.
Cross-domain PRs now require the top two routed agents in the local implementation. When more than two agents score, the route is still capped at two and route_kind is set to "escalated".

Existing implementation truth:

The repo already has domain detection that can be reused for path signals.
The new route tests cover six primary agents, broadened ownership domains, top-2 cross-domain routing, fallback, and deterministic repeat behavior.
The existing map includes adjacent domains such as mechanisms, living-capital, living-agents, critical-systems, collective-intelligence, teleological-economics, and cultural-dynamics.
The product owner clarified that Phase 1b should use agent identities to route, not only folder names.

Existing-Spec Inventory

| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Umbrella source of truth. | Reuse. |
| `docs/queue.md` | Notes `ai-alignment` domain evolution. | Reuse as a signal for Theseus ownership. |
| `docs/ARCHITECTURE.md` | Describes eval stage shape. | Context only. |

Goal-Vs-Repo-Truth Diff

Goal:

Return AgentRoute with primary_agent, required_agents, route_kind, scores, and evidence.
Cap cross-domain routes at top 2 agents.
Treat folders as evidence, not the complete classifier.
Be testable without network, DB, GitHub, or LLM calls.

Repo truth before this branch:

Existing classifier returns one folder-domain string or None.
No scores, evidence, or top-2 agent set exist.
Existing tests do not cover identity-broadened ownership.

Completion Percent And Remaining Delta

Current completion on this branch: 100 percent for local route logic, 0 percent for staging route calibration.

Remaining delta:

Review the route weights against real recent decision-engine PRs.
Calibrate ambiguous keyword cases from staging evidence.
Decide whether escalated routes should remain top-2 total or become Leo plus top-2 later.

Closure, Endpoint, And Deployment Truth

Local closure:

Route tests pass.
No network or DB dependency exists in route tests.
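One way to back the no-network claim with a check is to block socket creation while a route test runs. The sketch below is only an illustration: `no_network` is a hypothetical helper, not something in the repo, and it only catches code that looks up `socket.socket` at call time. It does not detect DB access.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def no_network():
    """Hypothetical guard: fail loudly if code tries to open a socket."""

    def _blocked(*args, **kwargs):
        raise RuntimeError("network access attempted during route test")

    # Swap socket.socket for the blocking stub, restore it on exit.
    with mock.patch.object(socket, "socket", _blocked):
        yield


def pure_stub(path: str) -> str:
    """Stand-in for a pure route call; touches no network."""
    return "Rio" if path.startswith("domains/internet-finance/") else "Leo"


with no_network():
    result = pure_stub("domains/internet-finance/foo.md")
```

A route test could wrap each router call in `with no_network():` so that an accidental HTTP or GitHub call raises instead of passing silently.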

Staging closure:

Staging proof artifact records route scores and evidence for seven sandbox PRs.

Production closure:

Live PR audit rows show route evidence and required agents.

This child spec alone cannot prove staging or production behavior.

Critical Assumptions And Invalidators

Assumptions:

The decision-engine file layout is close enough to the current local clone for path signals to apply.
Agent identity ownership from m3taversal is authoritative.
Top-2 cap is acceptable for cross-domain cases.

Invalidators:

Product owner changes cross-domain rule from top 2 to all touched agents.
Agent ownership boundaries change materially.
Production PR metadata lacks branch or changed-file data.

State And Truth Contract

Route output schema:

AgentRoute(
    primary_agent="Rio",
    required_agents=("Rio",),
    route_kind="single",
    scores={"Leo": 0, "Theseus": 1, "Rio": 9, "Vida": 0, "Clay": 0, "Astra": 0},
    evidence=[
        {"agent": "Rio", "signal": "path", "weight": 8, "value": "domains/internet-finance/foo.md"}
    ],
    fallback=False,
)

route_kind values:

single
multi
fallback
escalated

required_agents must never contain more than two agents in Phase 1b.
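The schema, the allowed route_kind values, and the two-agent cap could be enforced at construction time. The following is a minimal sketch, assuming a frozen dataclass is an acceptable shape for AgentRoute; the field names come from the schema above, while `ROUTE_KINDS`, `MAX_REQUIRED_AGENTS`, and the validation in `__post_init__` are illustrative assumptions rather than the code in lib/agent_routing.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Allowed values listed in this spec.
ROUTE_KINDS = ("single", "multi", "fallback", "escalated")
# Phase 1b cap on required reviewers (assumed constant name).
MAX_REQUIRED_AGENTS = 2


@dataclass(frozen=True)
class AgentRoute:
    primary_agent: str
    required_agents: Tuple[str, ...]
    route_kind: str
    scores: Dict[str, int]
    evidence: List[dict] = field(default_factory=list)
    fallback: bool = False

    def __post_init__(self) -> None:
        # Reject route kinds the spec does not define.
        if self.route_kind not in ROUTE_KINDS:
            raise ValueError(f"unknown route_kind: {self.route_kind!r}")
        # The goal states one or two required reviewers, never more.
        if not 1 <= len(self.required_agents) <= MAX_REQUIRED_AGENTS:
            raise ValueError("required_agents must hold one or two agents")


example = AgentRoute(
    primary_agent="Rio",
    required_agents=("Rio",),
    route_kind="single",
    scores={"Leo": 0, "Theseus": 1, "Rio": 9, "Vida": 0, "Clay": 0, "Astra": 0},
    evidence=[
        {"agent": "Rio", "signal": "path", "weight": 8,
         "value": "domains/internet-finance/foo.md"}
    ],
)
```

Validating in the constructor means a caller cannot build a route that violates the cap, which keeps the check out of every consumer.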

Measurement Contract

Required route fixture cases:

| Fixture | Expected |
| --- | --- |
| `domains/grand-strategy/foo.md` | Leo |
| `domains/ai-alignment/foo.md` | Theseus |
| `domains/internet-finance/foo.md` | Rio |
| `domains/health/foo.md` | Vida |
| `domains/entertainment/foo.md` | Clay |
| `domains/space-development/foo.md` | Astra |
| `domains/energy/foo.md` | Astra |
| `domains/robotics/foo.md` | Astra |
| `domains/manufacturing/foo.md` | Astra |
| `core/living-capital/foo.md` | Rio |
| `core/living-agents/foo.md` | Theseus |
| `foundations/cultural-dynamics/foo.md` | Clay |
| AI plus x402 diff | Theseus and Rio |
| collective AI goals diff | Leo and Theseus |
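The path rows of the fixture table can be read as a prefix-to-owner lookup. The sketch below assumes a simple `startswith` match; `PATH_OWNERS` and `owner_for_path` are hypothetical names, and the real router treats a path match as one scored signal rather than a final answer.

```python
from typing import Optional

# Path prefix -> owning agent, copied from the path fixtures above.
PATH_OWNERS = {
    "domains/grand-strategy/": "Leo",
    "domains/ai-alignment/": "Theseus",
    "domains/internet-finance/": "Rio",
    "domains/health/": "Vida",
    "domains/entertainment/": "Clay",
    "domains/space-development/": "Astra",
    "domains/energy/": "Astra",
    "domains/robotics/": "Astra",
    "domains/manufacturing/": "Astra",
    "core/living-capital/": "Rio",
    "core/living-agents/": "Theseus",
    "foundations/cultural-dynamics/": "Clay",
}


def owner_for_path(path: str) -> Optional[str]:
    """Return the owning agent for a changed file, or None if unowned."""
    for prefix, agent in PATH_OWNERS.items():
        if path.startswith(prefix):
            return agent
    return None
```

Returning `None` for an unknown path leaves the decision to other signals or to the fallback rule instead of guessing an owner.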

Minimum quality metrics:

route_fixture_pass_rate = 100 percent
fallback_count = 0 for known fixtures
deterministic repeat: the same input returns the same result across 100 repeated calls
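These metrics could be computed with two small helpers. This is a sketch under stated assumptions: `fixture_pass_rate`, `is_deterministic`, and `toy_route` are hypothetical names, and `toy_route` is a stand-in so the block runs on its own, not the router under test.

```python
from typing import Callable, Dict, Tuple

Route = Tuple[str, ...]


def fixture_pass_rate(route_fn: Callable[[str], Route],
                      fixtures: Dict[str, Route]) -> float:
    """Percent of fixtures whose routed agents match the expectation."""
    passed = sum(1 for inp, expected in fixtures.items()
                 if route_fn(inp) == expected)
    return 100.0 * passed / len(fixtures)


def is_deterministic(route_fn: Callable[[str], Route],
                     route_input: str, repeats: int = 100) -> bool:
    """True if `repeats` calls on the same input all return the same result."""
    first = route_fn(route_input)
    return all(route_fn(route_input) == first for _ in range(repeats - 1))


def toy_route(path: str) -> Route:
    """Stand-in router covering two fixtures, with Leo as fallback."""
    if path.startswith("domains/health/"):
        return ("Vida",)
    if path.startswith("domains/internet-finance/"):
        return ("Rio",)
    return ("Leo",)


fixtures = {
    "domains/health/foo.md": ("Vida",),
    "domains/internet-finance/foo.md": ("Rio",),
}
rate = fixture_pass_rate(toy_route, fixtures)
stable = is_deterministic(toy_route, "domains/health/foo.md")
```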

Backend Work Required

Owned files:

lib/agent_routing.py
lib/domains.py
tests/test_agent_routing.py

Implementation steps:

Move new identity routing into lib/agent_routing.py.
Keep lib/domains.py as a compatibility layer for domain-oriented callers.
Define AGENT_ORDER = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra").
Define identity signals per agent.
Add path signal extraction for domains, entities, core, foundations, and agents.
Add branch prefix signal extraction.
Add capped keyword scoring from filenames and diff text.
Add top-2 selection rule.
Add fallback to Leo.
Add tests.
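The top-2 selection and fallback steps above can be sketched as a single pure function over a score map. Several details here are assumptions rather than confirmed behavior: `MIN_SCORE` is a hypothetical threshold for a score to count as meaningful, "multi" is assumed to mean exactly two qualifying agents, and ties are assumed to break by position in AGENT_ORDER.

```python
from typing import Dict, Tuple

AGENT_ORDER = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra")
FALLBACK_AGENT = "Leo"
MIN_SCORE = 2  # hypothetical threshold, to be calibrated on real PRs


def select_agents(
    scores: Dict[str, int], min_score: int = MIN_SCORE
) -> Tuple[str, Tuple[str, ...], str, bool]:
    """Return (primary_agent, required_agents, route_kind, fallback)."""
    qualifying = [a for a in AGENT_ORDER if scores.get(a, 0) >= min_score]
    # Highest score first; equal scores fall back to AGENT_ORDER position,
    # so the same input always yields the same ranking.
    ranked = sorted(qualifying,
                    key=lambda a: (-scores[a], AGENT_ORDER.index(a)))
    if not ranked:
        return FALLBACK_AGENT, (FALLBACK_AGENT,), "fallback", True
    required = tuple(ranked[:2])  # Phase 1b cap: never more than two
    if len(ranked) == 1:
        kind = "single"
    elif len(ranked) == 2:
        kind = "multi"
    else:
        kind = "escalated"
    return required[0], required, kind, False
```

With the schema example scores (Rio 9, Theseus 1, all others 0) and the assumed threshold, only Rio qualifies, so the route is single. Because the function reads nothing but its arguments, repeated calls cannot diverge.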

Forbidden files:

lib/evaluate.py
lib/llm.py
deploy scripts
secrets or runtime config outside route feature flag wiring

Frontend Work Required

None.

Expected Runtime And User-Visible Behavior

The router itself has no user-visible UI. Its behavior becomes visible through audit logs, PR comment reviewer selection, and proof artifacts.

Example:

input: domains/internet-finance/x402-agent-payments.md
output: required_agents = ["Rio"]

Cross-domain example:

input: ai systems claim plus x402 payment claim
output: required_agents = ["Theseus", "Rio"]

Validation And Test Matrix

Commands:

python3 -m pytest tests/test_agent_routing.py
python3 -m ruff check lib/agent_routing.py lib/domains.py tests/test_agent_routing.py
git diff --check

Test classes:

primary ownership routes
broadened ownership routes
branch fallback routes
keyword routes
top-2 cross-domain routes
fallback routes
deterministic tie-breaking
compatibility wrapper behavior

CI/CD, Release, And Pre-Push Gate Contract

Before PR:

Route tests pass locally.
No production config defaults change.
No network dependency enters route tests.

Before staging:

Eval integration spec consumes the route result without modifying route internals.

Before production:

Route evidence appears in staging proof artifact.

Independent CLI Audit Contract

Reviewer commands:

git diff -- lib/agent_routing.py lib/domains.py tests/test_agent_routing.py
python3 -m pytest tests/test_agent_routing.py

Reviewer checks:

Route function is pure.
Scores are explainable.
Top-2 cap is enforced.
Folder paths are not the only signal.
Old callers still work or have a clear migration path.

Outside-The-Box Fix Paths

If keyword scoring is noisy:

Disable diff keyword scoring and use path plus branch only.
Use LLM classifier in shadow mode only.
Add explicit PR label or frontmatter hint later.
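The first fix path, turning off diff keyword scoring, is easiest if keyword scores are computed separately and merged behind a switch. The sketch below assumes whole-word matching and a per-agent cap; `KEYWORDS`, `KEYWORD_WEIGHT`, `KEYWORD_CAP`, and `use_diff_keywords` are hypothetical names and values, not the calibrated ones.

```python
import re
from typing import Dict

# Hypothetical keyword lists; real lists would come from agent identities.
KEYWORDS = {
    "Theseus": frozenset({"alignment"}),
    "Rio": frozenset({"x402", "payment"}),
}
KEYWORD_WEIGHT = 1  # assumed weight per keyword hit
KEYWORD_CAP = 3     # assumed ceiling so keywords cannot outweigh a path hit


def keyword_scores(text: str, cap: int = KEYWORD_CAP) -> Dict[str, int]:
    """Score agents by whole-word keyword hits, capped per agent."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    scores: Dict[str, int] = {}
    for agent, words in KEYWORDS.items():
        hits = sum(1 for token in tokens if token in words)
        if hits:
            scores[agent] = min(hits * KEYWORD_WEIGHT, cap)
    return scores


def combine(path_scores: Dict[str, int], diff_text: str,
            use_diff_keywords: bool = True) -> Dict[str, int]:
    """Merge path scores with keyword scores unless the switch is off."""
    total = dict(path_scores)  # copy so the caller's dict is not mutated
    if use_diff_keywords:
        for agent, score in keyword_scores(diff_text).items():
            total[agent] = total.get(agent, 0) + score
    return total
```

Tokenizing on word boundaries avoids substring false positives, which is one likely source of noise in keyword scoring.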

If identity boundaries are ambiguous:

Prefer top-2 over fallback when two agents have meaningful scores.
Log route evidence for later calibration.

Maintenance Capture

Beneficial now:

Keep route logic out of lib/evaluate.py.
Keep compatibility wrappers narrow.

Avoid now:

Large domain taxonomy rewrite.
Dashboard UI changes.
Paid classifier calls.

Parallelization And Fanout

Classification: local_owner.

Do not fan out implementation. This module is a root contract consumed by eval integration.

Worker-ready prompt:

implement the phase 1b agent identity router in teleo-infrastructure. own lib/agent_routing.py, lib/domains.py compatibility wrappers, and route tests only. make the route function pure, deterministic, evidence-bearing, and capped at top 2 required agents. do not touch eval integration or deploy code.

Acceptance Criteria

All required route fixtures pass.
Route result includes primary agent, required agents, route kind, scores, evidence, and fallback status.
Cross-domain route never requires more than two agents.
No LLM, network, DB, or GitHub calls occur in the router.

Readiness And Claim Boundaries

Allowed claim:

"Agent identity routing is locally implemented and unit-tested."

Forbidden claim:

"Phase 1b eval is complete."

Spec Quality Self-Audit

Required headings present:

Current Implementation Audit: present.
Goal-Vs-Repo-Truth Diff: present.
Completion Percent And Remaining Delta: present.
Closure, Endpoint, And Deployment Truth: present.
Critical Assumptions And Invalidators: present.
State And Truth Contract: present.
Measurement Contract: present.
Backend Work Required: present.
Frontend Work Required: present.
Expected Runtime And User-Visible Behavior: present.
Validation And Test Matrix: present.
CI/CD, Release, And Pre-Push Gate Contract: present.
Independent CLI Audit Contract: present.
Outside-The-Box Fix Paths: present.
Maintenance Capture: present.
Parallelization And Fanout: present.

Assistant-Added Caveats

This child spec intentionally keeps routing deterministic and no-spend. That may be less semantically smart than an LLM classifier, but it is the right first implementation for Phase 1b because it is testable, cheap, and auditable.
