teleo-infrastructure/docs/phase1b/agent-identity-router-spec.md
2026-05-29 14:00:13 +02:00

# Phase 1b Child Spec: Agent Identity Router
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
The router decides which Hermes agent identity should review a `decision-engine` KB PR. It must route by agent ownership, with file paths as strong evidence but not the only source of truth.
## Goal
Implement a pure, deterministic, evidence-bearing route scorer that returns one or two required reviewer agents for a PR.
## Non-Goals
- Do not call paid LLMs for routing.
- Do not post PR comments.
- Do not mutate pipeline DB state.
- Do not deploy to VPS.
- Do not implement general user-input routing outside PR evaluation.
## Current Implementation Audit
Current relevant code:
- `lib/domains.py` contains `DOMAIN_AGENT_MAP`, `agent_for_domain`, `detect_domain_from_diff`, and `detect_domain_from_branch`.
- `lib/agent_routing.py` now owns the Phase 1b identity-scored route contract.
- The obsolete local `DomainRoute` folder-first draft and its draft tests were removed before this branch was committed.
- Cross-domain PRs now require the top 2 routed agents locally, with `route_kind="escalated"` when more than two agents scored.

Existing implementation truth:
- The repo already has domain detection that can be reused for path signals.
- The new route tests cover six primary agents, broadened ownership domains, top-2 cross-domain routing, fallback, and deterministic repeat behavior.
- The existing map includes adjacent domains such as `mechanisms`, `living-capital`, `living-agents`, `critical-systems`, `collective-intelligence`, `teleological-economics`, and `cultural-dynamics`.
- The product owner clarified that Phase 1b should use agent identities to route, not only folder names.
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Umbrella source of truth. | Reuse. |
| `docs/queue.md` | Notes `ai-alignment` domain evolution. | Reuse as a signal for Theseus ownership. |
| `docs/ARCHITECTURE.md` | Describes eval stage shape. | Context only. |
## Goal-Vs-Repo-Truth Diff
Goal:
- Return `AgentRoute` with `primary_agent`, `required_agents`, `route_kind`, `scores`, and `evidence`.
- Cap cross-domain routes at top 2 agents.
- Treat folders as evidence, not the complete classifier.
- Be testable without network, DB, GitHub, or LLM calls.

Repo truth:
- Existing classifier returns one folder-domain string or `None`.
- No scores, evidence, or top-2 agent set exist.
- Existing tests do not cover identity-broadened ownership.
## Completion Percent And Remaining Delta
Current completion on this branch: 100 percent for local route logic, 0 percent for staging route calibration.

Remaining delta:
1. Review the route weights against real recent `decision-engine` PRs.
2. Calibrate ambiguous keyword cases from staging evidence.
3. Decide whether escalated routes should remain top-2 total or become Leo plus top-2 later.
## Closure, Endpoint, And Deployment Truth
Local closure:
- Route tests pass.
- No network or DB dependency exists in route tests.

Staging closure:
- Staging proof artifact records route scores and evidence for seven sandbox PRs.

Production closure:
- Live PR audit rows show route evidence and required agents.

This child spec alone cannot prove staging or production behavior.
## Critical Assumptions And Invalidators
Assumptions:
- `decision-engine` file layout is close enough to current local clone for path signals to apply.
- Agent identity ownership from m3taversal is authoritative.
- Top-2 cap is acceptable for cross-domain cases.

Invalidators:
- Product owner changes cross-domain rule from top 2 to all touched agents.
- Agent ownership boundaries change materially.
- Production PR metadata lacks branch or changed-file data.
## State And Truth Contract
Route output schema:
```python
AgentRoute(
    primary_agent="Rio",
    required_agents=("Rio",),
    route_kind="single",
    scores={"Leo": 0, "Theseus": 1, "Rio": 9, "Vida": 0, "Clay": 0, "Astra": 0},
    evidence=[
        {"agent": "Rio", "signal": "path", "weight": 8, "value": "domains/internet-finance/foo.md"}
    ],
    fallback=False,
)
```
`route_kind` values:
- `single`
- `multi`
- `fallback`
- `escalated`

`required_agents` must never contain more than two agents in Phase 1b.
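
The schema and the two-agent cap can be made self-checking. The following is a minimal sketch, assuming `AgentRoute` is a frozen dataclass; the field names come from the example above, while the `__post_init__` checks, the constant names, and the rule that `primary_agent` must appear in `required_agents` are assumptions, not confirmed behavior of `lib/agent_routing.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

ROUTE_KINDS = ("single", "multi", "fallback", "escalated")
MAX_REQUIRED_AGENTS = 2  # Phase 1b cap stated in this spec


@dataclass(frozen=True)
class AgentRoute:
    primary_agent: str
    required_agents: tuple[str, ...]
    route_kind: str
    scores: dict[str, int]
    evidence: list[dict[str, Any]] = field(default_factory=list)
    fallback: bool = False

    def __post_init__(self) -> None:
        # Reject route kinds outside the four values listed in the spec.
        if self.route_kind not in ROUTE_KINDS:
            raise ValueError(f"unknown route_kind: {self.route_kind!r}")
        # Enforce the Phase 1b cap of one or two required agents.
        if not 1 <= len(self.required_agents) <= MAX_REQUIRED_AGENTS:
            raise ValueError("required_agents must contain one or two agents")
        # Assumption: the primary agent is always one of the required agents.
        if self.primary_agent not in self.required_agents:
            raise ValueError("primary_agent must be listed in required_agents")
```

Validating at construction time means a caller cannot build a route that violates the cap, so the eval integration would not need its own check.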
## Measurement Contract
Required route fixture cases:
| Fixture | Expected |
| --- | --- |
| `domains/grand-strategy/foo.md` | Leo |
| `domains/ai-alignment/foo.md` | Theseus |
| `domains/internet-finance/foo.md` | Rio |
| `domains/health/foo.md` | Vida |
| `domains/entertainment/foo.md` | Clay |
| `domains/space-development/foo.md` | Astra |
| `domains/energy/foo.md` | Astra |
| `domains/robotics/foo.md` | Astra |
| `domains/manufacturing/foo.md` | Astra |
| `core/living-capital/foo.md` | Rio |
| `core/living-agents/foo.md` | Theseus |
| `foundations/cultural-dynamics/foo.md` | Clay |
| AI plus x402 diff | Theseus and Rio |
| collective AI goals diff | Leo and Theseus |

Minimum quality metrics:
- `route_fixture_pass_rate = 100 percent`
- `fallback_count = 0` for known fixtures
- deterministic repeat count: same input returns same result 100 times
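
The three metrics above can be computed by a small harness. This is a sketch only: `measure`, `stub_route`, and `FOLDER_OWNERS` are hypothetical names, the stub is a folder-only stand-in used to exercise the harness (not the identity-scored router), and a `None` result is assumed to represent a fallback. The two diff-based fixtures are left out because they need diff text rather than a path.

```python
from __future__ import annotations

from typing import Callable, Optional

# Path fixtures copied from the table above.
FIXTURES = [
    ("domains/grand-strategy/foo.md", "Leo"),
    ("domains/ai-alignment/foo.md", "Theseus"),
    ("domains/internet-finance/foo.md", "Rio"),
    ("domains/health/foo.md", "Vida"),
    ("domains/entertainment/foo.md", "Clay"),
    ("domains/space-development/foo.md", "Astra"),
    ("domains/energy/foo.md", "Astra"),
    ("domains/robotics/foo.md", "Astra"),
    ("domains/manufacturing/foo.md", "Astra"),
    ("core/living-capital/foo.md", "Rio"),
    ("core/living-agents/foo.md", "Theseus"),
    ("foundations/cultural-dynamics/foo.md", "Clay"),
]

# Hypothetical stand-in ownership table, keyed by the second path segment.
FOLDER_OWNERS = {path.split("/")[1]: agent for path, agent in FIXTURES}


def stub_route(path: str) -> Optional[str]:
    """Folder-only stand-in router; returns None when nothing matches."""
    parts = path.split("/")
    return FOLDER_OWNERS.get(parts[1]) if len(parts) > 1 else None


def measure(route_fn: Callable[[str], Optional[str]], repeats: int = 100) -> dict:
    passed = fallback_count = unstable = 0
    for path, expected in FIXTURES:
        # Call the router repeatedly and collect the distinct answers.
        distinct = {route_fn(path) for _ in range(repeats)}
        if len(distinct) != 1:
            unstable += 1
        if distinct == {expected}:
            passed += 1
        if None in distinct:
            fallback_count += 1
    return {
        "route_fixture_pass_rate": 100.0 * passed / len(FIXTURES),
        "fallback_count": fallback_count,
        "nondeterministic_fixtures": unstable,
    }
```

Passing the real route function in place of `stub_route` would give the same report shape for the proof artifact.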
## Backend Work Required
Owned files:
- `lib/agent_routing.py`
- `lib/domains.py`
- `tests/test_agent_routing.py`

Implementation steps:
1. Move new identity routing into `lib/agent_routing.py`.
2. Keep `lib/domains.py` as compatibility for domain-oriented callers.
3. Define `AGENT_ORDER = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra")`.
4. Define identity signals per agent.
5. Add path signal extraction for `domains`, `entities`, `core`, `foundations`, and `agents`.
6. Add branch prefix signal extraction.
7. Add capped keyword scoring from filenames and diff text.
8. Add top-2 selection rule.
9. Add fallback to Leo.
10. Add tests.

Forbidden files:
- `lib/evaluate.py`
- `lib/llm.py`
- deploy scripts
- secrets or runtime config outside route feature flag wiring
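
Steps 3, 7, 8, and 9 of the implementation list can be sketched as follows. `AGENT_ORDER` is taken from step 3; `KEYWORD_CAP`, `MIN_MEANINGFUL_SCORE`, and both function names are hypothetical, and the threshold value is an assumption chosen so that the schema example (Rio 9, Theseus 1) still resolves to a `single` route.

```python
from __future__ import annotations

AGENT_ORDER = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra")
KEYWORD_CAP = 3           # hypothetical cap on keyword points per agent
MIN_MEANINGFUL_SCORE = 4  # hypothetical floor for an agent to count as routed


def capped_keyword_score(text: str, keywords: tuple[str, ...], cap: int = KEYWORD_CAP) -> int:
    """Count keyword occurrences, but never award more than `cap` points."""
    lowered = text.lower()
    hits = sum(lowered.count(keyword.lower()) for keyword in keywords)
    return min(hits, cap)


def select_route(scores: dict[str, int]) -> tuple[str, tuple[str, ...], str, bool]:
    """Return (primary_agent, required_agents, route_kind, fallback)."""
    candidates = [a for a in AGENT_ORDER if scores.get(a, 0) >= MIN_MEANINGFUL_SCORE]
    # Highest score first; ties resolve by position in AGENT_ORDER.
    ranked = sorted(candidates, key=lambda a: (-scores[a], AGENT_ORDER.index(a)))
    if not ranked:
        # Step 9: fall back to Leo when no agent has a meaningful score.
        return "Leo", ("Leo",), "fallback", True
    required = tuple(ranked[:2])  # Step 8: top-2 cap
    if len(ranked) == 1:
        kind = "single"
    elif len(ranked) == 2:
        kind = "multi"
    else:
        kind = "escalated"  # more than two agents scored
    return required[0], required, kind, False
```

Sorting on `(-score, AGENT_ORDER index)` keeps tie-breaking explicit, so equal scores cannot reorder between runs.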
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
The router itself has no user-visible UI. Its behavior becomes visible through audit logs, PR comment reviewer selection, and proof artifacts.
Example:
```text
input: domains/internet-finance/x402-agent-payments.md
output: required_agents = ["Rio"]
```
Cross-domain example:
```text
input: ai systems claim plus x402 payment claim
output: required_agents = ["Theseus", "Rio"]
```
## Validation And Test Matrix
Commands:
```bash
python3 -m pytest tests/test_agent_routing.py
python3 -m ruff check lib/agent_routing.py lib/domains.py tests/test_agent_routing.py
git diff --check
```
Test classes:
- primary ownership routes
- broadened ownership routes
- branch fallback routes
- keyword routes
- top-2 cross-domain routes
- fallback routes
- deterministic tie-breaking
- compatibility wrapper behavior
## CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Route tests pass locally.
- No production config defaults change.
- No network dependency enters route tests.

Before staging:
- Eval integration spec consumes the route result without modifying route internals.

Before production:
- Route evidence appears in staging proof artifact.
## Independent CLI Audit Contract
Reviewer commands:
```bash
git diff -- lib/agent_routing.py lib/domains.py tests/test_agent_routing.py
python3 -m pytest tests/test_agent_routing.py
```
Reviewer checks:
- Route function is pure.
- Scores are explainable.
- Top-2 cap is enforced.
- Folder paths are not the only signal.
- Old callers still work or have a clear migration path.
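
A reviewer can partially automate the purity check. The sketch below assumes nothing about the real route function's signature: `no_network` and `check_pure` are hypothetical helpers that block socket creation during the call, compare two runs on the same input, and confirm the inputs were not mutated. It does not detect DB or filesystem access, so it complements reading the diff rather than replacing it.

```python
import copy
import socket
from contextlib import contextmanager
from unittest import mock


def _blocked(*args, **kwargs):
    raise RuntimeError("network access attempted inside a route call")


@contextmanager
def no_network():
    """Make socket creation raise while the context is active."""
    with mock.patch.object(socket, "socket", side_effect=_blocked), \
         mock.patch.object(socket, "create_connection", side_effect=_blocked):
        yield


def check_pure(route_fn, *args):
    """Run route_fn twice on the same arguments with the network blocked."""
    snapshot = copy.deepcopy(args)
    with no_network():
        first = route_fn(*args)
        second = route_fn(*args)
    assert first == second, "route function returned different results"
    assert args == snapshot, "route function mutated its inputs"
    return first
```

Usage would look like `check_pure(route_fn, changed_files, branch, diff_text)`, where the argument list is whatever the router under review actually accepts.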
## Outside-The-Box Fix Paths
If keyword scoring is noisy:
- Disable diff keyword scoring and use path plus branch only.
- Use LLM classifier in shadow mode only.
- Add explicit PR label or frontmatter hint later.

If identity boundaries are ambiguous:
- Prefer top-2 over fallback when two agents have meaningful scores.
- Log route evidence for later calibration.
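
Two of the fix paths above, disabling keyword scoring and logging evidence for calibration, fit naturally into the step that sums signals. This is a sketch under stated assumptions: the evidence record shape follows the schema example, but `score_signals`, `evidence_log_line`, the `enabled` parameter, and the two keyword records in `SIGNALS` are hypothetical.

```python
from __future__ import annotations

import json

AGENT_ORDER = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra")

# The first record is the schema example; the keyword records are hypothetical.
SIGNALS = [
    {"agent": "Rio", "signal": "path", "weight": 8, "value": "domains/internet-finance/foo.md"},
    {"agent": "Rio", "signal": "keyword", "weight": 1, "value": "x402"},
    {"agent": "Theseus", "signal": "keyword", "weight": 1, "value": "agent"},
]


def score_signals(signals, enabled=("path", "branch", "keyword")):
    """Sum weights per agent, keeping only signal kinds listed in `enabled`."""
    scores = {agent: 0 for agent in AGENT_ORDER}
    evidence = []
    for record in signals:
        if record["signal"] not in enabled:
            continue  # e.g. keyword scoring switched off because it is noisy
        scores[record["agent"]] += record["weight"]
        evidence.append(record)
    return scores, evidence


def evidence_log_line(scores, evidence) -> str:
    """Serialize scores and evidence as one stable JSON line for calibration."""
    return json.dumps({"scores": scores, "evidence": evidence}, sort_keys=True)
```

Calling `score_signals(SIGNALS, enabled=("path", "branch"))` drops the keyword records from both the scores and the evidence, which keeps the logged evidence consistent with the scores that produced the route.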
## Maintenance Capture
Beneficial now:
- Keep route logic out of `lib/evaluate.py`.
- Keep compatibility wrappers narrow.

Avoid now:
- Large domain taxonomy rewrite.
- Dashboard UI changes.
- Paid classifier calls.
## Parallelization And Fanout
Classification: local_owner.
Do not fan out implementation. This module is a root contract consumed by eval integration.

Worker-ready prompt:
```text
implement the phase 1b agent identity router in teleo-infrastructure. own lib/agent_routing.py, lib/domains.py compatibility wrappers, and route tests only. make the route function pure, deterministic, evidence-bearing, and capped at top 2 required agents. do not touch eval integration or deploy code.
```
## Acceptance Criteria
- All required route fixtures pass.
- Route result includes primary agent, required agents, route kind, scores, evidence, and fallback status.
- Cross-domain route never requires more than two agents.
- No LLM, network, DB, or GitHub calls occur in the router.
## Readiness And Claim Boundaries
Allowed claim:
- "Agent identity routing is locally implemented and unit-tested."

Forbidden claim:
- "Phase 1b eval is complete."
## Spec Quality Self-Audit
Required headings present:
- Current Implementation Audit: present.
- Goal-Vs-Repo-Truth Diff: present.
- Completion Percent And Remaining Delta: present.
- Closure, Endpoint, And Deployment Truth: present.
- Critical Assumptions And Invalidators: present.
- State And Truth Contract: present.
- Measurement Contract: present.
- Backend Work Required: present.
- Frontend Work Required: present.
- Expected Runtime And User-Visible Behavior: present.
- Validation And Test Matrix: present.
- CI/CD, Release, And Pre-Push Gate Contract: present.
- Independent CLI Audit Contract: present.
- Outside-The-Box Fix Paths: present.
- Maintenance Capture: present.
- Parallelization And Fanout: present.
## Assistant-Added Caveats
This child spec intentionally keeps routing deterministic and no-spend. That may be less semantically smart than an LLM classifier, but it is the right first implementation for Phase 1b because it is testable, cheap, and auditable.