teleo-infrastructure/docs/phase1b/local-review-guide.md
2026-05-29 14:17:28 +02:00

Phase 1b Local Review Guide

Status: local-only review artifact
Branch: phase1b-agent-routing-local

What This Repo Is

teleo-infrastructure is the pipeline/runtime repo. For Phase 1b, it owns the evaluation daemon logic that watches PRs, fetches diffs, runs reviewers, posts verdict comments, and moves PR state toward merge or feedback.

Canonical split for this phase:

  • KB repo: decision-engine
  • implementation/runtime repo: teleo-infrastructure
  • production runtime: VPS under /opt/teleo-eval, not currently accessible from this workspace

What This Branch Changes

Local code changes:

  • lib/agent_routing.py: new pure router that maps a PR diff to one or two Hermes agents.
  • lib/config.py: adds PHASE1B_AGENT_ROUTING_ENABLED, default false.
  • lib/evaluate.py: adds a feature-flagged Phase 1b eval path.
  • lib/llm.py: adds run_agent_review.
  • tests/test_agent_routing.py: router tests.
  • tests/test_evaluate_agent_routing.py: mocked eval tests.
  • tests/test_eval_parse.py: parser coverage for all six VERDICT:AGENT:* variants.
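
For orientation before opening lib/agent_routing.py, here is a minimal sketch of what a pure router that returns one or two agents can look like. It is not the code on this branch: the path prefixes, the agent names, the fallback agent, and the function name route_paths are all hypothetical, and the real router works from a PR diff rather than a bare list of paths.

```python
from typing import Dict, Iterable, List

# Hypothetical prefix -> agent table. The real mapping lives in
# lib/agent_routing.py and may be keyed on something else entirely.
ROUTES: Dict[str, str] = {
    "domains/finance/": "agent-a",
    "domains/health/": "agent-b",
    "core/": "agent-c",
}
DEFAULT_AGENT = "agent-a"  # hypothetical fallback when nothing matches
MAX_AGENTS = 2             # the guide says "one or two" agents


def route_paths(changed_paths: Iterable[str]) -> List[str]:
    """Return one or two agent names for a set of changed file paths.

    Pure function: no I/O, no globals mutated, same input -> same output.
    """
    hits: Dict[str, int] = {}
    for path in changed_paths:
        for prefix, agent in ROUTES.items():
            if path.startswith(prefix):
                hits[agent] = hits.get(agent, 0) + 1
                break  # first matching prefix wins for this path
    if not hits:
        return [DEFAULT_AGENT]
    # Most-touched agent first; ties broken by name so the order is stable.
    ranked = sorted(hits, key=lambda agent: (-hits[agent], agent))
    return ranked[:MAX_AGENTS]
```

Keeping the router free of side effects is what lets tests/test_agent_routing.py exercise it without mocks; the mocking is only needed one layer up, in the eval path.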

Spec/docs changes:

  • docs/phase1b-agent-routing-spec.md
  • docs/phase1b/README.md
  • child specs under docs/phase1b/
  • docs/phase1b/staging-blocker.json

What It Does Not Change

  • It does not enable Phase 1b in production.
  • It does not touch the VPS.
  • It does not create or require six GitHub identities.
  • It does not solve the Forgejo-vs-GitHub cutover.
  • It does not fix unrelated full-suite failures.

Current Safety Posture

The feature flag defaults off:

PHASE1B_AGENT_ROUTING_ENABLED=false

With the flag off, the legacy eval path remains available. The Phase 1b path should only run in staging or a controlled daemon after explicit env config.
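
A minimal sketch of a default-off flag gate, assuming the flag is read from the environment as a string. The accepted truthy spellings and the function names phase1b_enabled and choose_eval_path are assumptions for illustration; check lib/config.py and lib/evaluate.py for the actual parsing and dispatch.

```python
import os
from typing import Mapping, Optional

# Assumption: which strings count as "on". lib/config.py may be stricter.
TRUTHY = {"1", "true", "yes", "on"}


def phase1b_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is explicitly set to a truthy value."""
    env = os.environ if env is None else env
    raw = env.get("PHASE1B_AGENT_ROUTING_ENABLED", "false")
    return raw.strip().lower() in TRUTHY


def choose_eval_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the eval path name; anything short of an explicit opt-in is legacy."""
    return "phase1b" if phase1b_enabled(env) else "legacy"
```

Treating unset, empty, and unrecognized values as off means a typo in the env config falls back to legacy behavior rather than silently enabling Phase 1b.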

The local review hardening pass reverted this branch's changes to lib/domains.py, so the legacy domain map is unchanged.

Local Proof

Focused proof that currently passes:

.venv/bin/python -m pytest tests/test_agent_routing.py tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
.venv/bin/ruff check lib/agent_routing.py lib/domains.py lib/evaluate.py lib/llm.py lib/config.py tests/test_agent_routing.py tests/test_evaluate_agent_routing.py
git diff --check

Latest focused result:

61 passed
ruff: all checks passed
git diff --check: passed

Full-suite status:

406 passed, 12 failed, 3 errors

Known full-suite failure groups:

  • db.migrate fresh-fixture rebuild error: prs_new has no column named auto_merge
  • contributor test fixture missing submitted_by
  • date/frontmatter expectations in test_post_extract.py
  • search threshold expectation in test_search.py
  • missing python-telegram-bot imports for X content tests

Those failures mean this branch should not be called repo-green or PR-ready.
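
The "repo-green" rule can be checked mechanically. This is a minimal sketch, assuming the summary is available as one line of text in the shape quoted above; parse_summary and is_repo_green are hypothetical helper names and are not part of this repo.

```python
import re
from typing import Dict


def parse_summary(line: str) -> Dict[str, int]:
    """Extract outcome counts from a summary like '406 passed, 12 failed, 3 errors'."""
    counts: Dict[str, int] = {}
    for number, label in re.findall(r"(\d+) (passed|failed|errors?|skipped)", line):
        # Normalize 'error' and 'errors' to one key.
        key = "errors" if label.startswith("error") else label
        counts[key] = int(number)
    return counts


def is_repo_green(line: str) -> bool:
    """Green means at least one test passed and nothing failed or errored."""
    counts = parse_summary(line)
    return (
        counts.get("passed", 0) > 0
        and counts.get("failed", 0) == 0
        and counts.get("errors", 0) == 0
    )
```

Applied to the two results in this guide, the focused run ("61 passed") counts as green and the full-suite run does not, which is the distinction this section draws.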

How To Review Locally

Stay local:

git switch phase1b-agent-routing-local
git status --short --branch
git diff main...HEAD --stat
git diff main...HEAD -- lib/agent_routing.py lib/evaluate.py lib/llm.py lib/config.py

Review the behavior in this order:

  1. lib/agent_routing.py
  2. tests/test_agent_routing.py
  3. lib/evaluate.py
  4. tests/test_evaluate_agent_routing.py
  5. docs/phase1b/staging-blocker.json

Before Any PR

Do not open a PR until at least one of these is true:

  • full-suite failures are triaged into accepted unrelated failures with issue links, or fixed;
  • staging access is available and a sandbox proof path is ready;
  • m3taversal/Fwaz explicitly accept a local-only draft review without staging proof.

Before Production

Production requires:

  • staging proof against sandbox decision-engine;
  • exact reviewed SHA;
  • Leo signoff;
  • no direct VPS self-upgrades;
  • PHASE1B_AGENT_ROUTING_ENABLED set to true only after the cutover plan is written;
  • rollback path to flag-off behavior.
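
The two gates differ in shape: the PR gate is satisfied by any one condition, while the production gate needs every item. A minimal sketch of that difference follows; the class and field names are hypothetical, and "no direct VPS self-upgrades" is left out because it is a standing constraint rather than a state that can be ticked off.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrGate:
    failures_triaged_or_fixed: bool
    staging_proof_path_ready: bool
    local_only_review_accepted: bool

    def may_open_pr(self) -> bool:
        # "At least one of these is true."
        return any((
            self.failures_triaged_or_fixed,
            self.staging_proof_path_ready,
            self.local_only_review_accepted,
        ))


@dataclass(frozen=True)
class ProductionGate:
    staging_proof: bool
    reviewed_sha: str  # exact reviewed SHA; empty string if none recorded
    leo_signoff: bool
    cutover_plan_written: bool
    rollback_to_flag_off: bool

    def missing(self) -> List[str]:
        """Names of unmet requirements; an empty list means all are met."""
        checks = {
            "staging_proof": self.staging_proof,
            "reviewed_sha": bool(self.reviewed_sha.strip()),
            "leo_signoff": self.leo_signoff,
            "cutover_plan_written": self.cutover_plan_written,
            "rollback_to_flag_off": self.rollback_to_flag_off,
        }
        return [name for name, ok in checks.items() if not ok]
```

Returning the list of unmet items, rather than a single boolean, makes it obvious which requirement is still blocking.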