From cdb0b1498d847ba9694049075d79c95f58961f20 Mon Sep 17 00:00:00 2001
From: twentyOne2x
Date: Fri, 29 May 2026 14:17:28 +0200
Subject: [PATCH] Add phase 1b local review guide

---
 docs/phase1b/local-review-guide.md | 125 +++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 docs/phase1b/local-review-guide.md

diff --git a/docs/phase1b/local-review-guide.md b/docs/phase1b/local-review-guide.md
new file mode 100644
index 0000000..ec4e577
--- /dev/null
+++ b/docs/phase1b/local-review-guide.md
@@ -0,0 +1,125 @@
+# Phase 1b Local Review Guide
+
+Status: local-only review artifact
+Branch: `phase1b-agent-routing-local`
+
+## What This Repo Is
+
+`teleo-infrastructure` is the pipeline/runtime repo. For Phase 1b, it owns the evaluation daemon logic that watches PRs, fetches diffs, runs reviewers, posts verdict comments, and moves PR state toward merge or feedback.
+
+Canonical split for this phase:
+
+- KB repo: `decision-engine`
+- implementation/runtime repo: `teleo-infrastructure`
+- production runtime: VPS under `/opt/teleo-eval`, not currently accessible from this workspace
+
+## What This Branch Changes
+
+Local code changes:
+
+- `lib/agent_routing.py`: new pure router that maps a PR diff to one or two Hermes agents.
+- `lib/config.py`: adds `PHASE1B_AGENT_ROUTING_ENABLED`, default `false`.
+- `lib/evaluate.py`: adds a feature-flagged Phase 1b eval path.
+- `lib/llm.py`: adds `run_agent_review`.
+- `tests/test_agent_routing.py`: router tests.
+- `tests/test_evaluate_agent_routing.py`: mocked eval tests.
+- `tests/test_eval_parse.py`: all six `VERDICT:AGENT:*` parser coverage.
+
+Spec/docs changes:
+
+- `docs/phase1b-agent-routing-spec.md`
+- `docs/phase1b/README.md`
+- child specs under `docs/phase1b/`
+- `docs/phase1b/staging-blocker.json`
+
+## What It Does Not Change
+
+- It does not enable Phase 1b in production.
+- It does not touch the VPS.
+- It does not create or require six GitHub identities.
+- It does not solve the Forgejo-vs-GitHub cutover.
+- It does not fix unrelated full-suite failures.
+
+## Current Safety Posture
+
+The feature flag defaults off:
+
+```text
+PHASE1B_AGENT_ROUTING_ENABLED=false
+```
+
+With the flag off, the legacy eval path remains available. The Phase 1b path should only run in staging or a controlled daemon after explicit env config.
+
+The local review hardening pass removed changes to `lib/domains.py` so the legacy domain map is not changed by this branch.
+
+## Local Proof
+
+Focused proof that currently passes:
+
+```bash
+.venv/bin/python -m pytest tests/test_agent_routing.py tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
+.venv/bin/ruff check lib/agent_routing.py lib/domains.py lib/evaluate.py lib/llm.py lib/config.py tests/test_agent_routing.py tests/test_evaluate_agent_routing.py
+git diff --check
+```
+
+Latest focused result:
+
+```text
+61 passed
+ruff: all checks passed
+git diff --check: passed
+```
+
+Full-suite status:
+
+```text
+406 passed, 12 failed, 3 errors
+```
+
+Known full-suite failure groups:
+
+- `db.migrate` fresh-fixture rebuild error: `prs_new has no column named auto_merge`
+- contributor test fixture missing `submitted_by`
+- date/frontmatter expectations in `test_post_extract.py`
+- search threshold expectation in `test_search.py`
+- missing `python-telegram-bot` imports for X content tests
+
+Those failures mean this branch should not be called repo-green or PR-ready.
+
+## How To Review Locally
+
+Stay local:
+
+```bash
+git switch phase1b-agent-routing-local
+git status --short --branch
+git diff main...HEAD --stat
+git diff main...HEAD -- lib/agent_routing.py lib/evaluate.py lib/llm.py lib/config.py
+```
+
+Review the behavior in this order:
+
+1. `lib/agent_routing.py`
+2. `tests/test_agent_routing.py`
+3. `lib/evaluate.py`
+4. `tests/test_evaluate_agent_routing.py`
+5. `docs/phase1b/staging-blocker.json`
+
+## Before Any PR
+
+Do not open a PR until at least one of these is true:
+
+- full-suite failures are triaged into accepted unrelated failures with issue links, or fixed;
+- staging access is available and a sandbox proof path is ready;
+- m3taversal/Fwaz explicitly accept a local-only draft review without staging proof.
+
+## Before Production
+
+Production requires:
+
+- staging proof against sandbox `decision-engine`;
+- exact reviewed SHA;
+- Leo signoff;
+- no direct VPS self-upgrades;
+- `PHASE1B_AGENT_ROUTING_ENABLED` enabled only after cutover plan is written;
+- rollback path to flag-off behavior.