Unevaluated PRs (eval_attempts=0) now sort before re-evals in the
eval cycle query. Fresh PRs have a higher chance of passing (~12%)
vs re-evals of already-rejected PRs. Prevents migration-reset PRs
from consuming eval slots that fresh PRs could use.
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
If source_path is NULL, the source requeue silently matches nothing.
Log a warning so we catch orphaned terminations in monitoring.
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON)
columns to prs table.
Retry budget logic (Ganymede-approved design):
- Increment eval_attempts on each evaluate_pr() call
- Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human)
- Attempt 1: normal — back to open, wait for fix
- Attempt 2: classify issues as mechanical/substantive
- Mechanical only (schema, wiki links, dedup): keep open for one more try
- Substantive (factual, confidence, scope, title): close PR, requeue source
- Issue tags parsed from reviewer comments, stored in eval_issues column
- SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset
- Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike
- Cost recording updated: domain review → OpenRouter, Leo → tier-dependent
Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state.
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
- Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter),
Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots.
- Opus reserved for DEEP eval only — protects rate limit for overnight research.
- Review prompts calibrated: require per-criterion evidence, blocking-vs-observation
verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate.
- OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid
spurious 15-min Opus backoff.
- merge.py: pre-check PR state before merge API call (prevents 405 on re-merge).
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
- Fix#12: domain_review undefined on resume path — initialize to None,
guard _parse_issues() call. Prevents NameError on PRs resuming after
partial eval (76 PRs in this state right now).
- Fix#11: concurrent eval workers can duplicate reviews — add atomic
UPDATE SET status='reviewing' WHERE status='open' at top of
evaluate_pr(). Check rowcount, skip if already claimed.
- Fix#8: subprocess tracking for graceful shutdown — _active_subprocesses
set in evaluate module, tracked in _claude_cli_call, exposed via
kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py.
- Fix health.py divide-by-zero — guard all metabolic metric reads against
None from NULLIF/empty result set. Prevents TypeError on /health when
no PRs have been evaluated in 24h.
Also includes Leo's existing hot-fixes:
- Rate limit detection checks stdout regardless of exit code
- 15-minute cycle-level backoff on rate limit
Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>