Defense-in-depth for PRs that substantive_fixer can't make progress on.
Targets two stuck-verdict shapes empirically observed in production:
1. leo:request_changes + domain:approve
Leo asked for a substantive fix; the fixer either failed silently
(no_claim_files / no_review_comments / etc.) or the issue tag isn't
in FIXABLE | CONVERTIBLE | UNFIXABLE.
2. leo:skipped + domain:request_changes
Eval bypassed Leo (eval_attempts >= MAX). Domain rejected with no
structured eval_issues, so the fixer can't classify the issue.
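The two shapes can be expressed as a single predicate. This is a minimal sketch, assuming a hypothetical PRVerdicts record with leo_verdict, domain_verdict, eval_issues and fixer_failed fields; the FIXABLE / CONVERTIBLE / UNFIXABLE contents below are placeholders, not the fixer's real tag lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder tag sets (hypothetical); the real lists live in the fixer.
FIXABLE = {"placeholder_fixable"}
CONVERTIBLE = {"placeholder_convertible"}
UNFIXABLE = {"placeholder_unfixable"}
KNOWN_TAGS = FIXABLE | CONVERTIBLE | UNFIXABLE


@dataclass
class PRVerdicts:
    """Hypothetical per-PR verdict record."""
    leo_verdict: str                 # "approve" | "request_changes" | "skipped"
    domain_verdict: str              # "approve" | "request_changes"
    eval_issues: List[str] = field(default_factory=list)
    fixer_failed: bool = False       # e.g. fixer gave up with no_claim_files


def deadlock_shape(pr: PRVerdicts) -> Optional[int]:
    """Return 1 or 2 for the stuck shape a PR matches, or None if not stuck."""
    if pr.leo_verdict == "request_changes" and pr.domain_verdict == "approve":
        # Shape 1: the fixer failed silently, or a tag it can't route exists.
        if pr.fixer_failed or any(t not in KNOWN_TAGS for t in pr.eval_issues):
            return 1
    if pr.leo_verdict == "skipped" and pr.domain_verdict == "request_changes":
        # Shape 2: Leo was bypassed and domain left no structured issues.
        if not pr.eval_issues:
            return 2
    return None
```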
92 PRs match this gate today; the oldest dates from 2026-04-24 (stuck 13 days).
Behavior:
- Hourly throttle via audit_log sentinel ('verdict_deadlock_reaper_run').
- REAPER_DRY_RUN=True by default: the first deploy emits only
'verdict_deadlock_would_close' audit events. No DB writes, no Forgejo
writes. (Apr 24 ship directive.)
- 24h cooldown, oldest-first, capped at 50 PRs per run.
- Heartbeat audit fires in both dry-run and live mode, so the throttle works.
- Live mode: posts a comment, closes the Forgejo PR, and calls close_pr()
in the DB. Audits 'verdict_deadlock_closed' per PR.
- If the Forgejo PATCH returns None, skip the DB close (avoids DB/Forgejo
drift).
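The throttle, cooldown, cap and dry-run switch compose roughly as below. This is a sketch under stated assumptions: the audit_log schema (event, pr_number, created_at), the candidate tuples, and the forgejo_close / db_close callables are hypothetical stand-ins (forgejo_close is assumed to post the comment and PATCH the PR, returning None on failure), and "24h cooldown" is read here as "stuck for at least 24h".

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timedelta
from typing import Callable, Optional

SENTINEL = "verdict_deadlock_reaper_run"
THROTTLE = timedelta(hours=1)
COOLDOWN = timedelta(hours=24)   # assumption: minimum time a PR has been stuck
MAX_PER_RUN = 50


def _audit(db: sqlite3.Connection, event: str, pr: Optional[int], now: datetime) -> None:
    db.execute(
        "INSERT INTO audit_log (event, pr_number, created_at) VALUES (?, ?, ?)",
        (event, pr, now.isoformat()),
    )


def reaper_tick(
    db: sqlite3.Connection,
    candidates: list[tuple[int, datetime]],  # (pr_number, stuck_since), hypothetical
    now: datetime,
    dry_run: bool = True,
    forgejo_close: Optional[Callable[[int], Optional[dict]]] = None,
    db_close: Optional[Callable[[int], None]] = None,
) -> list[int]:
    """Run one reaper pass; return the PR numbers acted on (or that would be)."""
    last = db.execute(
        "SELECT MAX(created_at) FROM audit_log WHERE event = ?", (SENTINEL,)
    ).fetchone()[0]
    if last is not None and now - datetime.fromisoformat(last) < THROTTLE:
        return []  # hourly throttle: this hour's run already happened
    _audit(db, SENTINEL, None, now)  # heartbeat in both modes keeps the throttle working

    eligible = sorted(
        (c for c in candidates if now - c[1] >= COOLDOWN),
        key=lambda c: c[1],          # oldest first
    )[:MAX_PER_RUN]

    acted: list[int] = []
    for pr, _ in eligible:
        if dry_run:
            _audit(db, "verdict_deadlock_would_close", pr, now)
        else:
            if forgejo_close(pr) is None:
                continue             # Forgejo PATCH failed: skip DB close to avoid drift
            db_close(pr)
            _audit(db, "verdict_deadlock_closed", pr, now)
        acted.append(pr)
    db.commit()
    return acted
```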
Wired into fix_cycle() in teleo-pipeline.py. Runs after mechanical
and substantive fixes, never blocks them.
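One way to get the "runs after, never blocks" property is to await the reaper last and isolate its failures. This is a hypothetical sketch; the three coroutine-function parameters stand in for the real mechanical, substantive and reaper passes in fix_cycle().

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

log = logging.getLogger("fix_cycle_sketch")

Step = Callable[[], Awaitable[Any]]


async def fix_cycle(mechanical: Step, substantive: Step, reaper: Step) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    # The fixers run first, so the reaper can never delay them within a cycle.
    results["mechanical"] = await mechanical()
    results["substantive"] = await substantive()
    try:
        results["reaper"] = await reaper()
    except Exception:
        # A reaper failure is logged and swallowed; the next cycle still runs.
        log.exception("verdict deadlock reaper failed")
        results["reaper"] = None
    return results
```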
Follow-up (after verifying the first run's audit rows):
- Operator inspects 'verdict_deadlock_would_close' audit rows
- Flips REAPER_DRY_RUN to False, redeploys
- Reaper actually closes on next hourly tick
A transient DB lock raised by breaker.record_failure() inside an except
handler escaped the handler and killed the asyncio coroutine permanently:
snapshot_cycle died on Apr 18 and never recovered. All three breaker call
sites now wrap the call in their own try/except.
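The failure mode is that an exception raised inside an except block propagates out of it. Below is a sketch of the guarded pattern, with a hypothetical Breaker stand-in whose record_failure() raises sqlite3.OperationalError to simulate the lock; the bounded loop replaces the real long-running cycle so the example terminates.

```python
import asyncio
import logging
import sqlite3
from typing import Awaitable, Callable

log = logging.getLogger("snapshot_sketch")


class Breaker:
    """Hypothetical circuit breaker whose DB write can hit a lock."""

    def __init__(self, db_locked: bool = False) -> None:
        self.db_locked = db_locked
        self.failures = 0

    def record_failure(self) -> None:
        if self.db_locked:
            raise sqlite3.OperationalError("database is locked")
        self.failures += 1


async def snapshot_cycle(breaker: Breaker, step: Callable[[], Awaitable[None]], iterations: int) -> int:
    """Run `step` repeatedly; return how many iterations succeeded."""
    succeeded = 0
    for _ in range(iterations):          # the real loop would be `while True`
        try:
            await step()
            succeeded += 1
        except Exception:
            log.exception("snapshot step failed")
            try:
                breaker.record_failure()
            except Exception:
                # Without this inner guard, the lock error would escape the
                # outer except block and end the coroutine for good.
                log.exception("breaker.record_failure() failed; continuing")
        await asyncio.sleep(0)
    return succeeded
```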
Also includes an HTML injection fix for github_feedback review_text.
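A common way to close this kind of hole is to escape untrusted text before embedding it in the HTML of a comment body. This sketch uses the standard library's html.escape; render_review_comment and its markup are hypothetical, not github_feedback's actual template.

```python
import html


def render_review_comment(agent: str, review_text: str) -> str:
    """Embed untrusted review text in an HTML fragment without letting it inject tags."""
    safe_agent = html.escape(agent)
    safe_text = html.escape(review_text)  # escapes &, <, > and both quote characters
    return (
        f"<details><summary>{safe_agent} review</summary>\n"
        f"<pre>{safe_text}</pre>\n"
        "</details>"
    )
```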
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix#12: domain_review was undefined on the resume path. Initialize it to
None and guard the _parse_issues() call. Prevents NameError on PRs
resuming after a partial eval (76 PRs in this state right now).
- Fix#11: concurrent eval workers can duplicate reviews. Add an atomic
UPDATE SET status='reviewing' WHERE status='open' at the top of
evaluate_pr(); check rowcount and skip if the PR was already claimed.
- Fix#8: subprocess tracking for graceful shutdown. The evaluate module
keeps an _active_subprocesses set, populated in _claude_cli_call, and
exposes kill_active_subprocesses() for shutdown. Replaces dead code in
teleo-pipeline.py.
- Fix health.py divide-by-zero: guard all metabolic metric reads against
None (from NULLIF or an empty result set). Prevents a TypeError on
/health when no PRs have been evaluated in the last 24h.
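For Fix#12, the shape of the change is "bind first, parse only if set". This is a hypothetical sketch: resume_eval, the domain_done flag, and this toy _parse_issues are illustrative stand-ins, not the evaluate module's code.

```python
from typing import Callable, List, Optional


def _parse_issues(review_text: str) -> List[str]:
    """Toy parser (hypothetical format): collects tags from 'ISSUE: <tag>' lines."""
    return [
        line.split(":", 1)[1].strip()
        for line in review_text.splitlines()
        if line.startswith("ISSUE:")
    ]


def resume_eval(pr: dict, run_domain_review: Callable[[dict], str]) -> List[str]:
    domain_review: Optional[str] = None   # bound on every path, so no NameError
    if not pr.get("domain_done"):
        domain_review = run_domain_review(pr)
    # In this sketch, resumed PRs skip the domain review, so there is nothing to parse.
    return _parse_issues(domain_review) if domain_review is not None else []
```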
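Fix#11's claim step works because the status check and the status change happen in one UPDATE statement, so only one worker can observe rowcount == 1. A sketch with sqlite3; the prs table and its number column are hypothetical names (the commit doesn't name the table), and evaluate_pr here is a stand-in.

```python
import sqlite3


def claim_pr(db: sqlite3.Connection, pr_number: int) -> bool:
    """Atomically move a PR from 'open' to 'reviewing'; True if this caller won."""
    cur = db.execute(
        "UPDATE prs SET status = 'reviewing' WHERE number = ? AND status = 'open'",
        (pr_number,),
    )
    db.commit()
    return cur.rowcount == 1


def evaluate_pr(db: sqlite3.Connection, pr_number: int) -> str:
    if not claim_pr(db, pr_number):
        return "skipped"   # another worker already claimed it
    # ... run the review here ...
    return "reviewed"
```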
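For the health.py fix, the None comes from SQL itself: NULLIF(COUNT(*), 0) turns a zero denominator into NULL, the division then yields NULL, and Python receives it as None (an aggregate query without GROUP BY still returns one row on an empty table). A sketch of a guarded metric read; the evals table, its columns and approval_rate_24h are hypothetical, and returning None rather than 0 to mean "no data" is a design choice.

```python
import sqlite3
from typing import Optional


def approval_rate_24h(db: sqlite3.Connection) -> Optional[float]:
    """Percentage of evals approved in the last 24h, or None when there were none."""
    (ratio,) = db.execute(
        "SELECT CAST(SUM(approved) AS REAL) / NULLIF(COUNT(*), 0) "
        "FROM evals WHERE evaluated_at >= datetime('now', '-1 day')"
    ).fetchone()
    if ratio is None:            # no rows: NULLIF made the division NULL
        return None
    return round(ratio * 100, 1)
```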
Also includes Leo's existing hot-fixes:
- Rate limit detection checks stdout regardless of exit code
- 15-minute cycle-level backoff on rate limit
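Leo's two hot-fixes fit together roughly like this. The marker strings, the constant name and the CycleBackoff class are hypothetical; the point is that detection inspects stdout without consulting the exit code, and a hit pauses the whole cycle rather than a single call.

```python
RATE_LIMIT_MARKERS = ("rate limit", "usage limit")   # hypothetical marker strings
CYCLE_BACKOFF_SECONDS = 15 * 60


def is_rate_limited(stdout: str) -> bool:
    """Scan CLI stdout for a rate-limit message.

    The exit code is deliberately not consulted: a rate-limited call
    may still exit 0.
    """
    text = stdout.lower()
    return any(marker in text for marker in RATE_LIMIT_MARKERS)


class CycleBackoff:
    """Pauses the whole cycle for 15 minutes after a rate-limit hit."""

    def __init__(self) -> None:
        self.until = 0.0

    def trip(self, now: float) -> None:
        self.until = now + CYCLE_BACKOFF_SECONDS

    def active(self, now: float) -> bool:
        return now < self.until
```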
Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>