Apply nit #3 from Ganymede's review of f97dd15 (the deferred close_on_forgejo
fix already landed in e14b5f2; Ganymede was reviewing the older commit).
SQL gate previously had no branch filter — empirically all 92 candidates were
extract/* but structurally any agent branch in the deadlock shape was a
candidate. Positive allowlist for extract/, reweave/, fix/ scopes the reaper
to disposable pipeline-managed branches that the pipeline created and can
recreate. Agent branches (theseus/, vida/, epimetheus/, etc.) are WIP feature
work and must not be reaped — owners review their own PRs on their own cadence.
Cheap target-class lock complementing the LIMIT 50 blast-radius cap.
Same scoping principle as PIPELINE_OWNED_PREFIXES, but tighter — epimetheus/
review branches are pipeline-owned for merge purposes but NOT disposable.
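
A minimal sketch of the positive allowlist described above. The constant and
function names are hypothetical, and so is the `branch` column name; only the
three prefixes come from this commit.

```python
# Hypothetical names: only the three prefixes are taken from the commit text.
REAPER_BRANCH_ALLOWLIST = ("extract/", "reweave/", "fix/")


def is_reapable_branch(branch: str) -> bool:
    """Return True only for disposable, pipeline-created branches."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return branch.startswith(REAPER_BRANCH_ALLOWLIST)


def branch_filter_sql(column: str = "branch"):
    """Build a parameterized SQL fragment that matches the allowlist prefixes."""
    clause = " OR ".join(f"{column} LIKE ?" for _ in REAPER_BRANCH_ALLOWLIST)
    params = [prefix + "%" for prefix in REAPER_BRANCH_ALLOWLIST]
    return f"({clause})", params
```

Anything not on the list (theseus/, vida/, epimetheus/, ...) is rejected by
default, so a new agent prefix stays safe without a code change.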
Items 2-4 from this review:
- WARNING #2 (audit_log idx_audit_event_ts): defer to followup branch alongside
sync-mirror migration cleanup, as Ganymede suggested.
- NIT #3 (this commit): branch allowlist applied.
- NIT #4 (token asymmetry comment=admin/close=leo): confirmed established
codebase pattern. merge.py:946-948 does the same — comment system-toned,
close attributed to Leo for verdict-source UI clarity. Not accidental.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to f97dd15. Four fixes from review:
MUST-FIX #1 — Forgejo double-PATCH drift
Reaper closed the PR via forgejo_api PATCH at line 689, then close_pr() at
line 700 issued a second PATCH (default close_on_forgejo=True). On
transient failure of the second PATCH, close_pr returned False without
updating the DB, leaving status='open' even though Forgejo was closed. Pass
close_on_forgejo=False so the DB close is unconditional once the explicit
Forgejo PATCH succeeds.
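
A sketch of the intended ordering, not the actual reaper code. The two
callables are injected stand-ins for forgejo_api and close_pr; their real
signatures are assumptions, apart from the close_on_forgejo keyword.

```python
def reap_one(pr_number, forgejo_patch, close_pr):
    """Close on Forgejo first, then record the close in the DB exactly once.

    forgejo_patch and close_pr are hypothetical stand-ins for the real helpers.
    """
    result = forgejo_patch(pr_number, {"state": "closed"})
    if result is None:
        # Forgejo close failed: leave the DB row open so both sides agree.
        return False
    # Forgejo is already closed, so skip the second PATCH inside close_pr.
    close_pr(pr_number, close_on_forgejo=False)
    return True
```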
MUST-FIX #2 — reaper exception trips fix breaker
Unhandled exception in verdict_deadlock_reaper_cycle propagated to
stage_loop, recording fix-stage failures. After 5 reaper failures the
fix breaker would open and block mechanical+substantive for 15 min.
Wrap reaper call in try/except in fix_cycle (same exception-isolation
pattern as ingest_cycle's extract_cycle wrapper). Defense-in-depth
must never block primary paths.
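
The isolation pattern can be sketched as below. This is a synchronous
simplification with injected callables; the real fix_cycle signature and
logger name are assumptions.

```python
import logging

logger = logging.getLogger("pipeline.fix")  # hypothetical logger name


def fix_cycle(run_mechanical, run_substantive, run_reaper):
    """Run the primary fix stages, then the reaper in isolation."""
    fixed = run_mechanical() + run_substantive()
    try:
        run_reaper()
    except Exception:
        # Log and swallow: a reaper failure must not be recorded as a
        # fix-stage failure, so it cannot trip the fix breaker.
        logger.exception("verdict_deadlock_reaper_cycle failed")
    return fixed
```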
WARNING #1 — throttle SQL full-scan
audit_log only has idx_audit_stage. Filtering on event alone caused
full-table scans every 60s. Added stage='reaper' so the planner uses
the existing index — reaper writes audit rows under stage='reaper'
already so the filter is correct.
WARNING #2 — REAPER_DRY_RUN as code constant
Flipping dry-run → live required edit + commit + push + deploy +
restart. Moved REAPER_DRY_RUN, REAPER_DEADLOCK_AGE_HOURS,
REAPER_INTERVAL_SECONDS, REAPER_MAX_PER_RUN to lib/config.py with
os.environ.get() overrides. Operator now flips via systemctl edit
teleo-pipeline.service (Environment=REAPER_DRY_RUN=false) + restart.
Defaults remain safe: dry-run, 24h age, hourly throttle, 50/run cap.
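
A sketch of how such overrides might look. The helper names are hypothetical
and this is not the actual lib/config.py; the four variable names and their
defaults are the ones listed above. Unparseable values fall back to the safe
default, so a typo in REAPER_DRY_RUN cannot switch the reaper to live mode.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name, default, environ=None):
    """Parse a boolean env var; unknown values keep the default."""
    env = os.environ if environ is None else environ
    raw = env.get(name, "").strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    return default


def env_int(name, default, environ=None):
    """Parse an integer env var; missing or malformed values keep the default."""
    env = os.environ if environ is None else environ
    try:
        return int(env[name])
    except (KeyError, ValueError):
        return default


def load_reaper_config(environ=None):
    """Collect reaper settings with safe defaults (dry-run, 24h, hourly, 50)."""
    return {
        "REAPER_DRY_RUN": env_bool("REAPER_DRY_RUN", True, environ),
        "REAPER_DEADLOCK_AGE_HOURS": env_int("REAPER_DEADLOCK_AGE_HOURS", 24, environ),
        "REAPER_INTERVAL_SECONDS": env_int("REAPER_INTERVAL_SECONDS", 3600, environ),
        "REAPER_MAX_PER_RUN": env_int("REAPER_MAX_PER_RUN", 50, environ),
    }
```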
NIT — dry-run counter naming
Renamed local `closed` counter in dry-run path to `would_close` so the
heartbeat audit ("X closed, Y would-close") and journal log are
unambiguous. Function still returns closed + would_close so callers
see total work done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defense-in-depth for PRs that substantive_fixer can't make progress on.
Targets two stuck-verdict shapes empirically observed in production:
1. leo:request_changes + domain:approve
Leo asked for substantive fix; fixer either failed silently
(no_claim_files / no_review_comments / etc.) or the issue tag isn't
in FIXABLE | CONVERTIBLE | UNFIXABLE.
2. leo:skipped + domain:request_changes
Eval bypassed Leo (eval_attempts >= MAX). Domain rejected with no
structured eval_issues, so the fixer can't classify the issue.
92 PRs match this gate today, oldest at 2026-04-24 (13d stuck).
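
The two shapes can be expressed as a small predicate. This is an illustrative
sketch; the real gate is a SQL query, and the names below are hypothetical.

```python
# (leo_verdict, domain_verdict) pairs that the reaper treats as deadlocked.
DEADLOCK_SHAPES = frozenset(
    {
        ("request_changes", "approve"),   # shape 1
        ("skipped", "request_changes"),   # shape 2
    }
)


def is_verdict_deadlock(leo_verdict: str, domain_verdict: str) -> bool:
    """Return True if the verdict pair matches one of the two stuck shapes."""
    return (leo_verdict, domain_verdict) in DEADLOCK_SHAPES
```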
Behavior:
- Hourly throttle via audit_log sentinel ('verdict_deadlock_reaper_run').
- REAPER_DRY_RUN=True default — first deploy emits 'would_close' audit
events only. No DB writes. No Forgejo writes. (Ship Apr 24 directive.)
- 24h cooldown, oldest-first, capped at 50 per run.
- Heartbeat audit fires whether dry-run or live, so throttle works.
- Live mode: posts comment + closes Forgejo PR + close_pr() in DB.
Audits 'verdict_deadlock_closed' per PR.
- If the Forgejo PATCH returns None, skip the DB close (avoids Forgejo/DB drift).
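
A sketch of the sentinel-based throttle, assuming an audit_log table with
stage, event and a numeric ts column (the column layout and function names
are assumptions). The heartbeat is written in both dry-run and live mode,
which is what keeps the throttle effective during dry-run.

```python
import sqlite3

SENTINEL = "verdict_deadlock_reaper_run"


def reaper_due(conn, now, interval_seconds=3600):
    """Return True if no heartbeat exists or the last one is old enough."""
    row = conn.execute(
        "SELECT MAX(ts) FROM audit_log WHERE stage = 'reaper' AND event = ?",
        (SENTINEL,),
    ).fetchone()
    last_run = row[0]
    return last_run is None or now - last_run >= interval_seconds


def record_heartbeat(conn, now):
    """Write the sentinel row; called whether the run was dry or live."""
    conn.execute(
        "INSERT INTO audit_log (stage, event, ts) VALUES ('reaper', ?, ?)",
        (SENTINEL, now),
    )
    conn.commit()
```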
Wired into fix_cycle() in teleo-pipeline.py. Runs after mechanical
and substantive fixes, never blocks them.
Followup (post first-run audit verification):
- Operator inspects 'verdict_deadlock_would_close' audit rows
- Flips REAPER_DRY_RUN to False, redeploys
- Reaper actually closes on next hourly tick
Third silent return path in substantive_fix_cycle: the JSON-decode except
at the eval_issues parse drops rows, so they reach neither skipped_no_tags
nor substantive_rows. If all 3 LIMIT-3 candidates have corrupt JSON, the
cycle returns 0,0 with no log entry.
WARN level (not INFO): corrupt JSON is abnormal (post-merge column
drift, hand-edited DB row, partial write during crash). If this fires,
ops want to chase the upstream column-write path. If it never fires,
baseline noise stays at zero.
Closes the visibility gap on ALL silent returns in this function, not
just the two patched in 3f8666e.
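
A minimal sketch of the logged parse path. The function and logger names are
hypothetical, and the real code parses inside the cycle loop rather than in a
helper.

```python
import json
import logging

logger = logging.getLogger("pipeline.substantive_fixer")  # assumed name


def parse_eval_issues(pr_number, raw):
    """Return the parsed issue list, or None (with a WARNING) if corrupt."""
    try:
        return json.loads(raw) if raw else []
    except (json.JSONDecodeError, TypeError):
        # Corrupt JSON is abnormal, so it is surfaced at WARNING, not INFO.
        logger.warning(
            "PR #%s: corrupt eval_issues JSON, skipping: %r", pr_number, raw
        )
        return None
```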
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two silent paths in substantive_fix_cycle masked a 13-day stall:
1. Filter strips all candidates → return 0,0 with no log. With LIMIT 3
ordered created_at ASC, if the oldest 3 have no fixer-actionable tags
(e.g. eval_issues=[] from leo:skipped+domain:request_changes), the
cycle silently picks the same head-of-line every tick.
2. _fix_pr early-returns logged at DEBUG only — invisible without
fleet-wide DEBUG. Skip reasons (no_claim_files, no_review_comments,
not_open lock, worktree_failed, etc.) never surfaced in journalctl.
Patch: log skipped candidate eval_issues when no actionable rows
found (path 1); promote DEBUG→INFO for per-PR skip reasons (path 2).
Zero behavior change — observability only.
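
Path 1 can be sketched as follows. The tag names are placeholders for the
FIXABLE | CONVERTIBLE | UNFIXABLE union, and the row shape is assumed; this is
an illustration of the logging, not the patched function.

```python
import logging

logger = logging.getLogger("pipeline.substantive_fixer")  # assumed name

# Hypothetical stand-in for the union FIXABLE | CONVERTIBLE | UNFIXABLE.
ACTIONABLE_TAGS = frozenset({"tag_fixable", "tag_convertible", "tag_unfixable"})


def select_actionable(rows):
    """Return rows with at least one actionable tag; log when none qualify."""
    actionable = [
        r for r in rows if ACTIONABLE_TAGS.intersection(r["eval_issues"])
    ]
    if rows and not actionable:
        # Previously this case returned 0,0 silently every tick.
        skipped = [(r["number"], r["eval_issues"]) for r in rows]
        logger.info(
            "substantive_fix_cycle: no actionable rows, skipped=%s", skipped
        )
    return actionable
```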
Diagnosis context: 98 PRs stuck >3d, last successful substantive_fixer
event 2026-04-24. Need journal evidence to choose between (a) a one-line
fix to the cycle and (b) a larger _fix_pr regression. (Ship Step 2 directive.)
Fix 4 Forgejo ghost PR bugs flagged by Ganymede:
- fixer.py GC close: DB update ran outside try/except, so the PR was marked closed in the DB even when the Forgejo close failed
- substantive_fixer.py droppable: NO Forgejo close at all
- substantive_fixer.py auto-enrichment: DB update before Forgejo (reversed order)
- substantive_fixer.py close_and_reextract: replace manual Forgejo+DB with close_pr()
Add start_fixing() and reset_for_reeval() to pr_state.py:
- start_fixing: atomic claim + fix_attempts increment in one statement
- reset_for_reeval: clears all eval state for re-evaluation after fix
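
The atomic claim in start_fixing can be sketched as a single conditional
UPDATE. The table name, the number column and the 'fixing' status value are
assumptions; only status='open' and fix_attempts appear in these commits.

```python
import sqlite3


def start_fixing(conn, pr_number):
    """Claim an open PR and bump fix_attempts in one statement.

    Returns True if this caller won the claim, False if the PR was not open.
    """
    cur = conn.execute(
        "UPDATE prs SET status = 'fixing', fix_attempts = fix_attempts + 1 "
        "WHERE number = ? AND status = 'open'",
        (pr_number,),
    )
    conn.commit()
    # rowcount is 1 only when the WHERE clause matched, i.e. the claim succeeded.
    return cur.rowcount == 1
```

Because the status check and the increment share one statement, a second
caller sees status='fixing' and cannot claim or double-count the attempt.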
Also fixes stale line number comment in merge.py (Ganymede nit).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before
re-extracting. Sources with >=3 closed PRs are skipped (zombie protection).
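
The gate itself lives in a shell script; the check it performs can be
sketched in Python as below. The prs table and source_path column are assumed
names, not taken from pipeline.db.

```python
import sqlite3


def should_skip_source(conn, source_path, max_closed=3):
    """Return True when a source already has max_closed or more closed PRs."""
    (closed,) = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE source_path = ? AND status = 'closed'",
        (source_path,),
    ).fetchone()
    return closed >= max_closed
```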
Cost tracking: openrouter_call() now returns (text, usage) tuple with
prompt_tokens and completion_tokens from the OpenRouter API response.
All callers updated to unpack and pass tokens to costs.record_usage().
Added missing triage cost recording. Fixed batch domain review recording
cost once per batch instead of once per PR.
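
A sketch of the (text, usage) split, assuming the OpenAI-compatible response
shape that OpenRouter returns for chat completions. The function name is
hypothetical and the network call itself is left out.

```python
def parse_openrouter_response(payload):
    """Split a chat-completion payload into (text, usage).

    Missing usage fields default to 0 so callers can always unpack and
    pass the token counts on to cost recording.
    """
    text = payload["choices"][0]["message"]["content"]
    usage = payload.get("usage") or {}
    return text, {
        "prompt_tokens": int(usage.get("prompt_tokens", 0)),
        "completion_tokens": int(usage.get("completion_tokens", 0)),
    }
```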
Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>