Followup to f97dd15. Four fixes from review:
MUST-FIX #1 — Forgejo double-PATCH drift
reaper closes PR via forgejo_api PATCH at line 689, then close_pr() at
line 700 issued a second PATCH (default close_on_forgejo=True). On
transient failure of the second PATCH, close_pr returns False without
updating the DB → status='open' even though Forgejo is closed. Pass
close_on_forgejo=False so DB close is unconditional after the explicit
Forgejo PATCH succeeds.
MUST-FIX #2 — reaper exception trips fix breaker
Unhandled exception in verdict_deadlock_reaper_cycle propagated to
stage_loop, recording fix-stage failures. After 5 reaper failures the
fix breaker would open and block mechanical+substantive for 15 min.
Wrap reaper call in try/except in fix_cycle (same exception-isolation
pattern as ingest_cycle's extract_cycle wrapper). Defense-in-depth
must never block primary paths.
WARNING #1 — throttle SQL full-scan
audit_log only has idx_audit_stage. Filtering on event alone caused
full-table scans every 60s. Added stage='reaper' so the planner uses
the existing index — reaper writes audit rows under stage='reaper'
already so the filter is correct.
WARNING #2 — REAPER_DRY_RUN as code constant
Flipping dry-run → live required edit + commit + push + deploy +
restart. Moved REAPER_DRY_RUN, REAPER_DEADLOCK_AGE_HOURS,
REAPER_INTERVAL_SECONDS, REAPER_MAX_PER_RUN to lib/config.py with
os.environ.get() overrides. Operator now flips via systemctl edit
teleo-pipeline.service (Environment=REAPER_DRY_RUN=false) + restart.
Defaults remain safe: dry-run, 24h age, hourly throttle, 50/run cap.
NIT — dry-run counter naming
Renamed local `closed` counter in dry-run path to `would_close` so the
heartbeat audit ("X closed, Y would-close") and journal log are
unambiguous. Function still returns closed + would_close so callers
see total work done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>