fix(substantive_fixer): json_valid guard in front of json_each

Ganymede review of 5db6a02 (msg 2 of 3): json_each() on invalid JSON raises 'malformed JSON', which propagates up through the EXISTS and fails the whole SELECT. The fix-cycle call site at teleo-pipeline.py:104 is not wrapped in try/except (the reaper at lines 109-116 is; the substantive cycle is not), so a single corrupt eval_issues row would fail the cycle each time and trip the fix-stage breaker after 5 occurrences.

The fix is one line: AND json_valid(eval_issues) ahead of the EXISTS clause. json_valid(NULL) returns NULL (false in WHERE), json_valid(invalid) returns 0, and json_valid(valid) returns 1. json_valid has been available since SQLite 3.9, well before the 3.45.1 on the VPS.

The WARN-on-corrupt-JSON path is kept per Ganymede's Q3: json_valid and json.loads are technically distinct parsers, the cost is at most ~3 cheap json.loads calls per cycle, and the journal entry names the failure mode if SQLite ever returns a row that passes both SQL guards but fails json.loads. Comment updated to reflect the new guard ordering.
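A minimal sketch of the guard's behavior using Python's standard sqlite3 module and an in-memory table. The two-column schema and the "factual_error" tag are hypothetical, and it assumes the linked SQLite ships the JSON functions (built in by default since 3.38). The commit message reports json_valid(NULL) returning NULL on 3.45.1; the sketch relies only on it being falsy, which is all a WHERE clause needs. It runs the guarded query directly rather than reasoning about AND short-circuiting, since SQL does not formally promise an evaluation order for AND terms.

```python
import sqlite3

# In-memory database. Assumes the linked SQLite ships the JSON functions
# (built in by default since 3.38; available as the JSON1 extension since 3.9).
conn = sqlite3.connect(":memory:")


def json_valid(value):
    """Return SQLite's json_valid() result for one bound value."""
    return conn.execute("SELECT json_valid(?)", (value,)).fetchone()[0]


# Toy table: hypothetical schema trimmed to the two columns the guard touches.
conn.execute("CREATE TABLE prs (number INTEGER, eval_issues TEXT)")
conn.executemany(
    "INSERT INTO prs VALUES (?, ?)",
    [(1, '["factual_error"]'), (2, "not json"), (3, None)],
)

# Without the guard, json_each() runs on row 2 and raises "malformed JSON".
unguarded = """SELECT number FROM prs
               WHERE EXISTS (SELECT 1 FROM json_each(eval_issues)
                             WHERE value = ?)"""

# With the guard, row 2 (json_valid = 0) and row 3 (json_valid(NULL) is
# falsy) are filtered out before json_each sees them.
guarded = """SELECT number FROM prs
             WHERE json_valid(eval_issues)
             AND EXISTS (SELECT 1 FROM json_each(eval_issues)
                         WHERE value = ?)"""

print(json_valid(None), json_valid("not json"), json_valid('["x"]'))
try:
    conn.execute(unguarded, ("factual_error",)).fetchall()
except sqlite3.OperationalError as exc:
    print("unguarded query failed:", exc)
print("guarded:", conn.execute(guarded, ("factual_error",)).fetchall())
```

The unguarded query surfaces as sqlite3.OperationalError, which is exactly the exception that escapes an unwrapped call site; the guarded one returns only row 1.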
fix(substantive_fixer): SQL-side actionable-tag filter, eliminate head-of-line
2026-05-08 13:12:25 -04:00 · 2026-05-08 12:52:12 -04:00
1 changed file with 25 additions and 23 deletions
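A minimal sketch of the head-of-line block this commit removes, reproduced against an in-memory SQLite table. The tag names and the one-element tag sets are hypothetical stand-ins for FIXABLE_TAGS / CONVERTIBLE_TAGS / UNFIXABLE_TAGS, and the schema is trimmed to the columns the ordering needs. With the three oldest PRs carrying no actionable tags, the LIMIT-then-filter shape returns nothing on every call, while the filter-then-LIMIT shape reaches the newer actionable PRs.

```python
import json
import sqlite3

# Hypothetical stand-ins for the routing constants in lib/substantive_fixer.py.
FIXABLE_TAGS = frozenset({"factual_error"})
CONVERTIBLE_TAGS = frozenset({"needs_source"})
UNFIXABLE_TAGS = frozenset({"out_of_scope"})
ACTIONABLE = FIXABLE_TAGS | CONVERTIBLE_TAGS | UNFIXABLE_TAGS

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prs (number INTEGER, eval_issues TEXT, created_at TEXT)")
# The three oldest PRs have no actionable tags; the actionable ones are newer.
conn.executemany(
    "INSERT INTO prs VALUES (?, ?, ?)",
    [
        (1, "[]", "2026-01-01"),
        (2, '["broken_wiki_links"]', "2026-01-02"),
        (3, "[]", "2026-01-03"),
        (4, '["factual_error"]', "2026-01-04"),
        (5, '["needs_source"]', "2026-01-05"),
    ],
)


def old_shape():
    """LIMIT first, filter in Python afterwards (the pre-change behavior)."""
    rows = conn.execute(
        "SELECT number, eval_issues FROM prs ORDER BY created_at ASC LIMIT 3"
    ).fetchall()
    return [n for n, issues in rows if set(json.loads(issues or "[]")) & ACTIONABLE]


def new_shape():
    """Filter in SQL first, so LIMIT only ever counts actionable rows."""
    tags = sorted(ACTIONABLE)
    placeholders = ",".join(["?"] * len(tags))
    rows = conn.execute(
        f"""SELECT number FROM prs
            WHERE json_valid(eval_issues)
            AND EXISTS (SELECT 1 FROM json_each(eval_issues)
                        WHERE value IN ({placeholders}))
            ORDER BY created_at ASC LIMIT 3""",
        tags,
    ).fetchall()
    return [n for (n,) in rows]


print(old_shape())  # same empty head-of-line on every cycle
print(new_shape())
```

Because nothing about rows 1-3 changes between cycles, old_shape() keeps selecting them forever; new_shape() never counts them toward the LIMIT, leaving them to the reaper.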
--- a/lib/substantive_fixer.py
+++ b/lib/substantive_fixer.py
@@ -522,53 +522,55 @@ async def substantive_fix_cycle(conn, max_workers=None) -> tuple[int, int]:
    Finds PRs with substantive issue tags that haven't exceeded fix budget.
    Processes up to 3 per cycle (Rhea: 180s interval, don't overwhelm eval).
    """
+    # Build the actionable-tag list from the routing constants so adding a new
+    # tag to FIXABLE_TAGS / CONVERTIBLE_TAGS / UNFIXABLE_TAGS auto-updates the
+    # SELECT filter — no two-place edit footgun.
+    actionable_tags = sorted(FIXABLE_TAGS | CONVERTIBLE_TAGS | UNFIXABLE_TAGS)
+    placeholders = ",".join(["?"] * len(actionable_tags))
+
+    # Push the actionable-tag filter into SQL (was a post-fetch Python loop).
+    # The old shape selected the 3 oldest request_changes PRs and then dropped
+    # ones without actionable tags, so empty-eval_issues rows occupied LIMIT-3
+    # forever (head-of-line). Now LIMIT-3 returns up to 3 rows, all actionable.
+    # Reaper handles the empty-tag PRs after their 24h cooldown.
    rows = conn.execute(
-        """SELECT number, eval_issues FROM prs
+        f"""SELECT number, eval_issues FROM prs
           WHERE status = 'open'
           AND tier0_pass = 1
           AND (domain_verdict = 'request_changes' OR leo_verdict = 'request_changes')
           AND COALESCE(fix_attempts, 0) < ?
           AND (last_attempt IS NULL OR last_attempt < datetime('now', '-3 minutes'))
+           AND json_valid(eval_issues)
+           AND EXISTS (
+               SELECT 1 FROM json_each(eval_issues)
+               WHERE value IN ({placeholders})
+           )
           ORDER BY created_at ASC
           LIMIT 3""",
-        (MAX_SUBSTANTIVE_FIXES + config.MAX_FIX_ATTEMPTS,),  # Total budget: mechanical + substantive
+        (MAX_SUBSTANTIVE_FIXES + config.MAX_FIX_ATTEMPTS, *actionable_tags),
    ).fetchall()

    if not rows:
        return 0, 0

-    # Filter to only PRs with substantive issues (not just mechanical)
+    # Defense-in-depth: json_valid(eval_issues) in the SELECT already filters
+    # corrupt JSON before json_each runs, so this WARN should be unreachable.
+    # Kept anyway: json_valid and json.loads use technically distinct parsers,
+    # and the journal entry names the failure mode if SQLite ever surfaces a
+    # row that passes json_valid + json_each but fails json.loads.
    substantive_rows = []
-    skipped_no_tags = []
    for row in rows:
        try:
-            issues = json.loads(row["eval_issues"] or "[]")
+            json.loads(row["eval_issues"] or "[]")
        except (json.JSONDecodeError, TypeError):
-            # Corrupt JSON in eval_issues is abnormal (post-merge column drift,
-            # hand-edited row, partial write during crash). WARN so ops can chase
-            # the upstream column-write path. Without this, the row drops out of
-            # both substantive_rows and skipped_no_tags — the third silent path.
            logger.warning(
                "PR #%d: corrupt eval_issues JSON — skipping in substantive fix cycle",
                row["number"],
            )
            continue
-        if set(issues) & (FIXABLE_TAGS | CONVERTIBLE_TAGS | UNFIXABLE_TAGS):
-            substantive_rows.append(row)
-        else:
-            skipped_no_tags.append((row["number"], issues))
+        substantive_rows.append(row)

    if not substantive_rows:
-        # Visibility for the LIMIT-3 head-of-line block: if the oldest
-        # candidates have no fixer-actionable tags (e.g. eval_issues=[],
-        # broken_wiki_links only), the cycle silently returns 0 — and the
-        # next cycle picks the same head-of-line, forever. Log the eval_issues
-        # of skipped candidates so the journal makes the block visible.
-        if skipped_no_tags:
-            logger.info(
-                "Substantive fix cycle: 0 actionable from %d candidate(s) — head-of-line: %s",
-                len(rows), skipped_no_tags,
-            )
        return 0, 0

    fixed = 0
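A minimal sketch that checks the positional binding order of the new query in the diff above: the fix_attempts budget `?` appears first in the SQL text, so it leads the parameter tuple, followed by one value per IN-list placeholder. The tag sets, the budget value, and the five-column table are hypothetical, and the WHERE clause is trimmed (tier0_pass, the verdict columns, and last_attempt are omitted). Sorting the tag union pins the parameter order across processes, since Python randomizes str hashing per process and raw set iteration order can differ between runs; the SQL text itself depends only on how many tags there are.

```python
import sqlite3

# Hypothetical stand-ins for the module constants and the combined
# MAX_SUBSTANTIVE_FIXES + config.MAX_FIX_ATTEMPTS budget.
FIXABLE_TAGS = frozenset({"factual_error"})
CONVERTIBLE_TAGS = frozenset({"needs_source"})
UNFIXABLE_TAGS = frozenset({"out_of_scope"})
TOTAL_BUDGET = 4

actionable_tags = sorted(FIXABLE_TAGS | CONVERTIBLE_TAGS | UNFIXABLE_TAGS)
placeholders = ",".join(["?"] * len(actionable_tags))

sql = f"""SELECT number FROM prs
          WHERE status = 'open'
          AND COALESCE(fix_attempts, 0) < ?
          AND json_valid(eval_issues)
          AND EXISTS (
              SELECT 1 FROM json_each(eval_issues)
              WHERE value IN ({placeholders})
          )
          ORDER BY created_at ASC
          LIMIT 3"""

# Positional binding: the budget placeholder comes first in the text, so it
# must come first in the tuple, followed by the tags for the IN list.
params = (TOTAL_BUDGET, *actionable_tags)
assert sql.count("?") == len(params)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prs (number INTEGER, status TEXT, fix_attempts INTEGER,"
    " eval_issues TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO prs VALUES (?, ?, ?, ?, ?)",
    [
        (10, "open", None, '["out_of_scope"]', "2026-01-01"),  # NULL attempts count as 0
        (11, "open", 4, '["factual_error"]', "2026-01-02"),    # budget exhausted
        (12, "closed", 0, '["factual_error"]', "2026-01-03"),  # not open
        (13, "open", 1, '["needs_source", "x"]', "2026-01-04"),
    ],
)
print(conn.execute(sql, params).fetchall())
```

Adding a tag to any of the three sets grows both the placeholder string and the tuple together, which is the single-edit property the diff's comment describes.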