Ganymede review on PR #6 (commit 800d1d8 → this commit):
- WARNING: STAT_BLEED_RE produced false positives on common words. The
original pattern matched standalone stat-keyword tokens, which clipped
legitimate titles such as "Engage with Pantera and Active Capital"
(trimmed at " Active"). Fix: require numeric or symbolic context
(\$, +, -, \d) AFTER the stat word, so word-only sequences pass through
unchanged.
- _clean_title_candidate now uses finditer and takes the first match
starting past offset 10, instead of re.search. The digits in the
DP-NNNNN prefix always produce the leftmost match, so re.search returned
that; we want the first POST-title bleed match instead.
- NIT 3: add a minimum-render gate before write. Skip partial renders
rather than archiving stubs whose downstream extraction returns null.
Threshold: body < 500 B AND no DP-N in title → skip and retry on the
next run.
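A minimal sketch of the NIT 3 gate, assuming the body is a str whose
size is measured in UTF-8 bytes and that "DP-N" means "DP-" followed by
one or more digits. should_skip_render, archive_if_complete and DP_ID_RE
are hypothetical names, not taken from the codebase.

```python
import re

MIN_BODY_BYTES = 500  # threshold from this commit: body < 500 B
DP_ID_RE = re.compile(r"\bDP-\d+")  # hypothetical: "DP-" followed by digits


def should_skip_render(title: str, body: str) -> bool:
    """Return True when a render looks partial and should be retried later."""
    too_small = len(body.encode("utf-8")) < MIN_BODY_BYTES
    has_dp_id = DP_ID_RE.search(title) is not None
    # Skip only when BOTH hold: a tiny body AND no DP-N id in the title.
    return too_small and not has_dp_id


def archive_if_complete(title: str, body: str, store: dict) -> bool:
    """Hypothetical write step: store the render unless the gate rejects it."""
    if should_skip_render(title, body):
        return False  # not written; the next run will render it again
    store[title] = body
    return True
```

A skipped render is simply never written, so the next run picks it up
again. Because both conditions must hold, a short body whose title does
carry a DP-N id is still archived.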
Verified 10/10 on the test grid: real bleed is trimmed, and titles that
previously hit mid-word or word-only false positives (Compass, Active
Capital, Live Streaming, Encompass, Activate, Passage, Failure) all pass
through unchanged.
NIT 1 (--headless no-op flag) and NIT 2 (futardio tag provenance noise):
deferred as cosmetic; to be batched with a future change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>