Rewrite the claim-level pass in backfill-events.py to recover the Forgejo PR
that introduced each claim via a cascade of 4 strategies, tried in order of
reliability. This replaces the single title→description match, which missed
PRs with a NULL description (Cameron #3377) and bare-subject extracts
(Shaga's Leo research PR).
## Strategies
1. sourced_from frontmatter → prs.source_path stem match
2. git log first-add commit → subject pattern → prs.branch
   - "<agent>: extract claims from <slug>" → extract/<slug>
   - "<agent>: research session YYYY-MM-DD" → <agent>/research-<date>
   - "<agent>: (challenge|contrib|entity|synthesize)" → <agent>/*
   - "Recover X from GitHub PR #N" → prs.github_pr=N
   - "Extract N claims from X" (no prefix) → time-proximity on
     agent-owned branches within 24h
3. Existing title_desc fallback for anything the strategies above miss
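
The subject-pattern step of strategy 2 can be sketched roughly as follows. This is a minimal illustration, not the code in backfill-events.py: the regexes, the `classify_subject` name, and the `(kind, value)` return shape are all assumptions made for this example, and slugs are assumed to contain no whitespace.

```python
import re

# Hypothetical patterns mirroring the subject shapes listed above;
# the real regexes in backfill-events.py may differ.
_EXTRACT = re.compile(r"^(?P<agent>[\w-]+): extract claims from (?P<slug>\S+)$")
_RESEARCH = re.compile(
    r"^(?P<agent>[\w-]+): research session (?P<date>\d{4}-\d{2}-\d{2})$"
)
_AGENT_WORK = re.compile(
    r"^(?P<agent>[\w-]+): (?:challenge|contrib|entity|synthesize)\b"
)
_RECOVER = re.compile(r"^Recover .+ from GitHub PR #(?P<n>\d+)$")
_BARE_EXTRACT = re.compile(r"^Extract \d+ claims from .+$")


def classify_subject(subject):
    """Map a commit subject to a (kind, value) lookup hint, or None.

    kind is one of "branch", "branch_prefix", "github_pr", "time_proximity";
    these labels are illustrative only.
    """
    s = subject.strip()
    if m := _EXTRACT.match(s):
        return ("branch", f"extract/{m['slug']}")
    if m := _RESEARCH.match(s):
        return ("branch", f"{m['agent']}/research-{m['date']}")
    if m := _AGENT_WORK.match(s):
        return ("branch_prefix", f"{m['agent']}/")
    if m := _RECOVER.match(s):
        return ("github_pr", int(m["n"]))
    if _BARE_EXTRACT.match(s):
        # No agent prefix: the caller falls back to time proximity.
        return ("time_proximity", None)
    return None
```

The more specific patterns are checked first so that a prefixed "extract claims from" subject is never mistaken for the bare, unprefixed form.
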
## Dry-run projection (1,662 merged PRs)
Before:
Claims processed: 33
Originator events: 6
Breakdown: {no_pr_match: 1608, no_sourcer: 26, invalid_handle: 21, skip_self: 6}
After:
Claims processed: 505 (+472)
Originator events: 126 (+120)
Strategy hits: git_subject=412, sourced_from=88, git_time_proximity=5
Breakdown: {no_pr_match: 1095, no_sourcer: 67, invalid_handle: 359, skip_self: 20}
## Verified on real VPS data
- @thesensatore claims: 3/5 resolve via git_time_proximity to leo/ PRs
- Cameron-S1, alexastrum: remain None. Their recovery commits
  (dba00a79, da64f805) bypassed the pipeline entirely, so no Forgejo PR
  record exists. Resolving them requires synthetic prs rows, deferred to
  a separate commit with its own Ganymede review (a write operation with
  a larger blast radius than this pure-read backfill change).
## Implementation
- New find_pr_for_claim(conn, repo, md) helper returns (pr_number, strategy)
- Claim-level pass uses it first, falls back to title_desc map
- Strategy counter surfaced in summary output for operator visibility
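
As a rough sketch of how the helper, the fallback, and the strategy counter fit together: the `strategies` parameter, the `resolve_claims` function, and the keying of the title_desc map by claim path are assumptions made for this example, and the lookups are stubbed with dictionaries instead of real queries.

```python
from collections import Counter


def find_pr_for_claim(conn, repo, md, strategies):
    """Try each (name, lookup) pair in order; return (pr_number, strategy).

    `strategies` is a hypothetical parameter for this sketch; the real
    helper presumably hard-codes its cascade.
    """
    for name, lookup in strategies:
        pr_number = lookup(conn, repo, md)
        if pr_number is not None:
            return pr_number, name
    return None, None


def resolve_claims(conn, repo, claims, strategies, title_desc_map):
    """Resolve every claim and count which strategy produced each hit."""
    hits = Counter()
    resolved = {}
    for md in claims:
        pr_number, strategy = find_pr_for_claim(conn, repo, md, strategies)
        if pr_number is None:
            # Fall back to the older title_desc map (keyed by path here).
            pr_number = title_desc_map.get(md)
            strategy = "title_desc" if pr_number is not None else "no_pr_match"
        hits[strategy] += 1
        resolved[md] = pr_number
    return resolved, hits


# Stubbed lookups standing in for the real database and git queries.
strategies = [
    ("sourced_from", lambda conn, repo, md: {"a.md": 101}.get(md)),
    ("git_subject", lambda conn, repo, md: {"b.md": 102}.get(md)),
]
resolved, hits = resolve_claims(
    None, None, ["a.md", "b.md", "c.md", "d.md"], strategies, {"c.md": 103}
)
print(dict(hits))
```

Returning the strategy name alongside the PR number is what lets the summary report per-strategy hit counts without extra bookkeeping in each lookup.
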
Idempotent: backfill re-runs skip duplicate events via the partial
UNIQUE index on contribution_events.
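
The idempotency guarantee can be illustrated with a small, self-contained sketch. It assumes SQLite and an invented schema: the column names, the index name `uq_claim_event`, and the `record_event` helper are hypothetical, and the real table and index definition may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contribution_events (
        id INTEGER PRIMARY KEY,
        handle TEXT NOT NULL,
        role TEXT NOT NULL,
        claim_path TEXT,
        pr_number INTEGER
    );
    -- Partial index: uniqueness is only enforced for claim-level rows.
    CREATE UNIQUE INDEX uq_claim_event
        ON contribution_events (handle, role, claim_path)
        WHERE claim_path IS NOT NULL;
    """
)


def record_event(conn, handle, role, claim_path, pr_number):
    """Insert an event; return True if a row was written, False if skipped."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO contribution_events "
        "(handle, role, claim_path, pr_number) VALUES (?, ?, ?, ?)",
        (handle, role, claim_path, pr_number),
    )
    return cur.rowcount == 1


# Running the same insert twice mimics a backfill re-run.
first = record_event(conn, "thesensatore", "originator", "claims/x.md", 42)
second = record_event(conn, "thesensatore", "originator", "claims/x.md", 42)
total = conn.execute("SELECT COUNT(*) FROM contribution_events").fetchone()[0]
```

Because the duplicate is rejected by the index instead of by application logic, a re-run needs no prior existence check.
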
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>