Ganymede review findings on epimetheus/contributor-attribution-fix branch:
1. BUG: record_contributor_attribution used `git diff --name-only` (all modified
files), not just added. Enrich/challenge PRs re-credited the sourcer on every
subsequent modification. Fixed: --diff-filter=A restricts to new files only.
The synthesizer/challenger/reviewer roles for enrich PRs are still credited
via the Pentagon-Agent trailer path, so this doesn't lose any correct credit.
2. WARNING: Legacy `source`-field heuristic fabricated garbage handles from
descriptive strings ("sec-interpretive-release-s7-2026-09-(march-17",
"governance---meritocratic-voting-+-futarchy"). Removed outright + added
regex handle sanity filter (`^[a-z0-9][a-z0-9_-]{0,38}$`). Applied before
every return path in parse_attribution (the nested-block early return was
previously bypassing the filter).
Dry-run impact: unique handles 83→70 (13 garbage filtered), NEW contributors
49→48, EXISTING drift rows 34→22. The filter drops rows where the literal
garbage string lives in frontmatter (Slotkin case: attribution.sourcer.handle
was written as "senator-elissa-slotkin-/-the-hill" by the buggy legacy path).
3. NIT: Aligned knowledge_prefixes in the file walker to match is_knowledge_pr
(removed entities/, convictions/). Widening those requires Cory sign-off
since is_knowledge_pr currently gates entity-only PRs out of CI.
Tests: 17 pass (added test_bad_handles_filtered, test_valid_handle_with_hyphen_passes,
updated test_legacy_source_fallback → test_legacy_source_fallback_removed).
Ganymede review — 3-message protocol msg 3 pending.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layers of contributor-attribution bug surfaced by Apr 24 leaderboard
investigation. alexastrum, thesensatore, cameron-s1 all had real merged
contributions but zero credit in the contributors table.
1. lib/attribution.py: parse_attribution() only read `attribution_sourcer:`
prefix-keyed flat fields. ~42% of claim files (535/1280) use the bare-key
form `sourcer: alexastrum` written by extract.py. Added bare-key handling
between the prefixed-flat path and the legacy-source-field fallback.
Block format (`attribution: { sourcer: [...] }`) still wins when present.
2. lib/contributor.py: record_contributor_attribution() parsed the diff text
with regex looking for `+- handle: "X"` lines. This matched neither the
bare-key flat format nor the `attribution: { sourcer: [...] }` block
format Leo uses for manual extractions. Replaced the regex parser with
a file walker that calls attribution.parse_attribution_from_file() on
each changed knowledge file — single source of truth for both formats.
3. scripts/backfill-sourcer-attribution.py: walks all merged knowledge files,
re-attributes via the canonical parser, upserts contributors. Default
additive mode preserves existing high counts (e.g. m3taversal.sourcer=1011
reflects Telegram-curator credit accumulated via a different code path
that this fix does not touch). --reset flag for the destructive case.
Dry-run preview (additive mode):
- 670 NEW contributors to insert (mostly source-citation handles)
- 77 EXISTING contributors with under-counted role columns
- alexastrum: 0 → 6, thesensatore: 0 → 5, cameron-s1: 0 → 2
- astra.sourcer: 0 → 96, leo.sourcer: 0 → 44, theseus.sourcer: 0 → 18
- m3taversal.sourcer: 1011 (preserved, not 22 from file walk)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via
OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55)
are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only — avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step
Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6
Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db
Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>