teleo-infrastructure/lib
m3taversal 3fe0f4b744 fix(attribution): credit sourcer/extractor from claim frontmatter
The Apr 24 leaderboard investigation surfaced three layers of
contributor-attribution bugs. alexastrum, thesensatore, and cameron-s1 all
had real merged contributions but zero credit in the contributors table.

1. lib/attribution.py: parse_attribution() only read `attribution_sourcer:`
   prefix-keyed flat fields. ~42% of claim files (535/1280) use the bare-key
   form `sourcer: alexastrum` written by extract.py. Added bare-key handling
   between the prefixed-flat path and the legacy-source-field fallback.
   Block format (`attribution: { sourcer: [...] }`) still wins when present.

2. lib/contributor.py: record_contributor_attribution() parsed the diff text
   with a regex looking for `+- handle: "X"` lines. This matched neither the
   bare-key flat format nor the `attribution: { sourcer: [...] }` block
   format Leo uses for manual extractions. Replaced the regex parser with
   a file walker that calls attribution.parse_attribution_from_file() on
   each changed knowledge file, so one parser is the single source of truth
   for both formats.

3. scripts/backfill-sourcer-attribution.py: walks all merged knowledge files,
   re-attributes via the canonical parser, and upserts contributors. The
   default additive mode preserves existing high counts (e.g.
   m3taversal.sourcer=1011 reflects Telegram-curator credit accumulated via
   a different code path that this fix does not touch). The --reset flag
   covers the destructive case.
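
The lookup order described in item 1 can be sketched roughly as below. This
is an illustration, not the code in lib/attribution.py: the helper names
(resolve_attribution, _as_handles), the ROLES tuple, the per-role fallback,
and the legacy field name `source` are all assumptions, and the frontmatter
is taken as an already-parsed dict.

```python
# Hypothetical sketch of the precedence described above; names are illustrative.
ROLES = ("sourcer", "extractor")  # roles named in the commit title; the real set may differ


def _as_handles(value):
    """Normalize a scalar or list into a list of non-empty handle strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [str(v).strip() for v in items if v is not None and str(v).strip()]


def resolve_attribution(fm):
    """Map each role to its handles from an already-parsed frontmatter dict."""
    block = fm.get("attribution")
    if isinstance(block, dict):
        # 1. Block format wins when present.
        return {role: _as_handles(block.get(role)) for role in ROLES}

    result = {}
    for role in ROLES:
        # 2. Prefixed flat field, e.g. `attribution_sourcer: alexastrum`.
        handles = _as_handles(fm.get(f"attribution_{role}"))
        if not handles:
            # 3. Bare key, e.g. `sourcer: alexastrum`.
            handles = _as_handles(fm.get(role))
        result[role] = handles

    if not any(result.values()):
        # 4. Legacy fallback; the field name `source` is an assumption.
        result["sourcer"] = _as_handles(fm.get("source"))
    return result
```

For example, resolve_attribution({"sourcer": "alexastrum"}) credits
alexastrum as sourcer under this sketch, which is the bare-key case the old
parser missed.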

Dry-run preview (additive mode):
  - 670 NEW contributors to insert (mostly source-citation handles)
  - 77 EXISTING contributors with under-counted role columns
  - alexastrum: 0 → 6, thesensatore: 0 → 5, cameron-s1: 0 → 2
  - astra.sourcer: 0 → 96, leo.sourcer: 0 → 44, theseus.sourcer: 0 → 18
  - m3taversal.sourcer: 1011 (preserved, not 22 from file walk)
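
One reading of the additive numbers above is that each role count becomes
the larger of the stored value and the file-walk value, while --reset takes
the file-walk value outright. The sketch below assumes that rule;
merge_count is a hypothetical helper, not a function from the backfill
script.

```python
def merge_count(existing, walked, reset=False):
    """Combine a stored role count with the count found by the file walk.

    Assumption: additive mode keeps whichever count is larger, so credit
    accumulated through other code paths is never lowered. With reset=True
    the file-walk count replaces the stored value.
    """
    if reset:
        return walked
    return max(existing, walked)


# Figures from the dry-run preview.
preview = {
    "m3taversal": merge_count(1011, 22),  # stored count preserved
    "alexastrum": merge_count(0, 6),      # under-counted column raised
}
```

Under this assumption a --reset run would set m3taversal.sourcer to 22, which
is why the commit message calls that mode destructive.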

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 12:48:41 +01:00
__init__.py Initial commit: Pipeline v2 daemon + infrastructure docs 2026-03-12 14:11:18 +00:00
analytics.py epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features 2026-03-20 20:17:27 +00:00
attribution.py fix(attribution): credit sourcer/extractor from claim frontmatter 2026-04-24 12:48:41 +01:00
breaker.py ganymede: add dev infrastructure — pyproject.toml, CI, deploy script 2026-03-13 14:24:27 +00:00
cascade.py fix: sync all code from VPS — repo is now authoritative source of truth 2026-04-15 13:18:01 +01:00
claim_index.py epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features 2026-03-20 20:17:27 +00:00
config.py fix: close cooldown-dependence gaps in extract.py (Ganymede review) 2026-04-22 11:33:10 +01:00
connect.py fix: sync all code from VPS — repo is now authoritative source of truth 2026-04-15 13:18:01 +01:00
contributor.py fix(attribution): credit sourcer/extractor from claim frontmatter 2026-04-24 12:48:41 +01:00
costs.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
cross_domain.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
db.py feat(activity): Timeline data gaps — type filter + commit_type classifier + source_channel reshape 2026-04-23 19:51:58 +01:00
dedup.py fix: enrichment idempotency — three-layer dedup prevents duplicate evidence blocks 2026-03-31 13:18:23 +01:00
digest.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
domains.py Wire rejection_reason into review records + fix ingestion domain routing 2026-04-20 18:03:34 +01:00
entity_batch.py fix: enrichment idempotency — three-layer dedup prevents duplicate evidence blocks 2026-03-31 13:18:23 +01:00
entity_queue.py epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features 2026-03-20 20:17:27 +00:00
eval_actions.py fix(eval): treat empty diff as conservative fallback in auto-close gate 2026-04-23 11:24:16 +01:00
eval_parse.py refactor: Phase 4 — extract eval_actions.py, drop underscore prefixes in eval_parse 2026-04-16 12:57:51 +01:00
evaluate.py Wire rejection_reason into review records + fix ingestion domain routing 2026-04-20 18:03:34 +01:00
extract.py fix: close cooldown-dependence gaps in extract.py (Ganymede review) 2026-04-22 11:33:10 +01:00
extraction_prompt.py Reduce near-duplicate and frontmatter schema rejections 2026-04-20 18:03:26 +01:00
feedback.py epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features 2026-03-20 20:17:27 +00:00
fixer.py refactor: Phase 2 — wire pr_state into fixer.py and substantive_fixer.py 2026-04-16 12:21:40 +01:00
forgejo.py leo: handle non-JSON 200 from Forgejo merge API 2026-03-13 17:38:00 +00:00
frontmatter.py fix: quote YAML edge values containing colons, skip unparseable files in reweave merge 2026-04-18 12:07:28 +01:00
github_feedback.py fix: wrap breaker calls in stage_loop to prevent permanent task death 2026-04-20 12:37:28 +01:00
health.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
llm.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
log.py Initial commit: Pipeline v2 daemon + infrastructure docs 2026-03-12 14:11:18 +00:00
merge.py feat: bidirectional source↔claim linking 2026-04-21 13:00:59 +01:00
post_extract.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
post_merge.py feat: bidirectional source↔claim linking 2026-04-21 13:00:59 +01:00
pr_state.py refactor: Phase 3 — fix close_pr ghost bug, wire stale_pr, extract eval_parse 2026-04-16 12:40:23 +01:00
pre_screen.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
search.py Consolidate pipeline code from teleo-codex + VPS into single repo 2026-04-07 16:52:26 +01:00
stale_pr.py refactor: Phase 3 — fix close_pr ghost bug, wire stale_pr, extract eval_parse 2026-04-16 12:40:23 +01:00
substantive_fixer.py refactor: Phase 2 — wire pr_state into fixer.py and substantive_fixer.py 2026-04-16 12:21:40 +01:00
validate.py Reduce near-duplicate and frontmatter schema rejections 2026-04-20 18:03:26 +01:00
watchdog.py fix: sync all code from VPS — repo is now authoritative source of truth 2026-04-15 13:18:01 +01:00
worktree_lock.py epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features 2026-03-20 20:17:27 +00:00