fix(attribution): credit research-session sources to agents, not m3taversal #7
Two-part fix. Forward fix in research-session.sh frontmatter template; backfill script for last 30 days already applied to production (304 PRs).
Pre-backfill DB snapshot: pipeline.db.bak-pre-research-attribution on VPS.

Two-part fix for a bug where every claim extracted from agent overnight research sessions was being credited to m3taversal in contribution_events (visible in the activity feed as "@m3taversal" on agent-derived claims).

Forward fix (research/research-session.sh):
The frontmatter template the agent prompt instructs Claude to use now includes `proposed_by: ${AGENT}` and `intake_tier: research-task`. With those fields present, extract.py path 1 (line 687) takes precedence and sets prs.submitted_by to the agent handle, which then propagates into contribution_events as a kind='agent' author event for the agent. Without the fields, extract.py fell through to the default branch on line 695 and set submitted_by='@m3taversal'.

Backfill (scripts/backfill-research-session-attribution.py):
Identifies research-session-derived PRs by finding teleo-codex commits matching `^<agent>: research session YYYY-MM-DD —`, listing the inbox/queue/*.md files added in each commit's diff, and matching those filename basenames against prs.source_path. Only PRs currently submitted_by='@m3taversal' AND merged within the configurable window are touched. Default --dry-run; --apply to commit.

For each match the script:
1. UPDATE prs SET submitted_by = '<agent> (self-directed)'
2. INSERT OR IGNORE the agent author event (kind='agent', weight=0.30) with the original PR's domain, channel, merged_at preserved
3. DELETE the misattributed m3taversal author event

Applied 30-day backfill on VPS:
- 304 PRs re-attributed (rio 74, clay 70, astra 53, vida 48, theseus 30, leo 29)
- 297 m3taversal author events deleted, 304 agent author events inserted (delta of 7 = pre-v24 PRs that never had m3ta events in the first place; we still create the new agent event)
- m3taversal author count: 1368 → 1071 (−22%)
- Pre-backfill DB snapshot: pipeline.db.bak-pre-research-attribution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves the format inconsistency between the forward fix and the 304-row backfill. Both halves now produce prs.submitted_by = "rio (self-directed)":
- research-session.sh: drop proposed_by from the frontmatter template. extract.py path 1 (proposed_by-driven) no longer fires; path 2 fires instead and constructs f"{agent} (self-directed)", which matches the backfill.
- attribution.py: normalize_handle now strips the "(self-directed)" suffix immediately after lowercase+@-strip, before alias lookup. Closes the phantom-person-event class on any future replay through record_contributor_attribution. Round-trips through alias rules keyed on bare agent names.

Test (5 cases) still passes; suffix-strip behavior verified against hostile inputs (whitespace, casing; mid-string occurrences must NOT match, only the trailing pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
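
For reviewers who want to sanity-check the commit-subject match that the backfill relies on, here is a minimal sketch of that one step. It is not the code in scripts/backfill-research-session-attribution.py: the function name `parse_research_commit`, the `AGENTS` tuple (copied from the re-attribution counts), and the exact regex are assumptions made for illustration.

```python
import re

# Assumed agent handles, taken from the backfill summary in this PR.
AGENTS = ("rio", "clay", "astra", "vida", "theseus", "leo")

# Hypothetical regex mirroring `^<agent>: research session YYYY-MM-DD` followed
# by the long dash used in the commit subject (written here as \u2014).
SUBJECT_RE = re.compile(
    r"^(?P<agent>" + "|".join(AGENTS) + r"): research session "
    r"(?P<date>\d{4}-\d{2}-\d{2}) \u2014"
)


def parse_research_commit(subject):
    """Return (agent, date) for a research-session commit subject, else None."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None
    return match.group("agent"), match.group("date")


if __name__ == "__main__":
    # Example subject with a made-up date and topic.
    print(parse_research_commit("rio: research session 2026-01-02 \u2014 topic"))
```

Anchoring at the start of the subject keeps commits that merely mention a research session (for example a revert or a fix commit quoting one) out of the backfill set.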
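
The suffix strip described for normalize_handle can be sketched as follows. This is an illustration under stated assumptions, not the code in attribution.py: the `aliases` parameter and the regex are hypothetical, and the sketch assumes that case and surrounding whitespace are normalized before the suffix check, so only a trailing "(self-directed)" is removed.

```python
import re

# Matches "(self-directed)" only at the very end of the handle, together with
# any whitespace directly before it. Mid-string occurrences are left untouched.
_SELF_DIRECTED_SUFFIX = re.compile(r"\s*\(self-directed\)$")


def normalize_handle(raw, aliases=None):
    """Lowercase, drop leading '@', strip the trailing suffix, then alias-map.

    `aliases` is a hypothetical dict keyed on bare handles.
    """
    handle = raw.strip().lower().lstrip("@")
    handle = _SELF_DIRECTED_SUFFIX.sub("", handle)
    if aliases:
        handle = aliases.get(handle, handle)
    return handle


if __name__ == "__main__":
    print(normalize_handle("rio (self-directed)"))
    print(normalize_handle("rio (self-directed) notes"))
```

Stripping the suffix before the alias lookup is what lets alias rules stay keyed on bare agent names such as "rio" while prs.submitted_by keeps the "(self-directed)" form.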