fix(attribution): classify submitted_by by branch prefix at PR discovery #11

Merged
fwazb merged 1 commit from fix/reattribute-by-branch-prefix into main 2026-05-13 03:57:05 +00:00
Owner

reweave.py and ingestion run as the operator Forgejo token, so the prior
opener-based classifier set submitted_by=m3taversal for every system
maintenance PR. backfill_submitted_by.py never overrides non-NULL rows,
so this misattribution accumulated: ~2,748 reweave/ingestion PRs and
~3,706 <agent>/ research/entity PRs were credited to the operator on
the leaderboard and contribution_events table.

Two parts:

  1. lib/merge.py: at PR discovery, classify by branch prefix first.
       reweave/, ingestion/             -> submitted_by = 'pipeline'
       <agent>/ (per _AGENT_NAMES)      -> submitted_by = '<agent>'
       otherwise human                  -> submitted_by = author.lower()
       otherwise pipeline               -> submitted_by = None
                                           (extract.py sets from proposed_by)
    Origin flag updated so domain detection and priority still fire for
    branch-classified pipeline PRs. Human PRs lowercased to maintain the
    canonical-handle contract enforced in PR #9.
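
The classification order above can be sketched as a small function. This is a minimal illustration, not the code in lib/merge.py: the function name, the `opened_by_pipeline` flag (standing in for however the real code tells human openers from the pipeline), and the contents of `AGENT_NAMES` are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical stand-in for _AGENT_NAMES; the real contents are not shown in this PR.
AGENT_NAMES = frozenset({"agent_a", "agent_b"})

# Branch prefixes that always belong to system maintenance.
PIPELINE_PREFIXES = ("reweave/", "ingestion/")


def classify_submitted_by(branch: str, author: str,
                          opened_by_pipeline: bool) -> Optional[str]:
    """Return submitted_by for a newly discovered PR, checking the branch prefix first."""
    # 1. reweave/ and ingestion/ are always the pipeline, whoever opened the PR.
    if branch.startswith(PIPELINE_PREFIXES):
        return "pipeline"
    # 2. <agent>/ branches are credited to the agent named in the prefix.
    prefix, sep, _ = branch.partition("/")
    if sep and prefix in AGENT_NAMES:
        return prefix
    # 3. Anything else opened by a human: lowercase to keep handles canonical.
    if not opened_by_pipeline:
        return author.lower()
    # 4. Remaining pipeline PRs stay unset; extract.py fills in from proposed_by.
    return None
```

Checking the prefix before the opener is the point of the fix: the opener is the operator token for every maintenance PR, so it cannot distinguish them.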

  2. scripts/reattribute-by-branch-prefix.py: historical cleanup.
    Per affected PR (atomic):

    • UPDATE prs.submitted_by -> target
    • UPDATE sources.submitted_by where source_path matches
    • UPDATE contribution_events handle ('m3taversal',role='author')
      -> target, kind='agent'. Collision (target already has author
      event for PR) deletes the m3ta row; target wins.

    Scope is deliberately conservative: extract/ branches stay attributed
    to m3taversal because proposed_by-missing legitimately defaults to the
    operator (telegram drops). Only reweave/, ingestion/, and <agent>/
    branches are reattributed.

    Dry-run shows 6,454 PRs + 284 events to move. Pre-flight collision
    query returns 0; pre-flight kind check confirms m3ta has only role=author
    events on this set (no challenger/synthesizer/evaluator).
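
The two pre-flight checks could look roughly like the following. This is a sketch under assumptions: it uses sqlite3 only for illustration, the `pr_number` column and the `preflight` helper are hypothetical, and the actual queries in the script are not shown in this PR.

```python
import sqlite3

OPERATOR = "m3taversal"


def preflight(conn: sqlite3.Connection, pr_targets: dict) -> tuple:
    """Return (collision_count, unexpected_roles) for a {pr_number: target} mapping.

    Assumes contribution_events(pr_number, handle, role, kind); pr_number is a
    hypothetical column name.
    """
    collisions = 0
    roles = set()
    for pr_number, target in pr_targets.items():
        # Collision: the target already holds an author event for this PR.
        row = conn.execute(
            "SELECT 1 FROM contribution_events "
            "WHERE pr_number = ? AND handle = ? AND role = 'author' LIMIT 1",
            (pr_number, target),
        ).fetchone()
        if row is not None:
            collisions += 1
        # Kind check: collect every role the operator holds on this PR.
        for (role,) in conn.execute(
            "SELECT DISTINCT role FROM contribution_events "
            "WHERE pr_number = ? AND handle = ?",
            (pr_number, OPERATOR),
        ):
            roles.add(role)
    # Anything other than 'author' would need separate handling.
    return collisions, roles - {"author"}
```

A clean pre-flight in this sketch is `(0, set())`, which corresponds to the result reported above.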

    Idempotent. Dry-run by default. Run with --apply after deploy + DB
    snapshot.
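
To make the per-PR steps concrete, here is a minimal sketch of one atomic, idempotent reattribution with dry-run as the default. It is not the script itself: sqlite3, the schema, the `number` and `pr_number` columns, and the `reattribute_pr` name are assumptions for illustration. Only the table names and the submitted_by, source_path, handle, role, and kind fields come from the description above.

```python
import sqlite3

OPERATOR = "m3taversal"

# Hypothetical minimal schema, just enough to run the sketch.
SCHEMA = """
CREATE TABLE prs (number INTEGER PRIMARY KEY, source_path TEXT, submitted_by TEXT);
CREATE TABLE sources (source_path TEXT PRIMARY KEY, submitted_by TEXT);
CREATE TABLE contribution_events (pr_number INTEGER, handle TEXT, role TEXT, kind TEXT);
"""


def reattribute_pr(conn: sqlite3.Connection, pr_number: int, target: str,
                   apply: bool = False) -> int:
    """Move one PR from the operator to target; return operator author events touched."""
    cur = conn.cursor()
    try:
        # Only rows still credited to the operator are changed, so reruns are no-ops.
        cur.execute(
            "UPDATE prs SET submitted_by = ? WHERE number = ? AND submitted_by = ?",
            (target, pr_number, OPERATOR),
        )
        cur.execute(
            "UPDATE sources SET submitted_by = ? WHERE submitted_by = ? "
            "AND source_path IN (SELECT source_path FROM prs WHERE number = ?)",
            (target, OPERATOR, pr_number),
        )
        collision = cur.execute(
            "SELECT 1 FROM contribution_events "
            "WHERE pr_number = ? AND handle = ? AND role = 'author'",
            (pr_number, target),
        ).fetchone()
        if collision:
            # Target already has an author event: drop the operator row, target wins.
            cur.execute(
                "DELETE FROM contribution_events "
                "WHERE pr_number = ? AND handle = ? AND role = 'author'",
                (pr_number, OPERATOR),
            )
        else:
            cur.execute(
                "UPDATE contribution_events SET handle = ?, kind = 'agent' "
                "WHERE pr_number = ? AND handle = ? AND role = 'author'",
                (target, pr_number, OPERATOR),
            )
        touched = cur.rowcount
        # Dry-run by default: report what would change, then undo it.
        if apply:
            conn.commit()
        else:
            conn.rollback()
        return touched
    except Exception:
        conn.rollback()
        raise
```

Running all three statements inside one transaction is what makes each PR atomic in this sketch: either every table reflects the new attribution or none does.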

fwazb added 1 commit 2026-05-13 03:56:58 +00:00
fix(attribution): classify submitted_by by branch prefix at PR discovery
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
c9515c770a
reweave.py and ingestion run as the operator Forgejo token, so the prior
opener-based classifier set submitted_by=m3taversal for every system
maintenance PR. backfill_submitted_by.py never overrides non-NULL rows,
so this misattribution accumulated: ~2,748 reweave/ingestion PRs and
~3,706 <agent>/ research/entity PRs were credited to the operator on
the leaderboard and contribution_events table.

Two parts:

1. lib/merge.py: at PR discovery, classify by branch prefix first.
     reweave/, ingestion/             -> submitted_by = 'pipeline'
     <agent>/ (per _AGENT_NAMES)      -> submitted_by = '<agent>'
     otherwise human                  -> submitted_by = author.lower()
     otherwise pipeline               -> submitted_by = None
                                         (extract.py sets from proposed_by)
   Origin flag updated so domain detection and priority still fire for
   branch-classified pipeline PRs. Human PRs lowercased to maintain the
   canonical-handle contract enforced in PR #9.

2. scripts/reattribute-by-branch-prefix.py: historical cleanup.
   Per affected PR (atomic):
     - UPDATE prs.submitted_by  -> target
     - UPDATE sources.submitted_by where source_path matches
     - UPDATE contribution_events handle ('m3taversal',role='author')
       -> target, kind='agent'. Collision (target already has author
       event for PR) deletes the m3ta row; target wins.

   Scope is deliberately conservative: extract/ branches stay attributed
   to m3taversal because proposed_by-missing legitimately defaults to the
   operator (telegram drops). Only reweave/, ingestion/, and <agent>/.

   Dry-run shows 6,454 PRs + 284 events to move. Pre-flight collision
   query returns 0; pre-flight kind check confirms m3ta has only role=author
   events on this set (no challenger/synthesizer/evaluator).

   Idempotent. Dry-run by default. Run with --apply after deploy + DB
   snapshot.
fwazb merged commit a54f52234a into main 2026-05-13 03:57:05 +00:00
Reference: teleo/teleo-infrastructure#11