Commit graph

2 commits

Author SHA1 Message Date
540ba97b9d fix(attribution): Phase A followup — bug #1 + 4 nits + refactor (Ganymede review)
Some checks are pending
CI / lint-and-test (push) Waiting to run
Addresses Apr 24 review of 58fa8c52. All 6 findings landed.

Bug #1 — git log -1 returns latest commit, not first (semantic mismatch
with "original author" comment):
  Drop -1 flag, take last line of default-ordered log output (= oldest).
  Fixes mis-credit on multi-commit PRs where a reviewer rebased/force-pushed.

Nit #2 — forward writer didn't pass merged_at:
  Fetch merged_at in the prs SELECT, thread pr_merged_at through all 5
  insert_contribution_event call sites. Keeps forward-emitted and backfilled
  event timestamps on the same timeline after merge retries.

Nit #3 — legacy-counts fallback paths emit no events (parity gap):
  git-author and prs.agent fallback paths now emit challenger/synthesizer
  events via the TRAILER_EVENT_ROLE map when refined_type matches. Closes
  the gap where external-contributor challenge/enrich PRs would accumulate
  legacy counts but disappear from event-sourced leaderboards.

Nit #4 — migration v24 agent seed missing 'pipeline':
  Added "pipeline" to the seed list. Plus new migration v25 with idempotent
  corrective UPDATE so existing envs (where v24 already ran) pick up the
  fix on restart without requiring manual SQL. Verified on VPS state:
  pipeline row was kind='person', will flip to 'agent' on redeploy.

Nit #5 — backfill summary prints originator attempted=0 in wrong pass:
  Split the "=== Summary ===" header into "=== PR-level events ===" and
  "=== Claim-level originator pass ===" with originator counts in the
  right block. Operator-facing cosmetic.

Refactor #6 — AGENT_BRANCH_PREFIXES duplicated in 2 sites:
  Extracted to lib/attribution.py as single source of truth. contributor.py
  imports it. backfill-events.py keeps its local copy (runs standalone
  without pipeline package import) with a sync-reference comment.

No behavioral drift for the common case. Backfill re-runs cleanly against
existing forward-written events (UNIQUE-index idempotency).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:13:54 +01:00
58fa8c5276 feat(attribution): Phase A — event-sourced contribution ledger (schema v24)
Some checks are pending
CI / lint-and-test (push) Waiting to run
Introduces contribution_events table + non-breaking double-write. Schema
lands today, forward traffic writes events alongside existing count upserts,
backfill script replays history. Phase B will add leaderboard API reading
from events; Phase C switches Argus dashboard over.

## Schema v24 (lib/db.py)

- contribution_events: one row per credit-earning event
  (id, handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
  Partial UNIQUE indexes handle SQLite's NULL != NULL semantics:
    idx_ce_unique_claim on (handle, role, pr_number, claim_path) WHERE claim_path NOT NULL
    idx_ce_unique_pr    on (handle, role, pr_number)             WHERE claim_path IS NULL
  PR-level events (evaluator, author, challenger, synthesizer) dedup on 3-tuple.
  Per-claim events (originator) dedup on 4-tuple. Idempotent on replay.
- contributor_aliases: canonical handle mapping
  Seeded: @thesensatore → thesensatore, cameron → cameron-s1
- contributors.kind TEXT DEFAULT 'person'
  Migration seeds 'agent' for known Pentagon agent handles.

## Role model (confirmed by Cory Apr 24)

Weights: author 0.30, challenger 0.25, synthesizer 0.20, originator 0.15, evaluator 0.05
- author:     human who submitted the PR (curation + submission work)
- originator: person who authored the underlying content (rewards external creators)
- challenger: agent/person who brought a productive disagreement
- synthesizer: cross-domain work (enrichments, research sessions)
- evaluator:  reviewer who approved (Leo + domain agent)

Humans-are-always-author: agents credit is capped at evaluator/synthesizer/
challenger. Pentagon agents classify as kind='agent' and surface in the
agent-view leaderboard, not the default person view.

## Writer (lib/contributor.py)

- New insert_contribution_event(): idempotent INSERT OR IGNORE with alias
  normalization + kind classification. Falls back silently on pre-v24 DBs.
- record_contributor_attribution double-writes alongside existing
  upsert_contributor calls. Zero risk to current dashboard.
- Author event: emitted once per PR from prs.submitted_by → git author →
  agent-branch-prefix.
- Originator events: emitted per claim from frontmatter sourcer, skipping
  when sourcer == author (avoids self-credit double-count).
- Evaluator events: Leo (always when leo_verdict='approve') + domain_agent
  (when domain_verdict='approve' and not Leo).
- Challenger/Synthesizer: emitted from Pentagon-Agent trailer on
  agent-owned branches (theseus/*, rio/*, etc.) based on commit_type.
  Pipeline-owned branches (extract/*, reweave/*) get no trailer-based event —
  infrastructure work isn't contribution credit.

## Helpers (lib/attribution.py)

- normalize_handle(raw, conn=None): lowercase + strip @ + alias lookup
- classify_kind(handle): returns 'agent' for PENTAGON_AGENTS, else 'person'
  Intentionally narrow. Orgs get classified by operator review, not heuristics.

## Backfill (scripts/backfill-events.py)

Replays all merged PRs into events. Idempotent (safe to re-run). Emits:
- PR-level: author, evaluator, challenger, synthesizer
- Per-claim: originator (walks knowledge tree, matches via description titles)

Known limitation: post-merge PR branches are deleted from Forgejo, so we
can't diff them for granular per-claim events. Claim→PR mapping uses
prs.description (pipe-separated titles). Misses some edge cases but
recovers the bulk of historical originator credit. Forward traffic gets
clean per-claim events via the normal record_contributor_attribution path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 13:59:22 +01:00