Introduces the contribution_events table plus a non-breaking double-write. The
schema lands today; forward traffic writes events alongside the existing count
upserts; a backfill script replays history. Phase B will add a leaderboard API
reading from events; Phase C switches the Argus dashboard over.
## Schema v24 (lib/db.py)
- contribution_events: one row per credit-earning event
(id, handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
Partial UNIQUE indexes handle SQLite's NULL != NULL semantics:
idx_ce_unique_claim on (handle, role, pr_number, claim_path) WHERE claim_path IS NOT NULL
idx_ce_unique_pr on (handle, role, pr_number) WHERE claim_path IS NULL
PR-level events (evaluator, author, challenger, synthesizer) dedup on 3-tuple.
Per-claim events (originator) dedup on 4-tuple. Idempotent on replay.
- contributor_aliases: canonical handle mapping
Seeded: @thesensatore → thesensatore, cameron → cameron-s1
- contributors.kind TEXT DEFAULT 'person'
Migration seeds 'agent' for known Pentagon agent handles.
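The partial-index trick can be sketched against an in-memory SQLite DB (column list trimmed to the dedup keys; handle/role values illustrative). Without the 3-tuple index scoped to `WHERE claim_path IS NULL`, SQLite's NULL != NULL rule would let duplicate PR-level rows through a single full unique index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contribution_events (
    id INTEGER PRIMARY KEY,
    handle TEXT NOT NULL,
    role TEXT NOT NULL,
    pr_number INTEGER NOT NULL,
    claim_path TEXT
);
-- Per-claim events dedup on the 4-tuple (claim_path present)
CREATE UNIQUE INDEX idx_ce_unique_claim
    ON contribution_events (handle, role, pr_number, claim_path)
    WHERE claim_path IS NOT NULL;
-- PR-level events dedup on the 3-tuple (claim_path absent)
CREATE UNIQUE INDEX idx_ce_unique_pr
    ON contribution_events (handle, role, pr_number)
    WHERE claim_path IS NULL;
""")

row = ("alice", "evaluator", 42, None)
for _ in range(2):  # replay is idempotent: second insert hits idx_ce_unique_pr
    conn.execute(
        "INSERT OR IGNORE INTO contribution_events"
        " (handle, role, pr_number, claim_path) VALUES (?, ?, ?, ?)",
        row,
    )
count = conn.execute("SELECT COUNT(*) FROM contribution_events").fetchone()[0]
print(count)  # 1
```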
## Role model (confirmed by Cory Apr 24)
Weights: author 0.30, challenger 0.25, synthesizer 0.20, originator 0.15, evaluator 0.05
- author: human who submitted the PR (curation + submission work)
- originator: person who authored the underlying content (rewards external creators)
- challenger: agent/person who brought a productive disagreement
- synthesizer: cross-domain work (enrichments, research sessions)
- evaluator: reviewer who approved (Leo + domain agent)
Humans-are-always-author: agent credit is capped at evaluator/synthesizer/
challenger. Pentagon agents are classified as kind='agent' and surface in the
agent-view leaderboard, not the default person view.
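A minimal weight table matching the confirmed split (constant and function names hypothetical; the scoring shape is only a sketch):

```python
# Role weights confirmed by Cory Apr 24
ROLE_WEIGHTS = {
    "author": 0.30,
    "challenger": 0.25,
    "synthesizer": 0.20,
    "originator": 0.15,
    "evaluator": 0.05,
}

def score(roles):
    """Sum role weights over a contributor's events (illustrative)."""
    return sum(ROLE_WEIGHTS[role] for role in roles)

print(round(score(["author", "evaluator"]), 2))  # 0.35
```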
## Writer (lib/contributor.py)
- New insert_contribution_event(): idempotent INSERT OR IGNORE with alias
normalization + kind classification. Falls back silently on pre-v24 DBs.
- record_contributor_attribution double-writes alongside existing
upsert_contributor calls. Zero risk to current dashboard.
- Author event: emitted once per PR from prs.submitted_by → git author →
agent-branch-prefix.
- Originator events: emitted per claim from frontmatter sourcer, skipping
when sourcer == author (avoids self-credit double-count).
- Evaluator events: Leo (always when leo_verdict='approve') + domain_agent
(when domain_verdict='approve' and not Leo).
- Challenger/Synthesizer: emitted from Pentagon-Agent trailer on
agent-owned branches (theseus/*, rio/*, etc.) based on commit_type.
Pipeline-owned branches (extract/*, reweave/*) get no trailer-based event —
infrastructure work isn't contribution credit.
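The branch-ownership gate for trailer-based events might look like the following (prefix lists taken from the examples above; "etc." left unexpanded, so the tuples are incomplete by design):

```python
AGENT_BRANCH_PREFIXES = ("theseus/", "rio/")         # agent-owned: trailer events allowed
PIPELINE_BRANCH_PREFIXES = ("extract/", "reweave/")  # pipeline-owned: no credit

def trailer_events_allowed(branch: str) -> bool:
    """Only agent-owned branches may emit challenger/synthesizer events."""
    if branch.startswith(PIPELINE_BRANCH_PREFIXES):
        return False  # infrastructure work isn't contribution credit
    return branch.startswith(AGENT_BRANCH_PREFIXES)

print(trailer_events_allowed("theseus/disputes-42"))  # True
print(trailer_events_allowed("extract/batch-7"))      # False
```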
## Helpers (lib/attribution.py)
- normalize_handle(raw, conn=None): lowercase + strip @ + alias lookup
- classify_kind(handle): returns 'agent' for PENTAGON_AGENTS, else 'person'
Intentionally narrow. Orgs get classified by operator review, not heuristics.
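A sketch of the two helpers, with the alias map inlined instead of the conn lookup (PENTAGON_AGENTS membership here is illustrative, not the real set):

```python
PENTAGON_AGENTS = {"theseus", "rio"}      # membership assumed for illustration
ALIASES = {"cameron": "cameron-s1"}       # mirrors one seeded alias-table row

def normalize_handle(raw: str, aliases=ALIASES) -> str:
    """Lowercase, strip a leading @, then apply the alias map."""
    handle = raw.strip().lstrip("@").lower()
    return aliases.get(handle, handle)

def classify_kind(handle: str) -> str:
    """Intentionally narrow: known Pentagon agents, else 'person'."""
    return "agent" if handle in PENTAGON_AGENTS else "person"

print(normalize_handle("@Cameron"))  # cameron-s1
print(classify_kind("theseus"))      # agent
```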
## Backfill (scripts/backfill-events.py)
Replays all merged PRs into events. Idempotent (safe to re-run). Emits:
- PR-level: author, evaluator, challenger, synthesizer
- Per-claim: originator (walks knowledge tree, matches via description titles)
Known limitation: post-merge PR branches are deleted from Forgejo, so we
can't diff them for granular per-claim events. Claim→PR mapping uses
prs.description (pipe-separated titles). Misses some edge cases but
recovers the bulk of historical originator credit. Forward traffic gets
clean per-claim events via the normal record_contributor_attribution path.
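The title-based claim→PR mapping can be sketched like this (function names and sample data hypothetical; the real script's matching against description titles may differ):

```python
def claim_titles(pr_description: str) -> set:
    """Split the pipe-separated title list stored in prs.description."""
    return {t.strip() for t in pr_description.split("|") if t.strip()}

def prs_for_claim(title: str, prs: dict) -> list:
    """Map a claim title back to the PR(s) whose description mentions it."""
    return [n for n, desc in prs.items() if title in claim_titles(desc)]

prs = {12: "Sleep debt compounds | Caffeine half-life", 15: "Caffeine half-life"}
print(prs_for_claim("Caffeine half-life", prs))  # [12, 15]
```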
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layers of contributor-attribution bugs surfaced by the Apr 24 leaderboard
investigation: alexastrum, thesensatore, and cameron-s1 all had real merged
contributions but zero credit in the contributors table.
1. lib/attribution.py: parse_attribution() only read `attribution_sourcer:`
prefix-keyed flat fields. ~42% of claim files (535/1280) use the bare-key
form `sourcer: alexastrum` written by extract.py. Added bare-key handling
between the prefixed-flat path and the legacy-source-field fallback.
Block format (`attribution: { sourcer: [...] }`) still wins when present.
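The resolution order can be sketched as below (function name hypothetical; the legacy-source-field fallback from the real parser is omitted):

```python
def parse_sourcer(frontmatter: dict) -> list:
    """Resolution order: block format wins, then prefixed flat, then bare key."""
    block = frontmatter.get("attribution")
    if isinstance(block, dict) and "sourcer" in block:
        v = block["sourcer"]
        return v if isinstance(v, list) else [v]
    if "attribution_sourcer" in frontmatter:          # prefixed flat field
        return [frontmatter["attribution_sourcer"]]
    if "sourcer" in frontmatter:                      # bare key written by extract.py
        return [frontmatter["sourcer"]]
    return []

print(parse_sourcer({"sourcer": "alexastrum"}))                        # ['alexastrum']
print(parse_sourcer({"attribution": {"sourcer": ["leo"]}, "sourcer": "x"}))  # ['leo']
```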
2. lib/contributor.py: record_contributor_attribution() parsed the diff text
with regex looking for `+- handle: "X"` lines. This matched neither the
bare-key flat format nor the `attribution: { sourcer: [...] }` block
format Leo uses for manual extractions. Replaced the regex parser with
a file walker that calls attribution.parse_attribution_from_file() on
each changed knowledge file — single source of truth for both formats.
3. scripts/backfill-sourcer-attribution.py: walks all merged knowledge files,
re-attributes via the canonical parser, upserts contributors. Default
additive mode preserves existing high counts (e.g. m3taversal.sourcer=1011
reflects Telegram-curator credit accumulated via a different code path
that this fix does not touch). --reset flag for the destructive case.
Dry-run preview (additive mode):
- 670 NEW contributors to insert (mostly source-citation handles)
- 77 EXISTING contributors with under-counted role columns
- alexastrum: 0 → 6, thesensatore: 0 → 5, cameron-s1: 0 → 2
- astra.sourcer: 0 → 96, leo.sourcer: 0 → 44, theseus.sourcer: 0 → 18
- m3taversal.sourcer: 1011 (preserved, not 22 from file walk)
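Additive mode can be sketched as a keep-the-max upsert, so the file-walk count of 22 never clobbers the 1011 accumulated via the other code path (SQL shape and column set assumed; only the counts come from the dry run):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributors (handle TEXT PRIMARY KEY, sourcer INTEGER DEFAULT 0)")
conn.execute("INSERT INTO contributors VALUES ('m3taversal', 1011)")

def additive_upsert(conn, handle, walked_count):
    """Never lower an existing count; insert if the handle is new."""
    conn.execute(
        """INSERT INTO contributors (handle, sourcer) VALUES (?, ?)
           ON CONFLICT(handle) DO UPDATE SET sourcer = MAX(sourcer, excluded.sourcer)""",
        (handle, walked_count),
    )

additive_upsert(conn, "m3taversal", 22)  # preserved at 1011, not reset to 22
additive_upsert(conn, "alexastrum", 6)   # new row
rows = dict(conn.execute("SELECT handle, sourcer FROM contributors"))
print(rows)  # {'m3taversal': 1011, 'alexastrum': 6}
```

A --reset mode would use a plain `SET sourcer = excluded.sourcer` instead.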
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
backfill-sources.py runs every 15 minutes and derives sources.status
purely from directory location. If a source file is in inbox/queue/,
it blindly overwrites the DB status to 'unprocessed' — even when the
DB already had 'extracted' or 'null_result'.
This is why the 43 zombies kept coming back after manual backfill:
cron re-reset them every 15 minutes, then each 4h cooldown expiry
re-triggered runaway extraction on the same source.
Fix: never regress from a terminal status (extracted, null_result,
error, ghost_no_file) to 'unprocessed'. File location is ambiguous
(legitimately new vs. zombie from failed archive); DB is authoritative.
Legitimate re-extraction still works — it goes through the needs_reextraction
path which is unaffected by this gate.
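The gate reduces to a few lines (function name hypothetical; status strings from the fix above):

```python
TERMINAL = {"extracted", "null_result", "error", "ghost_no_file"}

def next_status(db_status, derived_from_location):
    """Never regress a terminal DB status to 'unprocessed'; DB is authoritative."""
    if derived_from_location == "unprocessed" and db_status in TERMINAL:
        return db_status
    return derived_from_location

print(next_status("extracted", "unprocessed"))  # extracted (zombie suppressed)
print(next_status(None, "unprocessed"))         # unprocessed (legitimately new)
```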
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- contribution_scores table stores per-PR CI with action type
- Profile endpoint returns action_ci alongside role-based ci_score
- Branch-name attribution: contrib/NAME/ PRs attributed to NAME
- Cameron now shows 0.32 CI + BELIEF MOVER badge from a challenge
- Handle variant matching (cameron-s1 → cameron) for cross-system lookup
- Full historical backfill: 985 scores across 9 contributors
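The handle-variant matching might be a suffix-stripping fallback like this (the `-s1` stripping rule is an assumption inferred from the cameron-s1 → cameron example):

```python
def handle_variants(handle: str) -> list:
    """Candidate lookups for cross-system matching, most specific first."""
    variants = [handle]
    if "-" in handle:
        variants.append(handle.rsplit("-", 1)[0])  # cameron-s1 -> cameron
    return variants

def resolve(handle: str, known: set):
    """Return the first variant with a record, else None."""
    return next((v for v in handle_variants(handle) if v in known), None)

print(resolve("cameron-s1", {"cameron", "alice"}))  # cameron
```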
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
matplotlib chart with dual axes — cumulative claims (#00d4aa) and
contributors (#7c3aed) on dark background. 1200x630 for Twitter.
Auto-regenerates hourly via /api/contributor-graph endpoint.
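The dual-axis construction is plain matplotlib twinx. In this sketch the data series and background color are dummies; only the two hex colors and the 1200x630 target come from the commit:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; the real job serves the PNG via the endpoint
import matplotlib.pyplot as plt

days = [1, 2, 3, 4]                 # dummy data for illustration
claims = [100, 400, 800, 1118]
contributors = [3, 9, 40, 77]

# 1200x630 px at 100 dpi (Twitter card size)
fig, ax_claims = plt.subplots(figsize=(12, 6.3), dpi=100)
fig.patch.set_facecolor("#111111")  # dark background (exact shade assumed)
ax_claims.set_facecolor("#111111")

ax_claims.plot(days, claims, color="#00d4aa")
ax_contrib = ax_claims.twinx()      # second y-axis sharing the x-axis
ax_contrib.plot(days, contributors, color="#7c3aed")

ax_claims.set_ylabel("cumulative claims", color="#00d4aa")
ax_contrib.set_ylabel("contributors", color="#7c3aed")

buf = io.BytesIO()
fig.savefig(buf, format="png", facecolor=fig.get_facecolor())
```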
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Classifies merged PRs by action type, scores with importance multiplier
(confidence, domain maturity, connectivity bonus), updates contributor
records, posts summary to Telegram, serves via /api/digest/latest.
Cron: 7:07 UTC daily (8:07 AM London).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Crawls domains/foundations/core/decisions for [[wiki-links]], resolves
against claim files, entities, maps, and agents. Reports dead links,
orphans, and connectivity stats. Prerequisite for CI scoring connectivity
bonus — broken links would inflate scores.
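The dead-link check reduces to extracting `[[target]]` (ignoring `|alias` and `#anchor` parts) and resolving against the known target set; a minimal sketch, with the regex and function name assumed:

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")  # capture target before any | or #

def check_links(text: str, known_targets: set) -> list:
    """Return dead [[wiki-links]]: targets matching no claim/entity/map/agent."""
    return [t.strip() for t in WIKI_LINK.findall(text)
            if t.strip() not in known_targets]

doc = "See [[caffeine-half-life]] and [[sleep-debt|debt]], plus [[missing-claim]]."
print(check_links(doc, {"caffeine-half-life", "sleep-debt"}))  # ['missing-claim']
```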
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds async git-log-based endpoint for cumulative contributor and claim
tracking. 5-minute cache, excludes bot accounts, tags founding contributors.
Standalone CLI script also included for ad-hoc data generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Orphan ratio at 39.6% (443/1118 claims) vs <15% target. Root cause:
reweave threshold 0.70 too strict for text-embedding-3-small — 56% of
orphans found "no neighbors." At 0.55, dry-run shows 0% no-neighbor
skips. Batch size 200 clears backlog in ~3-4 nights at ~$0.20/run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>