Introduces contribution_events table + non-breaking double-write. Schema
lands today, forward traffic writes events alongside existing count upserts,
backfill script replays history. Phase B will add leaderboard API reading
from events; Phase C switches Argus dashboard over.
## Schema v24 (lib/db.py)
- contribution_events: one row per credit-earning event
(id, handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
Partial UNIQUE indexes handle SQLite's NULL != NULL semantics:
idx_ce_unique_claim on (handle, role, pr_number, claim_path) WHERE claim_path NOT NULL
idx_ce_unique_pr on (handle, role, pr_number) WHERE claim_path IS NULL
PR-level events (evaluator, author, challenger, synthesizer) dedup on 3-tuple.
Per-claim events (originator) dedup on 4-tuple. Idempotent on replay.
- contributor_aliases: canonical handle mapping
Seeded: @thesensatore → thesensatore, cameron → cameron-s1
- contributors.kind TEXT DEFAULT 'person'
Migration seeds 'agent' for known Pentagon agent handles.
## Role model (confirmed by Cory Apr 24)
Weights: author 0.30, challenger 0.25, synthesizer 0.20, originator 0.15, evaluator 0.05
- author: human who submitted the PR (curation + submission work)
- originator: person who authored the underlying content (rewards external creators)
- challenger: agent/person who brought a productive disagreement
- synthesizer: cross-domain work (enrichments, research sessions)
- evaluator: reviewer who approved (Leo + domain agent)
Humans-are-always-author: agents credit is capped at evaluator/synthesizer/
challenger. Pentagon agents classify as kind='agent' and surface in the
agent-view leaderboard, not the default person view.
## Writer (lib/contributor.py)
- New insert_contribution_event(): idempotent INSERT OR IGNORE with alias
normalization + kind classification. Falls back silently on pre-v24 DBs.
- record_contributor_attribution double-writes alongside existing
upsert_contributor calls. Zero risk to current dashboard.
- Author event: emitted once per PR from prs.submitted_by → git author →
agent-branch-prefix.
- Originator events: emitted per claim from frontmatter sourcer, skipping
when sourcer == author (avoids self-credit double-count).
- Evaluator events: Leo (always when leo_verdict='approve') + domain_agent
(when domain_verdict='approve' and not Leo).
- Challenger/Synthesizer: emitted from Pentagon-Agent trailer on
agent-owned branches (theseus/*, rio/*, etc.) based on commit_type.
Pipeline-owned branches (extract/*, reweave/*) get no trailer-based event —
infrastructure work isn't contribution credit.
## Helpers (lib/attribution.py)
- normalize_handle(raw, conn=None): lowercase + strip @ + alias lookup
- classify_kind(handle): returns 'agent' for PENTAGON_AGENTS, else 'person'
Intentionally narrow. Orgs get classified by operator review, not heuristics.
## Backfill (scripts/backfill-events.py)
Replays all merged PRs into events. Idempotent (safe to re-run). Emits:
- PR-level: author, evaluator, challenger, synthesizer
- Per-claim: originator (walks knowledge tree, matches via description titles)
Known limitation: post-merge PR branches are deleted from Forgejo, so we
can't diff them for granular per-claim events. Claim→PR mapping uses
prs.description (pipe-separated titles). Misses some edge cases but
recovers the bulk of historical originator credit. Forward traffic gets
clean per-claim events via the normal record_contributor_attribution path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevents Apr 22 runaway-damage pattern (44 open PRs manually bulk-closed)
where a source extracted 20+ times before the cooldown gate landed, each
leaving an orphan 'open' PR after eval correctly rejected as near-duplicate.
Gate fires in dispose_rejected_pr before attempt-count branches:
all_issues == ["near_duplicate"] (exact match — compound carries signal)
AND sibling PR exists with same source_path in status='merged'
AND diff contains "new file mode" (not enrichment-only)
→ close on Forgejo + DB with audit, post explanation comment.
Ganymede review — 5 must-fix/warnings applied + 1 must-add:
- Exact match on single-issue near_duplicate (compound rejections preserved)
- Enrichment guard via diff scan (eval_parse regex can flag enrichment prose)
- 10s timeout on get_pr_diff — conservative fallback on Forgejo wedge
- Forgejo comment with canned explanation (best-effort, try/except)
- Partial index idx_prs_source_path + migration v23
- Explicit p1.source_path IS NOT NULL in WHERE
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ganymede review findings:
1. source_channel was missing from CREATE TABLE (fresh installs wouldn't have it)
2. Default fallback changed from 'telegram' to 'unknown' — unknown prefixes
are genuinely unknown, not telegram
3. Cross-reference comments added between BRANCH_PREFIX_MAP and _CHANNEL_MAP
Also wires classify_source_channel into merge.py PR discovery INSERT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables GitHub↔Forgejo PR linking for the contributor pipeline.
Mirror script will store GitHub PR number when creating Forgejo PRs,
allowing back-sync of eval feedback and merge/close status.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
db.py: migration v20 adds conflict_rebase_attempts, merge_failures,
merge_cycled columns (already exist on VPS via manual migration, missing
from code — any future DB rebuild would break retry mechanism).
merge.py: replace retry-with-backoff on config.lock with asyncio.Lock
(_bare_repo_lock) around all worktree add/remove calls. Prevents
contention instead of retrying it. Applied to both _cherry_pick_onto_main
and _merge_reweave_pr.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via
OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55)
are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only — avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step
Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6
Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db
Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON)
columns to prs table.
Retry budget logic (Ganymede-approved design):
- Increment eval_attempts on each evaluate_pr() call
- Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human)
- Attempt 1: normal — back to open, wait for fix
- Attempt 2: classify issues as mechanical/substantive
- Mechanical only (schema, wiki links, dedup): keep open for one more try
- Substantive (factual, confidence, scope, title): close PR, requeue source
- Issue tags parsed from reviewer comments, stored in eval_issues column
- SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset
- Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike
- Cost recording updated: domain review → OpenRouter, Leo → tier-dependent
Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state.
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>