1. WARNING — orphan contributor_aliases after publisher/garbage delete:
Added alias cleanup to the transaction (gated on --delete-events, same
audit rationale as events). Both garbage and publisher deletion loops
now DELETE matching contributor_aliases rows. Dry-run adds an orphan
count diagnostic so the --delete-events decision is informed.
2. NIT — inserted_publishers counter over-reports on replay:
INSERT OR IGNORE silently skips name collisions, but the counter
incremented unconditionally. Now uses cur.rowcount so a second apply
reports 0 inserts instead of falsely claiming 100. moved_to_publisher
set remains unconditional — publisher rows already present still need
the matching contributors row deleted.
3. NIT — handle-length gate diverged from writer path:
Widened from {0,19} (20 chars) to {0,38} (39 chars) to match GitHub's
handle limit and contributor.py::_HANDLE_RE. Prevents future long-handle
real contributors from falling through to review_needed and blocking
--apply. Current data has 0 review_needed either way.
Bonus (Q5): Added audit_log entry inside the transaction. One row in
audit_log.stage='schema_v26', event='classify_contributors' with counter
detail JSON on every --apply run. Cheap audit trail for the destructive op.
Verified end-to-end on VPS DB snapshot:
- First apply: 100/9/9/100/0 (matches pre-fix)
- Second apply: 0/9/0/0/0 (counter fix working)
- With injected aliases + --delete-events: 2 aliases deleted, 1 pre-existing
orphan correctly left alone (outside script scope), audit_log entry
written with accurate counters.
Ganymede msg-3. Protocol closed.
Separates three concerns currently conflated in contributors table:
contributors — people + agents we credit (kind in 'person','agent')
publishers — news orgs / academic venues / platforms (not credited)
sources — gains publisher_id + content_type + original_author columns
Rationale (Cory directive Apr 24): livingip.xyz leaderboard was showing CNBC,
SpaceNews, TechCrunch etc. at the top because the attribution pipeline credited
news org names as if they were contributors. The mechanism-level fix is a
schema split — orgs live in publishers, individuals in contributors, each
table has one semantics.
Migration v26:
- CREATE TABLE publishers (id PK, name UNIQUE, kind CHECK IN
news|academic|social_platform|podcast|self|internal|legal|government|
research_org|commercial|other, url_pattern, created_at)
- CREATE TABLE contributor_identities (contributor_handle, platform CHECK IN
x|telegram|github|email|web|internal, platform_handle, verified, created_at)
Composite PK on (platform, platform_handle) + index on contributor_handle.
Enables one contributor to unify X + TG + GitHub handles.
- ALTER TABLE sources ADD COLUMN publisher_id REFERENCES publishers(id)
- ALTER TABLE sources ADD COLUMN content_type
(article|paper|tweet|conversation|self_authored|webpage|podcast)
- ALTER TABLE sources ADD COLUMN original_author TEXT
(free-text fallback, e.g., "Kim et al." — not credit-bearing)
- ALTER TABLE sources ADD COLUMN original_author_handle REFERENCES contributors(handle)
(set only when the author is in our contributor network)
- ALTER wrapped in try/except on "duplicate column" for replay safety
- Both SCHEMA_SQL (fresh installs) + migration block (upgrades) updated
- SCHEMA_VERSION bumped 25 -> 26
Migration is non-breaking. No data moves yet. Existing publishers-polluting-
contributors row state is preserved until the classifier runs. Writer routing
to these tables lands in a separate branch (Phase B writer changes).
Classifier (scripts/classify-contributors.py):
Analyzes existing contributors rows, buckets into:
keep_agent — 9 Pentagon agents
keep_person — 21 real humans + reachable pseudonymous X/TG handles
publisher — 100 news orgs, academic venues, formal-citation names,
brand/platform names
garbage — 9 parse artifacts (containing /, parens, 3+ hyphens)
review_needed — 0 (fully covered by current allowlists)
Hand-curated allowlists for news/academic/social/internal publisher kinds.
Garbage detection via regex on special chars and length > 50.
Named pseudonyms without @ prefix (karpathy, simonw, swyx, metaproph3t,
sjdedic, ceterispar1bus, etc.) classified as keep_person — they're real
X/TG contributors missing an @ prefix because extraction frontmatter
didn't normalize. Cory's auto-create rule catches these on first reference.
Formal-citation names (Firstname-Lastname form — Clayton Christensen, Hayek,
Ostrom, Friston, Bostrom, Bak, etc.) classified as academic publishers —
these are cited, not reachable via @ handle. Get promoted to contributors
if/when they sign up with an @ handle.
Apply path is transactional (BEGIN / COMMIT / ROLLBACK on error). Publisher
insert happens before contributor delete, and contributor delete is gated
on successful insert so we never lose a row by moving it to a failed
publisher insert.
--apply path flags:
--delete-events : also DELETE contribution_events rows for moved handles
(default: keep events for audit trail)
--show <handle> : inspect a single row's classification
Smoke-tested end-to-end via local copy of VPS DB:
Before: 139 contributors total (polluted with orgs)
After: 30 contributors (9 agent + 21 person), 100 publishers, 9 deleted
contribution_events: 3,705 preserved
contributors <-> publishers overlap: 0
Named contributors verified present after --apply:
alexastrum (claims=6) thesensatore (5) cameron-s1 (1) m3taversal (1011)
Pentagon agent 'pipeline' (claims_merged=771) intentionally retained — it's
the process name from old extract.py fallback path, not a real contributor.
Classified as agent (kind='agent') so doesn't appear in person leaderboard.
Deploy sequence after Ganymede review:
1. Branch ff-merge to main
2. scp lib/db.py + scripts/classify-contributors.py to VPS
3. Pipeline already at v26 (migration ran during earlier v26 restart)
4. Run dry-run: python3 ops/classify-contributors.py
5. Apply: python3 ops/classify-contributors.py --apply
6. Verify: livingip.xyz leaderboard stops showing CNBC/SpaceNews
7. Argus /api/contributors unaffected (reads contributors directly, now clean)
Follow-up branch (not in this commit):
- Writer routing in lib/contributor.py + extract.py:
org handles -> publishers table + sources.publisher_id
person handles with @ prefix -> auto-create contributor, tier='cited'
formal-citation names -> sources.original_author (free text)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the search endpoint to accept POST bodies matching the embedded
chat contract (query/limit/min_score/domain/confidence/exclude →
slug/path/title/domain/confidence/score/body_excerpt). GET path retained
for legacy callers and adds a min_score override for hackathon debug.
- _qdrant_hits_to_results() shapes raw hits into chat response format
- handle_api_search() dispatches POST vs GET
- /api/search added to _PUBLIC_PATHS (chat is unauthenticated)
- POST route registered alongside existing GET
Resolves VPS↔repo drift flagged by Argus before next deploy.sh run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two findings in commit 762fd42 review:
1. BUG: guard query was tautological. `SELECT MAX(number) FROM prs WHERE
number < 900000` filters out exactly what the `>= 900000` check tests.
Replaced with a direct check for unexpected rows in the synthetic range
(excluding our known 900068/900088).
2. WARNING: origin defaults to 'pipeline' via schema default. lib/merge.py
convention is origin='human' for external contributors. Synthetic rows
now set origin='human', priority='high' — matches discover_external_prs
for real GitHub PRs. Prevents Phase B origin-based filtering from
misclassifying Alex/Cameron as machine-authored.
Also flagged in review: credit projection was optimistic. Author events are
PR-level (not per-claim), so Alex gets 1×0.30 author credit, not 6. Same
for Cameron. Per-claim originator credit goes to the 7 frontmatter sourcers
where applicable. Not a code change — expectation reset for Cory.
Two historical GitHub PRs merged before our sync-mirror.sh tracked github_pr:
- GitHub PR #68: alexastrum, 6 claims, merged Mar 9 2026 via squash merge
- GitHub PR #88: Cameron-S1, 1 claim, merged early April
Their claim files were lost during a Forgejo→GitHub mirror overwrite and later
recovered via direct-to-main commits (dba00a79, da64f805). Because the
recovery commits bypassed the pipeline, our 'prs' table has no row to attach
originator events to — all 4 backfill-events.py strategies returned None,
leaving Alex + Cameron at 0 originator credits despite real historical work.
This reconstructs synthetic 'prs' rows so the existing github_pr strategy in
backfill-events.py attaches 7 originator events on re-run:
- Numbers 900068 / 900088 live in a clearly-synthetic range that cannot
collide with real Forgejo PRs (current max: 3941)
- github_pr=68/88 wires up the existing lookup strategy
- submitted_by=alexastrum / cameron-s1 establishes author attribution
- merged_at from the recovery commit messages (not recovery-commit time)
- last_error tags the rows as synthetic for future audits
Idempotent: INSERT OR IGNORE via check on number OR github_pr. Safe to replay.
Reversible: DELETE FROM prs WHERE number IN (900068, 900088).
After applying this script:
python3 ops/backfill-events.py
will credit Alex with 6 author + 6 originator events (author=1.80, originator=0.90)
and Cameron with 1 author + 1 originator (0.30 + 0.15), all dated to the
historical merge dates — so 7d/30d leaderboard windows show them correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SQLite datetime comparison fails lexicographically across ISO-T and
space-separator formats: '2026-03-27 18:00:14' < '2026-03-27T17:43:04+00:00'
because space (0x20) < T (0x54). PRs merged same-day but earlier than the
commit hour were silently excluded from the time-proximity cascade.
Shaga's 3 stigmergic-coordination claims resolved to PR #2032 (later, wrong)
instead of #2025 (earlier, correct). Fixed by wrapping both sides in
datetime(), which normalizes to space-separator before comparison.
Verified: all 3 Shaga claims now resolve to #2025 via git_time_proximity.
No change to totals (126 originator events, 5 proximity hits) — the fix
corrects WHICH PR each proximity-matched claim resolves to, not whether.
Caught by Ganymede review of 1d6b515.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrite claim-level pass in backfill-events.py to recover the Forgejo PR
that introduced each claim via a cascade of 4 strategies (reliability
order), replacing the single title→description match that missed PRs
with NULL description (Cameron #3377) and bare-subject extracts (Shaga's
Leo research PR).
## Strategies
1. sourced_from frontmatter → prs.source_path stem match
2. git log first-add commit → subject pattern → prs.branch
- "<agent>: extract claims from <slug>" → extract/<slug>
- "<agent>: research session YYYY-MM-DD" → <agent>/research-<date>
- "<agent>: (challenge|contrib|entity|synthesize)" → <agent>/*
- "Recover X from GitHub PR #N" → prs.github_pr=N
- "Extract N claims from X" (no prefix) → time-proximity on
agent-owned branches within 24h
3. Current title_desc fallback for anything the above miss
## Dry-run projection (1,662 merged PRs)
Before:
Claims processed: 33
Originator events: 6
Breakdown: {no_pr_match: 1608, no_sourcer: 26, invalid_handle: 21, skip_self: 6}
After:
Claims processed: 505 (+472)
Originator events: 126 (+120)
Strategy hits: git_subject=412, sourced_from=88, git_time_proximity=5
Breakdown: {no_pr_match: 1095, no_sourcer: 67, invalid_handle: 359, skip_self: 20}
## Verified on real VPS data
- @thesensatore claims: 3/5 resolve via git_time_proximity to leo/ PRs
- Cameron-S1, alexastrum: remain None — their recovery commits
(dba00a79, da64f805) bypassed the pipeline entirely, no Forgejo PR
record exists. Requires synthetic prs rows — deferred to separate
commit with its own Ganymede review (write operation, larger blast
radius than this pure-read backfill change).
## Implementation
- New find_pr_for_claim(conn, repo, md) helper returns (pr_number, strategy)
- Claim-level pass uses it first, falls back to title_desc map
- Strategy counter surfaced in summary output for operator visibility
Idempotent — backfill re-runs skip duplicate events via the partial
UNIQUE index on contribution_events.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Apr 24 review of 58fa8c52. All 6 findings landed.
Bug #1 — git log -1 returns latest commit, not first (semantic mismatch
with "original author" comment):
Drop -1 flag, take last line of default-ordered log output (= oldest).
Fixes mis-credit on multi-commit PRs where a reviewer rebased/force-pushed.
Nit #2 — forward writer didn't pass merged_at:
Fetch merged_at in the prs SELECT, thread pr_merged_at through all 5
insert_contribution_event call sites. Keeps forward-emitted and backfilled
event timestamps on the same timeline after merge retries.
Nit #3 — legacy-counts fallback paths emit no events (parity gap):
git-author and prs.agent fallback paths now emit challenger/synthesizer
events via the TRAILER_EVENT_ROLE map when refined_type matches. Closes
the gap where external-contributor challenge/enrich PRs would accumulate
legacy counts but disappear from event-sourced leaderboards.
Nit #4 — migration v24 agent seed missing 'pipeline':
Added "pipeline" to the seed list. Plus new migration v25 with idempotent
corrective UPDATE so existing envs (where v24 already ran) pick up the
fix on restart without requiring manual SQL. Verified on VPS state:
pipeline row was kind='person', will flip to 'agent' on redeploy.
Nit #5 — backfill summary prints originator attempted=0 in wrong pass:
Split the "=== Summary ===" header into "=== PR-level events ===" and
"=== Claim-level originator pass ===" with originator counts in the
right block. Operator-facing cosmetic.
Refactor #6 — AGENT_BRANCH_PREFIXES duplicated in 2 sites:
Extracted to lib/attribution.py as single source of truth. contributor.py
imports it. backfill-events.py keeps its local copy (runs standalone
without pipeline package import) with a sync-reference comment.
No behavioral drift for the common case. Backfill re-runs cleanly against
existing forward-written events (UNIQUE-index idempotency).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces contribution_events table + non-breaking double-write. Schema
lands today, forward traffic writes events alongside existing count upserts,
backfill script replays history. Phase B will add leaderboard API reading
from events; Phase C switches Argus dashboard over.
## Schema v24 (lib/db.py)
- contribution_events: one row per credit-earning event
(id, handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
Partial UNIQUE indexes handle SQLite's NULL != NULL semantics:
idx_ce_unique_claim on (handle, role, pr_number, claim_path) WHERE claim_path NOT NULL
idx_ce_unique_pr on (handle, role, pr_number) WHERE claim_path IS NULL
PR-level events (evaluator, author, challenger, synthesizer) dedup on 3-tuple.
Per-claim events (originator) dedup on 4-tuple. Idempotent on replay.
- contributor_aliases: canonical handle mapping
Seeded: @thesensatore → thesensatore, cameron → cameron-s1
- contributors.kind TEXT DEFAULT 'person'
Migration seeds 'agent' for known Pentagon agent handles.
## Role model (confirmed by Cory Apr 24)
Weights: author 0.30, challenger 0.25, synthesizer 0.20, originator 0.15, evaluator 0.05
- author: human who submitted the PR (curation + submission work)
- originator: person who authored the underlying content (rewards external creators)
- challenger: agent/person who brought a productive disagreement
- synthesizer: cross-domain work (enrichments, research sessions)
- evaluator: reviewer who approved (Leo + domain agent)
Humans-are-always-author: agents credit is capped at evaluator/synthesizer/
challenger. Pentagon agents classify as kind='agent' and surface in the
agent-view leaderboard, not the default person view.
## Writer (lib/contributor.py)
- New insert_contribution_event(): idempotent INSERT OR IGNORE with alias
normalization + kind classification. Falls back silently on pre-v24 DBs.
- record_contributor_attribution double-writes alongside existing
upsert_contributor calls. Zero risk to current dashboard.
- Author event: emitted once per PR from prs.submitted_by → git author →
agent-branch-prefix.
- Originator events: emitted per claim from frontmatter sourcer, skipping
when sourcer == author (avoids self-credit double-count).
- Evaluator events: Leo (always when leo_verdict='approve') + domain_agent
(when domain_verdict='approve' and not Leo).
- Challenger/Synthesizer: emitted from Pentagon-Agent trailer on
agent-owned branches (theseus/*, rio/*, etc.) based on commit_type.
Pipeline-owned branches (extract/*, reweave/*) get no trailer-based event —
infrastructure work isn't contribution credit.
## Helpers (lib/attribution.py)
- normalize_handle(raw, conn=None): lowercase + strip @ + alias lookup
- classify_kind(handle): returns 'agent' for PENTAGON_AGENTS, else 'person'
Intentionally narrow. Orgs get classified by operator review, not heuristics.
## Backfill (scripts/backfill-events.py)
Replays all merged PRs into events. Idempotent (safe to re-run). Emits:
- PR-level: author, evaluator, challenger, synthesizer
- Per-claim: originator (walks knowledge tree, matches via description titles)
Known limitation: post-merge PR branches are deleted from Forgejo, so we
can't diff them for granular per-claim events. Claim→PR mapping uses
prs.description (pipe-separated titles). Misses some edge cases but
recovers the bulk of historical originator credit. Forward traffic gets
clean per-claim events via the normal record_contributor_attribution path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ganymede review findings on epimetheus/contributor-attribution-fix branch:
1. BUG: record_contributor_attribution used `git diff --name-only` (all modified
files), not just added. Enrich/challenge PRs re-credited the sourcer on every
subsequent modification. Fixed: --diff-filter=A restricts to new files only.
The synthesizer/challenger/reviewer roles for enrich PRs are still credited
via the Pentagon-Agent trailer path, so this doesn't lose any correct credit.
2. WARNING: Legacy `source`-field heuristic fabricated garbage handles from
descriptive strings ("sec-interpretive-release-s7-2026-09-(march-17",
"governance---meritocratic-voting-+-futarchy"). Removed outright + added
regex handle sanity filter (`^[a-z0-9][a-z0-9_-]{0,38}$`). Applied before
every return path in parse_attribution (the nested-block early return was
previously bypassing the filter).
Dry-run impact: unique handles 83→70 (13 garbage filtered), NEW contributors
49→48, EXISTING drift rows 34→22. The filter drops rows where the literal
garbage string lives in frontmatter (Slotkin case: attribution.sourcer.handle
was written as "senator-elissa-slotkin-/-the-hill" by the buggy legacy path).
3. NIT: Aligned knowledge_prefixes in the file walker to match is_knowledge_pr
(removed entities/, convictions/). Widening those requires Cory sign-off
since is_knowledge_pr currently gates entity-only PRs out of CI.
Tests: 17 pass (added test_bad_handles_filtered, test_valid_handle_with_hyphen_passes,
updated test_legacy_source_fallback → test_legacy_source_fallback_removed).
Ganymede review — 3-message protocol msg 3 pending.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layers of contributor-attribution bug surfaced by Apr 24 leaderboard
investigation. alexastrum, thesensatore, cameron-s1 all had real merged
contributions but zero credit in the contributors table.
1. lib/attribution.py: parse_attribution() only read `attribution_sourcer:`
prefix-keyed flat fields. ~42% of claim files (535/1280) use the bare-key
form `sourcer: alexastrum` written by extract.py. Added bare-key handling
between the prefixed-flat path and the legacy-source-field fallback.
Block format (`attribution: { sourcer: [...] }`) still wins when present.
2. lib/contributor.py: record_contributor_attribution() parsed the diff text
with regex looking for `+- handle: "X"` lines. This matched neither the
bare-key flat format nor the `attribution: { sourcer: [...] }` block
format Leo uses for manual extractions. Replaced the regex parser with
a file walker that calls attribution.parse_attribution_from_file() on
each changed knowledge file — single source of truth for both formats.
3. scripts/backfill-sourcer-attribution.py: walks all merged knowledge files,
re-attributes via the canonical parser, upserts contributors. Default
additive mode preserves existing high counts (e.g. m3taversal.sourcer=1011
reflects Telegram-curator credit accumulated via a different code path
that this fix does not touch). --reset flag for the destructive case.
Dry-run preview (additive mode):
- 670 NEW contributors to insert (mostly source-citation handles)
- 77 EXISTING contributors with under-counted role columns
- alexastrum: 0 → 6, thesensatore: 0 → 5, cameron-s1: 0 → 2
- astra.sourcer: 0 → 96, leo.sourcer: 0 → 44, theseus.sourcer: 0 → 18
- m3taversal.sourcer: 1011 (preserved, not 22 from file walk)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause (per Epi audit):
- /api/claims, /api/contributors/list, /api/contributors/{handle} returned
404 in prod. The route registrations and claims_api.py module existed only
on VPS — never committed. Today's auto-deploy of an unrelated app.py change
rsync'd the repo (registration-less) version over the VPS edits, wiping
endpoints Vercel depended on.
- Recurrence of the deploy-without-commit pattern (blindspot #2).
Brings repo to parity with the live, working VPS state:
- Add diagnostics/claims_api.py (161 lines, was VPS-only)
- Wire register_claims_routes + register_contributor_routes in app.py
alongside the existing register_activity_feed call
beliefs_routes.py is also VPS-only and currently unregistered (orphaned by
the same Apr 21 manual edit that dropped its registration). Left out of this
commit pending a decision on whether to revive or delete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/api/activity and /api/activity-feed were never registered in app.py —
both files existed but neither route was reachable (confirmed 404 on VPS).
Register both so Timeline and gamification feeds can consume them.
Adds source_channel to /api/activity payload (both PR rows and audit
events — audit rows return null since they aren't tied to a specific PR).
Migration v22 already populated prs.source_channel on VPS with enum:
telegram=2340, agent=698, maintenance=102, unknown=11, github=1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds p.source_channel to the SELECT and surfaces it on each event.
Migration v22 populated the column with enum values: telegram, agent,
maintenance, unknown, github. Timeline UI needs this to show per-event
provenance (2340 telegram, 698 agent, 102 maintenance, 11 unknown, 1 github).
Nulls fall back to "unknown" — only 0 rows currently null, but the
fallback is defensive for future inserts before backfill runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ganymede review nit: if get_pr_diff returns an empty string (edge case —
Forgejo quirk, empty PR), the old `if diff is None` branch would miss it,
the `elif diff and ...` would evaluate False (empty string is falsy), and
control would fall to `else` — triggering auto-close on zero diff content.
Change `if diff is None` → `if not diff` so empty string ALSO falls through
to the conservative path. Matches the stated posture: skip auto-close when
in doubt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevents Apr 22 runaway-damage pattern (44 open PRs manually bulk-closed)
where a source extracted 20+ times before the cooldown gate landed, each
leaving an orphan 'open' PR after eval correctly rejected as near-duplicate.
Gate fires in dispose_rejected_pr before attempt-count branches:
all_issues == ["near_duplicate"] (exact match — compound carries signal)
AND sibling PR exists with same source_path in status='merged'
AND diff contains "new file mode" (not enrichment-only)
→ close on Forgejo + DB with audit, post explanation comment.
Ganymede review — 5 must-fix/warnings applied + 1 must-add:
- Exact match on single-issue near_duplicate (compound rejections preserved)
- Enrichment guard via diff scan (eval_parse regex can flag enrichment prose)
- 10s timeout on get_pr_diff — conservative fallback on Forgejo wedge
- Forgejo comment with canned explanation (best-effort, try/except)
- Partial index idx_prs_source_path + migration v23
- Explicit p1.source_path IS NOT NULL in WHERE
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
backfill-sources.py runs every 15 minutes and derives sources.status
purely from directory location. If a source file is in inbox/queue/,
it blindly overwrites the DB status to 'unprocessed' — even when the
DB already had 'extracted' or 'null_result'.
This is why the 43 zombies kept coming back after manual backfill:
cron re-reset them every 15 minutes, then each 4h cooldown expiry
re-triggered runaway extraction on the same source.
Fix: never regress from a terminal status (extracted, null_result,
error, ghost_no_file) to 'unprocessed'. File location is ambiguous
(legitimately new vs. zombie from failed archive); DB is authoritative.
Legitimate re-extraction still works — it goes through the needs_reextraction
path which is unaffected by this gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three targeted fixes from Ganymede's review of commit 469cb7f:
BUG #1 — Success path now updates sources.status='extracting' before PR
creation, so queue scan's DB-authoritative filter catches sources between
PR creation and merge. Previously the cooldown gate was load-bearing for
this window, not belt-and-suspenders as claimed.
BUG #2 — Second null-result path (line 573, triggered when enrichments
existed but all targets were missing in worktree) now updates DB. Without
this, that path created no PR, no DB mark, and would have re-entered the
runaway loop 4h later when the cooldown window expired.
NIT #6 — 4h cooldown moved to config.EXTRACTION_COOLDOWN_HOURS. Tunable
without code change. Log format now shows the configured hours.
Also backfilled 59 pre-existing zombie queue-path rows where the file
was already archived but DB status said 'unprocessed' — these would have
leaked past the DB filter once the 4h cooldown expired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes reduce extraction cost and duplicate PR flood:
1. 4-hour cooldown gate — skip sources with ANY PR (merged/closed/open)
created in the last 4h. Prevents same source re-extracting every 60s
while archive step lags behind merge.
2. DB-authoritative status — sources.status is now updated in the pipeline DB
at each extraction terminal point (null_result, success). Queue scan checks
DB first so sources with failed archives (e.g., root-owned worktree files
blocking git pull --rebase) don't get re-extracted forever. Also moves
archival into the extraction branch so it goes through PR merge instead
of a fragile separate main-worktree push.
3. source_channel wiring — extract.py PR INSERT now sets source_channel from
classify_source_channel(branch). Previously daemon-created PRs had NULL
source_channel, breaking Argus dashboard filters. Combined with Ship's
in-branch archive refactor.
Root incident: blockworks-metadao-strategic-reset.md extracted 31 times in
12 hours. Nine other sources hit 10-22 extractions each. Near-duplicate
rejection rate jumped to 94%.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forward link: claims get `sourced_from: {domain}/{filename}` at extraction time.
Reverse link: after merge, backlink_source_claims() updates source files with
`claims_extracted:` list. All disk writes happen under async_main_worktree_lock.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SYSTEM_ACCOUNTS set excludes pipeline/unknown/teleo-agents from /api/contributors/list
- primary_ci field: action_ci.total when available, else role-based ci_score
- action_ci included in list endpoint for each contributor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- contribution_scores table stores per-PR CI with action type
- Profile endpoint returns action_ci alongside role-based ci_score
- Branch-name attribution: contrib/NAME/ PRs attributed to NAME
- Cameron now shows 0.32 CI + BELIEF MOVER badge from challenge
- Handle variant matching (cameron-s1 → cameron) for cross-system lookup
- Full historical backfill: 985 scores across 9 contributors
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GET /api/contributors/{handle} — returns CI score, badges, domain
breakdown, role percentages, contribution timeline, review stats.
GET /api/contributors/list — leaderboard with min_claims filter.
Git-log fallback for contributors not in pipeline.db (Cameron, Alex).
Badge system: FOUNDING CONTRIBUTOR, BELIEF MOVER, KNOWLEDGE SOURCER,
DOMAIN SPECIALIST, VETERAN, FIRST BLOOD.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Serves contribution events from pipeline.db. Classifies PRs as
create/enrich/challenge, normalizes contributors, derives summaries
from branch names when descriptions are empty. Hot sort uses
challenge*3 + enrich*2 + signal / hours^1.5 decay from event time.
Domain and contributor filters, pagination (limit/offset).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
matplotlib chart with dual axes — cumulative claims (#00d4aa) and
contributors (#7c3aed) on dark background. 1200x630 for Twitter.
Auto-regenerates hourly via /api/contributor-graph endpoint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Classifies merged PRs by action type, scores with importance multiplier
(confidence, domain maturity, connectivity bonus), updates contributor
records, posts summary to Telegram, serves via /api/digest/latest.
Cron: 7:07 UTC daily (8:07 AM London).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Crawls domains/foundations/core/decisions for [[wiki-links]], resolves
against claim files, entities, maps, and agents. Reports dead links,
orphans, and connectivity stats. Prerequisite for CI scoring connectivity
bonus — broken links would inflate scores.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.py had extractor-heavy weights (0.40) from initial bootstrap.
Correct weights per approved architecture: challenger 0.35, synthesizer
0.25, reviewer 0.20, sourcer 0.15, extractor 0.05. backfill-ci.py
already had correct weights; this fixes the live computation in health.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The contributor attribution always recorded "extractor" regardless of
the PR's refined commit_type. Added COMMIT_TYPE_TO_ROLE mapping and
applied it in all three attribution paths (Pentagon-Agent trailer,
git author fallback, PR agent fallback).
Backfill script resets and re-derives role counts from prs.commit_type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a contributor merges main into their fork branch (standard GitHub
workflow), merge-base equals main SHA, triggering the 'already up to
date' early return. This closes the PR without cherry-picking the new
content. Cameron's PR #3377 hit this exact bug.
Fix: add a diff check before returning 'already up to date'. If the
branch has actual content changes vs main, proceed to cherry-pick
instead of short-circuiting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds async git-log-based endpoint for cumulative contributor and claim
tracking. 5-minute cache, excludes bot accounts, tags founding contributors.
Standalone CLI script also included for ad-hoc data generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ranger was liquidated — no point fetching empty data every cron run.
Also purged 1,647 pre-Apr-20 snapshot rows (incomplete NAV data from
data collection ramp-up, not actual market movement).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rejection_reason was always NULL in review_records — now populated with
comma-joined issue tags (near_duplicate, frontmatter_schema, etc.) at both
rejection call sites. Also fixes stale reviewer_model="gpt-4o" hardcoding
to use config.EVAL_DOMAIN_MODEL (currently Gemini Flash).
Ingestion branches (ingestion/futardio-*, ingestion/metadao-*) now resolve
to internet-finance domain instead of falling through to "general".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Near-duplicate (159+ rejections):
- Add extract-time dedup gate: SequenceMatcher check before file write ($0)
- Strengthen extraction prompt: high-similarity matches (>=0.75) get explicit
"DO NOT extract, use enrichment instead" warning
- Strip [[wiki link]] brackets from related_claims field
Frontmatter schema (129+ rejections):
- Normalize LLM confidence aliases (high→likely, medium→experimental, etc.)
in both _build_claim_content and validate_schema
- Strip code fences (```markdown/```yaml) from entity content in extract.py
and from diff content in validate.py tier0.5 check
- Code fences were root cause of "no_frontmatter" failures: parser sees
```markdown as first line, not ---
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch RSYNC_FLAGS string to RSYNC_OPTS bash array (same fix as deploy.sh
in 368b579 — string passed literal quotes to rsync, matching nothing)
- Add tests/ to rsync targets and syntax check glob for parity with deploy.sh
- All 8 rsync calls now use "${RSYNC_OPTS[@]}" expansion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two YAML files on VPS but not in repo. Agent identity, KB scope, and
voice configs for the Telegram bots. No secrets (tokens reference file
paths, not inline values).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RSYNC_FLAGS as a string meant --exclude='__pycache__' passed literal
quotes to rsync, matching nothing. Switched to bash array (RSYNC_OPTS)
so excludes work correctly. __pycache__ and .pyc files no longer sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
deploy.sh was missing telegram/ and tests/ directories — code existed in
repo but never synced to VPS. Also removes hardcoded twitterapi.io key
from x-ingest.py (reads from secrets file like all other modules).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- auto-deploy.sh: fetch_coins.py was missing from the root-level .py deploy
loop (line 72). Only manual deploy.sh had it. Next cycle syncs it to VPS.
- fetch_coins.py: document the ALTER TABLE loop as legacy migration for older
DBs that predate the CREATE TABLE columns.
Reviewed-by: Ganymede
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fetch_coins.py was committed to repo root but deploy.sh only deployed
teleo-pipeline.py and reweave.py. This meant bug fixes to fetch_coins
would silently fail to reach VPS on deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dashboard_portfolio.py:
- datetime.utcnow() → datetime.now(timezone.utc) (deprecation fix)
- days parameter validation with try/except + min(..., 365) on 2 endpoints
fetch_coins.py:
- isinstance(chain, str) guard prevents AttributeError on string chain values
- Log when adjusted market cap differs from DexScreener value
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull live app.py from VPS to close 243-line drift. Add portfolio
dashboard (renamed from v2), portfolio nav link, and fetch_coins.py
(daily cron script for ownership coin data). Delete stale lib/ copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A transient DB lock in breaker.record_failure() inside an except handler
killed the asyncio coroutine permanently — snapshot_cycle died Apr 18 and
never recovered. All three breaker call sites now have their own try/except.
Also includes HTML injection fix for github_feedback review_text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause of 84% reweave PR rejection rate: claim titles with colons
(e.g., "COAL: Meta-PoW: The ORE Treasury Protocol") written as bare
YAML list items, causing yaml.safe_load to fail during merge.
Three changes:
1. frontmatter.py: _yaml_quote() wraps colon-containing values in double quotes
2. reweave.py: _write_edge_regex uses _yaml_quote for new edges
3. merge.py: skip individual files with parse failures instead of aborting entire PR
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mode 100644 → 100755. Previous commit added the safety net but missed
the actual git mode change due to staging order.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
research-session.sh and install-hermes.sh were committed with mode 100644
during repo reorganization (d2aec7fe). rsync -az preserved the non-executable
mode, breaking all research agent cron jobs since Apr 15. Safety net in
auto-deploy.sh ensures any future permission loss is auto-corrected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>