teleo/teleo-infrastructure

Author	SHA1	Message	Date
m3taversal	540ba97b9d	fix(attribution): Phase A followup - bug #1 + 4 nits + refactor (Ganymede review)

Addresses Apr 24 review of `58fa8c52`. All 6 findings landed.

Bug #1 - git log -1 returns latest commit, not first (semantic mismatch with "original author" comment):
Drop -1 flag, take last line of default-ordered log output (= oldest). Fixes mis-credit on multi-commit PRs where a reviewer rebased/force-pushed.

Nit #2 - forward writer didn't pass merged_at:
Fetch merged_at in the prs SELECT, thread pr_merged_at through all 5 insert_contribution_event call sites. Keeps forward-emitted and backfilled event timestamps on the same timeline after merge retries.

Nit #3 - legacy-counts fallback paths emit no events (parity gap):
git-author and prs.agent fallback paths now emit challenger/synthesizer events via the TRAILER_EVENT_ROLE map when refined_type matches. Closes the gap where external-contributor challenge/enrich PRs would accumulate legacy counts but disappear from event-sourced leaderboards.

Nit #4 - migration v24 agent seed missing 'pipeline':
Added "pipeline" to the seed list. Plus new migration v25 with idempotent corrective UPDATE so existing envs (where v24 already ran) pick up the fix on restart without requiring manual SQL. Verified on VPS state: pipeline row was kind='person', will flip to 'agent' on redeploy.

Nit #5 - backfill summary prints originator attempted=0 in wrong pass:
Split the "=== Summary ===" header into "=== PR-level events ===" and "=== Claim-level originator pass ===" with originator counts in the right block. Operator-facing cosmetic.

Refactor #6 - AGENT_BRANCH_PREFIXES duplicated in 2 sites:
Extracted to lib/attribution.py as single source of truth. contributor.py imports it. backfill-events.py keeps its local copy (runs standalone without pipeline package import) with a sync-reference comment.

No behavioral drift for the common case. Backfill re-runs cleanly against existing forward-written events (UNIQUE-index idempotency).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 14:13:54 +01:00
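
Bug #1 in commit 540ba97b9d relies on `git log` listing commits newest first by default, so the oldest commit in a range is the last line of output. A minimal sketch of that idea follows; the function names and the `--format=%an` author format are assumptions for illustration, not the repository's actual code.

```python
import subprocess
from typing import Optional


def oldest_author(log_output: str) -> Optional[str]:
    """Return the author on the last non-empty line of `git log` output.

    `git log` prints newest commits first by default, so the last line
    belongs to the oldest commit in the range.
    """
    lines = [line.strip() for line in log_output.splitlines() if line.strip()]
    return lines[-1] if lines else None


def original_author(repo: str, rev_range: str) -> Optional[str]:
    """Hypothetical wrapper: run git without -1 and keep the oldest author."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%an", rev_range],
        capture_output=True,
        text=True,
        check=True,
    )
    return oldest_author(result.stdout)
```

With a range where a reviewer force-pushed two commits on top of the contributor's first commit, `oldest_author("reviewer\nreviewer\nalice\n")` picks `alice`, whereas `git log -1` would have reported the reviewer.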
m3taversal	58fa8c5276	feat(attribution): Phase A - event-sourced contribution ledger (schema v24)

Introduces contribution_events table + non-breaking double-write. Schema lands today, forward traffic writes events alongside existing count upserts, backfill script replays history. Phase B will add leaderboard API reading from events; Phase C switches Argus dashboard over.

## Schema v24 (lib/db.py)

- contribution_events: one row per credit-earning event
  (id, handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
  Partial UNIQUE indexes handle SQLite's NULL != NULL semantics:
    idx_ce_unique_claim on (handle, role, pr_number, claim_path) WHERE claim_path NOT NULL
    idx_ce_unique_pr on (handle, role, pr_number) WHERE claim_path IS NULL
  PR-level events (evaluator, author, challenger, synthesizer) dedup on 3-tuple.
  Per-claim events (originator) dedup on 4-tuple. Idempotent on replay.
- contributor_aliases: canonical handle mapping
  Seeded: @thesensatore → thesensatore, cameron → cameron-s1
- contributors.kind TEXT DEFAULT 'person'
  Migration seeds 'agent' for known Pentagon agent handles.

## Role model (confirmed by Cory Apr 24)

Weights: author 0.30, challenger 0.25, synthesizer 0.20, originator 0.15, evaluator 0.05

- author: human who submitted the PR (curation + submission work)
- originator: person who authored the underlying content (rewards external creators)
- challenger: agent/person who brought a productive disagreement
- synthesizer: cross-domain work (enrichments, research sessions)
- evaluator: reviewer who approved (Leo + domain agent)

Humans-are-always-author: agent credit is capped at evaluator/synthesizer/challenger. Pentagon agents classify as kind='agent' and surface in the agent-view leaderboard, not the default person view.

## Writer (lib/contributor.py)

- New insert_contribution_event(): idempotent INSERT OR IGNORE with alias normalization + kind classification. Falls back silently on pre-v24 DBs.
- record_contributor_attribution double-writes alongside existing upsert_contributor calls. Zero risk to current dashboard.
- Author event: emitted once per PR from prs.submitted_by → git author → agent-branch-prefix.
- Originator events: emitted per claim from frontmatter sourcer, skipping when sourcer == author (avoids self-credit double-count).
- Evaluator events: Leo (always when leo_verdict='approve') + domain_agent (when domain_verdict='approve' and not Leo).
- Challenger/Synthesizer: emitted from Pentagon-Agent trailer on agent-owned branches (theseus/, rio/, etc.) based on commit_type. Pipeline-owned branches (extract/, reweave/) get no trailer-based event, since infrastructure work isn't contribution credit.

## Helpers (lib/attribution.py)

- normalize_handle(raw, conn=None): lowercase + strip @ + alias lookup
- classify_kind(handle): returns 'agent' for PENTAGON_AGENTS, else 'person'
  Intentionally narrow. Orgs get classified by operator review, not heuristics.

## Backfill (scripts/backfill-events.py)

Replays all merged PRs into events. Idempotent (safe to re-run). Emits:
- PR-level: author, evaluator, challenger, synthesizer
- Per-claim: originator (walks knowledge tree, matches via description titles)

Known limitation: post-merge PR branches are deleted from Forgejo, so we can't diff them for granular per-claim events. Claim→PR mapping uses prs.description (pipe-separated titles). Misses some edge cases but recovers the bulk of historical originator credit. Forward traffic gets clean per-claim events via the normal record_contributor_attribution path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:59:22 +01:00
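
The dedup scheme in commit 58fa8c5276 can be sketched with Python's standard `sqlite3` module. SQLite treats NULLs as distinct in a UNIQUE index, so a single index over `(handle, role, pr_number, claim_path)` would never dedupe PR-level rows whose `claim_path` is NULL; two partial indexes cover both cases. The table below is a trimmed, assumed subset of the real columns, and `insert_event` is a hypothetical stand-in for `insert_contribution_event()`.

```python
import sqlite3

# Trimmed, assumed schema: only the columns needed to show the dedup behavior.
SCHEMA = """
CREATE TABLE contribution_events (
    id INTEGER PRIMARY KEY,
    handle TEXT NOT NULL,
    role TEXT NOT NULL,
    pr_number INTEGER NOT NULL,
    claim_path TEXT
);
CREATE UNIQUE INDEX idx_ce_unique_claim
    ON contribution_events (handle, role, pr_number, claim_path)
    WHERE claim_path IS NOT NULL;
CREATE UNIQUE INDEX idx_ce_unique_pr
    ON contribution_events (handle, role, pr_number)
    WHERE claim_path IS NULL;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_event(conn, handle, role, pr_number, claim_path=None) -> bool:
    """Insert one event; return False when a duplicate was ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO contribution_events "
        "(handle, role, pr_number, claim_path) VALUES (?, ?, ?, ?)",
        (handle, role, pr_number, claim_path),
    )
    return cur.rowcount == 1
```

Replaying the same PR-level or per-claim event a second time returns `False` and leaves the table unchanged, which is the property that makes a backfill safe to re-run over forward-written events.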
m3taversal	93917f9fc2	fix(attribution): --diff-filter=A + handle sanity filter + remove legacy fallback

Ganymede review findings on epimetheus/contributor-attribution-fix branch:

1. BUG: record_contributor_attribution used `git diff --name-only` (all modified files), not just added. Enrich/challenge PRs re-credited the sourcer on every subsequent modification. Fixed: --diff-filter=A restricts to new files only. The synthesizer/challenger/reviewer roles for enrich PRs are still credited via the Pentagon-Agent trailer path, so this doesn't lose any correct credit.

2. WARNING: Legacy `source`-field heuristic fabricated garbage handles from descriptive strings ("sec-interpretive-release-s7-2026-09-(march-17", "governance---meritocratic-voting-+-futarchy"). Removed outright + added regex handle sanity filter (`^[a-z0-9][a-z0-9_-]{0,38}$`). Applied before every return path in parse_attribution (the nested-block early return was previously bypassing the filter).

   Dry-run impact: unique handles 83→70 (13 garbage filtered), NEW contributors 49→48, EXISTING drift rows 34→22. The filter drops rows where the literal garbage string lives in frontmatter (Slotkin case: attribution.sourcer.handle was written as "senator-elissa-slotkin-/-the-hill" by the buggy legacy path).

3. NIT: Aligned knowledge_prefixes in the file walker to match is_knowledge_pr (removed entities/, convictions/). Widening those requires Cory sign-off since is_knowledge_pr currently gates entity-only PRs out of CI.

Tests: 17 pass (added test_bad_handles_filtered, test_valid_handle_with_hyphen_passes, updated test_legacy_source_fallback → test_legacy_source_fallback_removed).

Ganymede review: 3-message protocol msg 3 pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 12:58:55 +01:00
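
The handle sanity filter from commit 93917f9fc2 can be illustrated in a few lines. The pattern is the one quoted in the commit message; the function name `is_valid_handle` is an assumption, and `fullmatch` is used here so that a trailing newline cannot slip past `$`.

```python
import re

# Pattern quoted in the commit message: one leading alphanumeric,
# then up to 38 characters from [a-z0-9_-], for 39 characters at most.
HANDLE_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]{0,38}")


def is_valid_handle(handle: str) -> bool:
    """Return True when the whole string is a plausible contributor handle."""
    return HANDLE_PATTERN.fullmatch(handle) is not None
```

The garbage handles cited in the commit fail because they contain `/`, `(`, or `+`, while a hyphenated handle such as `cameron-s1` passes.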
m3taversal	3fe0f4b744	fix(attribution): credit sourcer/extractor from claim frontmatter

Three layers of contributor-attribution bug surfaced by Apr 24 leaderboard investigation. alexastrum, thesensatore, cameron-s1 all had real merged contributions but zero credit in the contributors table.

1. lib/attribution.py: parse_attribution() only read `attribution_sourcer:` prefix-keyed flat fields. ~42% of claim files (535/1280) use the bare-key form `sourcer: alexastrum` written by extract.py. Added bare-key handling between the prefixed-flat path and the legacy-source-field fallback. Block format (`attribution: { sourcer: [...] }`) still wins when present.

2. lib/contributor.py: record_contributor_attribution() parsed the diff text with regex looking for `+- handle: "X"` lines. This matched neither the bare-key flat format nor the `attribution: { sourcer: [...] }` block format Leo uses for manual extractions. Replaced the regex parser with a file walker that calls attribution.parse_attribution_from_file() on each changed knowledge file, giving a single source of truth for both formats.

3. scripts/backfill-sourcer-attribution.py: walks all merged knowledge files, re-attributes via the canonical parser, upserts contributors. Default additive mode preserves existing high counts (e.g. m3taversal.sourcer=1011 reflects Telegram-curator credit accumulated via a different code path that this fix does not touch). --reset flag for the destructive case.

Dry-run preview (additive mode):
- 670 NEW contributors to insert (mostly source-citation handles)
- 77 EXISTING contributors with under-counted role columns
- alexastrum: 0 → 6, thesensatore: 0 → 5, cameron-s1: 0 → 2
- astra.sourcer: 0 → 96, leo.sourcer: 0 → 44, theseus.sourcer: 0 → 18
- m3taversal.sourcer: 1011 (preserved, not 22 from file walk)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 12:48:41 +01:00
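
The lookup order described in commit 3fe0f4b744 (block format first, then the prefixed flat key, then the bare key) might look roughly like the sketch below. It operates on frontmatter that has already been parsed into a dict; the function name `sourcer_handles` and the exact shape of the block entries are assumptions, not the actual `parse_attribution()` implementation.

```python
from typing import Any, Dict, List


def sourcer_handles(frontmatter: Dict[str, Any]) -> List[str]:
    """Resolve sourcer handles with block > prefixed-flat > bare-key precedence."""
    # 1. Block format: attribution: { sourcer: [ {handle: ...}, ... ] }
    block = frontmatter.get("attribution")
    if isinstance(block, dict) and block.get("sourcer"):
        entries = block["sourcer"]
        if not isinstance(entries, list):
            entries = [entries]
        handles = []
        for entry in entries:
            # Assumed shapes: either {"handle": "x"} or a plain string.
            handle = entry.get("handle") if isinstance(entry, dict) else entry
            if handle:
                handles.append(str(handle))
        if handles:
            return handles
    # 2. Prefixed flat key, then 3. bare key written by the extractor.
    for key in ("attribution_sourcer", "sourcer"):
        value = frontmatter.get(key)
        if value:
            return [str(value)]
    return []
```

Under this ordering a file carrying only `sourcer: alexastrum` is credited, while a file that also has an `attribution` block keeps the block's handles.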
m3taversal	5f554bc2de	feat: atomic extract-and-connect + stale PR monitor + response audit

Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55) are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only, which avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step

Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6

Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db

Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 22:34:20 +00:00
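
The neighbor-selection rule in commit 5f554bc2de (top-5 neighbors with cosine similarity above 0.55) can be sketched without any external service. The real pipeline embeds via OpenRouter and searches Qdrant; the pure-Python version below only illustrates the thresholding and ranking, and the names `cosine` and `related_edges` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_edges(
    new_vec: Sequence[float],
    existing: Dict[str, Sequence[float]],
    k: int = 5,
    threshold: float = 0.55,
) -> List[str]:
    """Return up to k claim ids whose similarity to new_vec exceeds threshold."""
    scored = [(cosine(new_vec, vec), claim_id) for claim_id, vec in existing.items()]
    kept = [pair for pair in scored if pair[0] > threshold]
    # Highest similarity first; ties broken by claim id for determinism.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [claim_id for _, claim_id in kept[:k]]
```

The returned ids would then be written as `related` edges on the new claim only, leaving existing claim files untouched.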
m3taversal	d79ff60689	epimetheus: sync VPS-deployed code to repo - Mar 18-20 reliability + features

Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio):
1. Merge API recovery - pre-flight approval check, transient/permanent distinction, jitter
2. Ghost PR detection - ls-remote branch check in reconciliation, network guard
3. Source status contract - directory IS status, no code change needed
4. Batch-state markers eliminated - two-gate skip (archive-check + batched branch-check)
5. Branch SHA tracking - batched ls-remote, auto-reset verdicts, dismiss stale reviews
6. Mirror pre-flight permissions - chown check in sync-mirror.sh
7. Telegram archive commit-after-write - git add/commit/push with rebase --abort fallback
8. Post-merge source archiving - queue/ → archive/{domain}/ after merge

Pipeline fixes:
- merge_cycled flag - eval attempts preserved during merge-failure cycling (Ganymede+Rhea)
- merge_failures diagnostic counter
- Startup recovery preserves eval_attempts (was incorrectly resetting to 0)
- No-diff PRs auto-closed by eval (root cause of 17 zombie PRs)
- GC threshold aligned with substantive fixer budget (was 2, now 4)
- Conflict retry with 3-attempt budget + permanent conflict handler
- Local ff-merge fallback for Forgejo 405 errors

Telegram bot:
- KB retrieval: 3-layer (entity resolution → claim search → agent context)
- Reply-to-bot handler (context.bot.id check)
- Tag regex: @teleo\|@futairdbot
- Prompt rewrite for natural analyst voice
- Market data API integration (Ben's token price endpoint)
- Conversation windows (5-message unanswered counter, per-user-per-chat)
- Conversation history in prompt (last 5 exchanges)
- Worktree file lock for archive writes

Infrastructure:
- worktree_lock.py - file-based lock (flock) for main worktree coordination
- backfill-sources.py - source DB registration for Argus funnel
- batch-extract-50.sh v3 - two-gate skip, batched ls-remote, network guard
- sync-mirror.sh - auto-PR creation for mirrored GitHub branches, permission pre-flight
- Argus dashboard - conflicts + reviewing in backlog, queue count in funnel
- Enrichment-inside-frontmatter bug fix (regex anchor, not --- split)

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-20 20:17:27 +00:00
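
Commit d79ff60689 mentions a file-based lock (flock) for coordinating writes to the main worktree. A minimal sketch of such a lock as a context manager is shown below, assuming a Unix host where Python's `fcntl` module is available; the name `worktree_lock` and the lock-file path are illustrative, not taken from `worktree_lock.py`.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def worktree_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until any other holder of the lock releases it.
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A writer would wrap its git add/commit/push sequence in `with worktree_lock(path):` so that concurrent archive writes serialize instead of interleaving. Because the lock is tied to the open file, the kernel releases it if the process dies, which avoids stale lock files.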

6 commits