teleo/teleo-infrastructure

Author	SHA1	Message	Date
m3taversal	f46e14dfae	refactor: Phase 4 — extract eval_actions.py, drop underscore prefixes in eval_parse Some checks are pending CI / lint-and-test (push) Waiting to run Details Three changes: 1. Drop underscore prefixes in eval_parse.py — functions are now the public API of the module (filter_diff, parse_verdict, classify_issues, etc.). All 12 functions renamed, imports updated in evaluate.py and tests. 2. Extract eval_actions.py from evaluate.py — 3 async PR disposition functions: - post_formal_approvals: submit Forgejo reviews from 2 agents - terminate_pr: close PR, post rejection comment, requeue source - dispose_rejected_pr: disposition logic for rejected PRs on attempt 2+ evaluate.py drops from ~1140 to 911 lines. 3. 14 new tests in test_eval_actions.py covering all three functions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:57:51 +01:00
m3taversal	376b77999f	refactor: Phase 3 — fix close_pr ghost bug, wire stale_pr, extract eval_parse Some checks are pending CI / lint-and-test (push) Waiting to run Details Critical bug fix: close_pr now checks forgejo_api return value and skips DB update on Forgejo failure, preventing ghost PRs (DB closed, Forgejo open). Returns bool so callers can handle failures. _terminate_pr checks return value — skips source requeue on failure. stale_pr.py migrated from raw Forgejo+DB to close_pr (last raw close transition eliminated). eval_parse.py: 15 pure parsing functions extracted from evaluate.py (~370 lines removed). Zero I/O, zero async, independently testable. evaluate.py drops from ~1510 to ~1140 lines. Tests: 295 passed (42 new eval_parse + 2 new close_pr), zero regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:40:23 +01:00
m3taversal	1f5eb324f3	refactor: centralize PR state transitions in lib/pr_state.py Some checks are pending CI / lint-and-test (push) Waiting to run Details Replace 38 hand-crafted UPDATE prs SET status calls across evaluate.py and merge.py with 7 centralized functions that enforce invariants: - close_pr: always syncs Forgejo (opt-out for reconciliation) - approve_pr: raises ValueError on empty domain (prevents NULL bugs) - mark_merged: always sets merged_at, clears last_error - mark_conflict: always increments merge_failures, sets merge_cycled - mark_conflict_permanent: terminal conflict state - reopen_pr: handles all reopen scenarios (transient, rejection, reeval) - start_review: atomic claim with bool return This eliminates the class of bugs that produced 3 incidents: 1. Domain NULL on musings bypass (7 PRs stuck, 20h zero throughput) 2. Forgejo ghost PRs (70 PRs open on Forgejo but closed in DB) 3. Merge_cycled missing on various close paths Also fixes: 3 close paths in merge.py had DB update before Forgejo call (reversed order). close_pr does Forgejo first, then DB. Only remaining raw status transition: _claim_next_pr (approved→merging) which is an atomic subquery and doesn't have invariant requirements. 20 new tests, 264 total passing, 0 regressions. Net -101 lines in evaluate.py + merge.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:08:57 +01:00
m3taversal	e0c9951308	fix: close stale PRs on Forgejo when pipeline DB marks them closed Some checks are pending CI / lint-and-test (push) Waiting to run Details Two code paths set status='closed' in the pipeline DB without calling the Forgejo API to close the PR. This caused 50 ghost PRs to accumulate on Forgejo (dashboard shows review backlog) while the pipeline considered them done. - evaluate.py: no-diff stale branch close now calls Forgejo PATCH - merge.py: permanent conflict close now calls Forgejo PATCH Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:15:58 +01:00
m3taversal	ad7ee0831e	fix(evaluate): set domain + auto_merge on all 5 approval paths Some checks are pending CI / lint-and-test (push) Waiting to run Details Musings bypass and batch both_approve set status='approved' without domain or auto_merge. Merge gate requires domain IS NOT NULL and prefix match OR auto_merge=1. Result: agent PRs deadlocked for 20+ hours. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 17:03:42 +01:00
m3taversal	81afcd319f	fix: sync all code from VPS — repo is now authoritative source of truth Some checks are pending CI / lint-and-test (push) Waiting to run Details 24 files: 8 pipeline lib modules, 6 diagnostics updates, 4 new diagnostics modules, telegram bot fix, 5 active operational scripts. Key changes: - Security: SQL injection prevention (alerting.py), SSL verification (review_queue.py), path traversal guard (extract.py) - Cost tracking: per-PR cost accumulation in evaluate.py - Auto-recovery: watchdog tier0 reset with retry cap + cooldown - Extraction: structured edge fields, post-write vector connection - New modules: vitality, research_tracking, research_routes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 13:18:01 +01:00
m3taversal	681afad506	Consolidate pipeline code from teleo-codex + VPS into single repo Some checks failed CI / lint-and-test (push) Has been cancelled Details Sources merged: - teleo-codex/ops/pipeline-v2/ (11 newer lib files, 5 new lib modules) - teleo-codex/ops/ (agent-state, diagnostics expansion, systemd units, ops scripts) - VPS /opt/teleo-eval/telegram/ (10 new bot files, agent configs) - VPS /opt/teleo-eval/pipeline/ops/ (vector-gc, backfill-descriptions) - VPS /opt/teleo-eval/sync-mirror.sh (Bug 2 + Step 2.5 fixes) Non-trivial merges: - connect.py: kept codex threshold (0.65) + added infra domain parameter - watchdog.py: kept infra version (stale_pr integration, superset of codex) - deploy.sh: codex rsync version (interim, until VPS git clone migration) - diagnostics/app.py: codex decomposed dashboard (14 new route modules) 81 files changed, +17105/-200 lines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:52:26 +01:00
m3taversal	5f554bc2de	feat: atomic extract-and-connect + stale PR monitor + response audit Some checks failed CI / lint-and-test (pull_request) Has been cancelled Details Atomic extract-and-connect (lib/connect.py): - After extraction writes claim files, each new claim is embedded via OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55) are added as `related` edges in the claim's frontmatter - Edges written on NEW claim only — avoids merge conflicts - Cross-domain connections enabled, non-fatal on Qdrant failure - Wired into openrouter-extract-v2.py post-extraction step Stale PR monitor (lib/stale_pr.py): - Every watchdog cycle checks open extract/* PRs - If open >30 min AND 0 claim files → auto-close with comment - After 2 stale closures → marks source as extraction_failed - Wired into watchdog.py as check #6 Response audit system: - response_audit table (migration v8), persistent audit conn in bot.py - 90-day retention cleanup, tool_calls JSON column - Confidence tag stripping, systemd ReadWritePaths for pipeline.db Supporting infrastructure: - reweave.py: nightly edge reconnection for orphan claims - reconcile-sources.py: source status reconciliation - backfill-domains.py: domain classification backfill - ops/reconcile-source-status.sh: operational reconciliation script - Attribution improvements, post-extract enrichments, merge improvements Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 22:34:20 +00:00
m3taversal	0457c49094	fix: zombie retry loop + cost tracking Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before re-extracting. Sources with >=3 closed PRs are skipped (zombie protection). Cost tracking: openrouter_call() now returns (text, usage) tuple with prompt_tokens and completion_tokens from the OpenRouter API response. All callers updated to unpack and pass tokens to costs.record_usage(). Added missing triage cost recording. Fixed batch domain review recording cost once per batch instead of once per PR. Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 11:29:58 +00:00
m3taversal	d79ff60689	epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio): 1. Merge API recovery — pre-flight approval check, transient/permanent distinction, jitter 2. Ghost PR detection — ls-remote branch check in reconciliation, network guard 3. Source status contract — directory IS status, no code change needed 4. Batch-state markers eliminated — two-gate skip (archive-check + batched branch-check) 5. Branch SHA tracking — batched ls-remote, auto-reset verdicts, dismiss stale reviews 6. Mirror pre-flight permissions — chown check in sync-mirror.sh 7. Telegram archive commit-after-write — git add/commit/push with rebase --abort fallback 8. Post-merge source archiving — queue/ → archive/{domain}/ after merge Pipeline fixes: - merge_cycled flag — eval attempts preserved during merge-failure cycling (Ganymede+Rhea) - merge_failures diagnostic counter - Startup recovery preserves eval_attempts (was incorrectly resetting to 0) - No-diff PRs auto-closed by eval (root cause of 17 zombie PRs) - GC threshold aligned with substantive fixer budget (was 2, now 4) - Conflict retry with 3-attempt budget + permanent conflict handler - Local ff-merge fallback for Forgejo 405 errors Telegram bot: - KB retrieval: 3-layer (entity resolution → claim search → agent context) - Reply-to-bot handler (context.bot.id check) - Tag regex: @teleo\|@futairdbot - Prompt rewrite for natural analyst voice - Market data API integration (Ben's token price endpoint) - Conversation windows (5-message unanswered counter, per-user-per-chat) - Conversation history in prompt (last 5 exchanges) - Worktree file lock for archive writes Infrastructure: - worktree_lock.py — file-based lock (flock) for main worktree coordination - backfill-sources.py — source DB registration for Argus funnel - batch-extract-50.sh v3 — two-gate skip, batched ls-remote, network guard - sync-mirror.sh — auto-PR creation for mirrored GitHub branches, permission pre-flight - Argus dashboard — conflicts + reviewing in backlog, queue count in funnel - Enrichment-inside-frontmatter bug fix (regex anchor, not --- split) Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-20 20:17:27 +00:00
m3taversal	090b1411fd	epimetheus: source archive restructure — inbox/queue + inbox/archive/{domain} + inbox/null-result - config.py: added INBOX_QUEUE, INBOX_NULL_RESULT constants - evaluate.py: skip patterns + LIGHT tier cover all inbox/ subdirs - llm.py: eval prompts reference inbox/ generically - telegram/bot.py: archives to inbox/queue/ - telegram/teleo-telegram.service: ReadWritePaths expanded - research-prompt-v2.md: paths updated to inbox/queue/ - research-prompt-leo-synthesis.md: paths updated - migrate-source-archive.py: one-time migration script Reviewed by: Ganymede, Rhea, Leo (all approved) Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-18 11:50:04 +00:00
m3taversal	ffa718e834	ganymede: implement tier logic — LIGHT skip, claim-shape detector, pre-merge promotion - Claim-shape detector: if YAML has type: claim, force STANDARD minimum (Theseus) - Random pre-merge promotion: 15% of LIGHT → STANDARD before eval (Rio) - LIGHT_SKIP_LLM config flag: skip domain+Leo review for LIGHT (Rhea: env var rollback) - Updated both_approve: domain_verdict=skipped is valid for LIGHT auto-approve - Cost recording: only charge for reviews that actually ran - SAMPLE_AUDIT_RATE bumped 0.10 → 0.15, audit model = Opus (Leo: different family from Haiku) Multi-agent design review: Rio (gaming vectors, model diversity), Theseus (correlated blindspots, claim-shape guard), Rhea (shadow mode, config flag, deployment), Leo (approval). Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 18:05:43 +00:00
m3taversal	615af9b53d	leo: prioritize fresh PRs over re-evals in eval queue Unevaluated PRs (eval_attempts=0) now sort before re-evals in the eval cycle query. Fresh PRs have a higher chance of passing (~12%) vs re-evals of already-rejected PRs. Prevents migration-reset PRs from consuming eval slots that fresh PRs could use. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:32:07 +00:00
m3taversal	f4dc6b39ce	leo: warn on NULL source_path in _terminate_pr (Ganymede nit) If source_path is NULL, the source requeue silently matches nothing. Log a warning so we catch orphaned terminations in monitoring. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:17:30 +00:00
m3taversal	e7c902bac8	leo: implement retry budget — stop infinite eval loops Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON) columns to prs table. Retry budget logic (Ganymede-approved design): - Increment eval_attempts on each evaluate_pr() call - Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human) - Attempt 1: normal — back to open, wait for fix - Attempt 2: classify issues as mechanical/substantive - Mechanical only (schema, wiki links, dedup): keep open for one more try - Substantive (factual, confidence, scope, title): close PR, requeue source - Issue tags parsed from reviewer comments, stored in eval_issues column - SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset - Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike - Cost recording updated: domain review → OpenRouter, Leo → tier-dependent Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:14:12 +00:00
m3taversal	c0a6adf9ed	leo: model diversity + calibrated review prompts - Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter), Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots. - Opus reserved for DEEP eval only — protects rate limit for overnight research. - Review prompts calibrated: require per-criterion evidence, blocking-vs-observation verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate. - OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid spurious 15-min Opus backoff. - merge.py: pre-check PR state before merge API call (prevents 405 on re-merge). Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:10:30 +00:00
m3taversal	85b86a918a	ganymede: extract lib/llm.py from evaluate.py (Phase 3c) Some checks failed CI / lint-and-test (pull_request) Has been cancelled Details - What: LLM transport (OpenRouter, Claude CLI), prompt templates (triage/domain/Leo), and review runner functions moved to lib/llm.py. evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo posting, rate limit backoff, and evaluate_cycle. - Why: evaluate.py was 734 lines mixing orchestration with LLM concerns. Now 455 lines orchestration + 250 lines LLM transport. Each module has a single responsibility. - Connections: completes Phase 3 structural refactor (forgejo.py + domains.py + llm.py). teleo-pipeline.py updated to import kill_active_subprocesses from lib.llm. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:40:18 +00:00
m3taversal	ff5162d5ba	ganymede: extract lib/domains.py — single domain→agent mapping Some checks failed CI / lint-and-test (pull_request) Has been cancelled Details - What: Unified DOMAIN_AGENT_MAP, VALID_DOMAINS, agent_for_domain(), detect_domain_from_diff(), detect_domain_from_branch() into lib/domains.py. Removed duplicated mappings from evaluate.py and merge.py. VALID_DOMAINS in validate.py now derives from DOMAIN_AGENT_MAP.keys() (single source of truth). - Why: Phase 3 structural refactor. Domain mapping was duplicated across evaluate.py (DOMAIN_AGENT_MAP) and merge.py (agent_domain dict). Adding a domain required editing 3 files; now it requires editing 1. - Connections: evaluate.py uses agent_for_domain() + detect_domain_from_diff(), merge.py uses detect_domain_from_branch(), validate.py uses VALID_DOMAINS. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:33:18 +00:00
m3taversal	9d69629893	ganymede: extract lib/forgejo.py — single Forgejo API client Some checks failed CI / lint-and-test (pull_request) Has been cancelled Details - What: Unified forgejo_api(), get_pr_diff(), get_agent_token(), repo_path() into lib/forgejo.py. Removed 3 duplicate _forgejo_api functions (evaluate.py, merge.py, validate.py), 2 duplicate _get_pr_diff functions (evaluate.py, validate.py), and 1 _agent_token function (evaluate.py). - Why: Phase 3 structural refactor. Single source of truth for all Forgejo HTTP calls. Eliminates ~90 lines of duplicated code across 3 modules. - Connections: All hardcoded repo paths now use repo_path() helper. Consumer modules no longer reference config.FORGEJO_URL/OWNER/REPO/TOKEN_FILE directly. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:29:34 +00:00
m3taversal	a7251d7529	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script Some checks failed CI / lint-and-test (pull_request) Has been cancelled Details Phase 2 of pipeline refactoring: - pyproject.toml: Python >=3.11, aiohttp dep, dev extras (pytest, pytest-asyncio, ruff). Ruff configured with sane defaults + ignore rules for existing code patterns (implicit Optional, timezone.utc). - .forgejo/workflows/ci.yml: Forgejo Actions CI — syntax check, ruff lint, ruff format, pytest on every PR and push to main. - deploy.sh: Pull + venv update + syntax check + optional restart. Replaces ad-hoc scp workflow. - tests/conftest.py: Shared fixture for in-memory SQLite with full schema. Ready for Phase 4 test suite. - .gitignore: Added venv, pytest cache, coverage, build artifacts. - Ruff auto-fixes: import sorting, unused imports removed across all modules. All files pass ruff check + ruff format. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 14:24:27 +00:00
m3taversal	f166db4f62	ganymede: fix 4 critical bugs before pipeline restart - Fix #12: domain_review undefined on resume path — initialize to None, guard _parse_issues() call. Prevents NameError on PRs resuming after partial eval (76 PRs in this state right now). - Fix #11: concurrent eval workers can duplicate reviews — add atomic UPDATE SET status='reviewing' WHERE status='open' at top of evaluate_pr(). Check rowcount, skip if already claimed. - Fix #8: subprocess tracking for graceful shutdown — _active_subprocesses set in evaluate module, tracked in _claude_cli_call, exposed via kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py. - Fix health.py divide-by-zero — guard all metabolic metric reads against None from NULLIF/empty result set. Prevents TypeError on /health when no PRs have been evaluated in 24h. Also includes Leo's existing hot-fixes: - Rate limit detection checks stdout regardless of exit code - 15-minute cycle-level backoff on rate limit Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 14:13:25 +00:00
m3taversal	799249d470	Initial commit: Pipeline v2 daemon + infrastructure docs - teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge) - lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules - INFRASTRUCTURE.md: comprehensive deep-dive for onboarding - teleo-pipeline.service: systemd unit file Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-12 14:11:18 +00:00

22 commits