teleo/teleo-infrastructure

Author	SHA1	Message	Date
m3taversal	5f554bc2de	feat: atomic extract-and-connect + stale PR monitor + response audit Atomic extract-and-connect (lib/connect.py): - After extraction writes claim files, each new claim is embedded via OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55) are added as `related` edges in the claim's frontmatter - Edges written on NEW claim only — avoids merge conflicts - Cross-domain connections enabled, non-fatal on Qdrant failure - Wired into openrouter-extract-v2.py post-extraction step Stale PR monitor (lib/stale_pr.py): - Every watchdog cycle checks open extract/* PRs - If open >30 min AND 0 claim files → auto-close with comment - After 2 stale closures → marks source as extraction_failed - Wired into watchdog.py as check #6 Response audit system: - response_audit table (migration v8), persistent audit conn in bot.py - 90-day retention cleanup, tool_calls JSON column - Confidence tag stripping, systemd ReadWritePaths for pipeline.db Supporting infrastructure: - reweave.py: nightly edge reconnection for orphan claims - reconcile-sources.py: source status reconciliation - backfill-domains.py: domain classification backfill - ops/reconcile-source-status.sh: operational reconciliation script - Attribution improvements, post-extract enrichments, merge improvements Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 22:34:20 +00:00
m3taversal	0457c49094	fix: zombie retry loop + cost tracking Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before re-extracting. Sources with >=3 closed PRs are skipped (zombie protection). Cost tracking: openrouter_call() now returns (text, usage) tuple with prompt_tokens and completion_tokens from the OpenRouter API response. All callers updated to unpack and pass tokens to costs.record_usage(). Added missing triage cost recording. Fixed batch domain review recording cost once per batch instead of once per PR. Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 11:29:58 +00:00
m3taversal	d79ff60689	epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio): 1. Merge API recovery — pre-flight approval check, transient/permanent distinction, jitter 2. Ghost PR detection — ls-remote branch check in reconciliation, network guard 3. Source status contract — directory IS status, no code change needed 4. Batch-state markers eliminated — two-gate skip (archive-check + batched branch-check) 5. Branch SHA tracking — batched ls-remote, auto-reset verdicts, dismiss stale reviews 6. Mirror pre-flight permissions — chown check in sync-mirror.sh 7. Telegram archive commit-after-write — git add/commit/push with rebase --abort fallback 8. Post-merge source archiving — queue/ → archive/{domain}/ after merge Pipeline fixes: - merge_cycled flag — eval attempts preserved during merge-failure cycling (Ganymede+Rhea) - merge_failures diagnostic counter - Startup recovery preserves eval_attempts (was incorrectly resetting to 0) - No-diff PRs auto-closed by eval (root cause of 17 zombie PRs) - GC threshold aligned with substantive fixer budget (was 2, now 4) - Conflict retry with 3-attempt budget + permanent conflict handler - Local ff-merge fallback for Forgejo 405 errors Telegram bot: - KB retrieval: 3-layer (entity resolution → claim search → agent context) - Reply-to-bot handler (context.bot.id check) - Tag regex: @teleo\|@futairdbot - Prompt rewrite for natural analyst voice - Market data API integration (Ben's token price endpoint) - Conversation windows (5-message unanswered counter, per-user-per-chat) - Conversation history in prompt (last 5 exchanges) - Worktree file lock for archive writes Infrastructure: - worktree_lock.py — file-based lock (flock) for main worktree coordination - backfill-sources.py — source DB registration for Argus funnel - batch-extract-50.sh v3 — two-gate skip, batched ls-remote, network guard - sync-mirror.sh — auto-PR creation for mirrored GitHub branches, permission pre-flight - Argus dashboard — conflicts + reviewing in backlog, queue count in funnel - Enrichment-inside-frontmatter bug fix (regex anchor, not --- split) Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-20 20:17:27 +00:00
m3taversal	090b1411fd	epimetheus: source archive restructure — inbox/queue + inbox/archive/{domain} + inbox/null-result - config.py: added INBOX_QUEUE, INBOX_NULL_RESULT constants - evaluate.py: skip patterns + LIGHT tier cover all inbox/ subdirs - llm.py: eval prompts reference inbox/ generically - telegram/bot.py: archives to inbox/queue/ - telegram/teleo-telegram.service: ReadWritePaths expanded - research-prompt-v2.md: paths updated to inbox/queue/ - research-prompt-leo-synthesis.md: paths updated - migrate-source-archive.py: one-time migration script Reviewed by: Ganymede, Rhea, Leo (all approved) Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-18 11:50:04 +00:00
m3taversal	ffa718e834	ganymede: implement tier logic — LIGHT skip, claim-shape detector, pre-merge promotion - Claim-shape detector: if YAML has type: claim, force STANDARD minimum (Theseus) - Random pre-merge promotion: 15% of LIGHT → STANDARD before eval (Rio) - LIGHT_SKIP_LLM config flag: skip domain+Leo review for LIGHT (Rhea: env var rollback) - Updated both_approve: domain_verdict=skipped is valid for LIGHT auto-approve - Cost recording: only charge for reviews that actually ran - SAMPLE_AUDIT_RATE bumped 0.10 → 0.15, audit model = Opus (Leo: different family from Haiku) Multi-agent design review: Rio (gaming vectors, model diversity), Theseus (correlated blindspots, claim-shape guard), Rhea (shadow mode, config flag, deployment), Leo (approval). Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 18:05:43 +00:00
m3taversal	615af9b53d	leo: prioritize fresh PRs over re-evals in eval queue Unevaluated PRs (eval_attempts=0) now sort before re-evals in the eval cycle query. Fresh PRs have a higher chance of passing (~12%) vs re-evals of already-rejected PRs. Prevents migration-reset PRs from consuming eval slots that fresh PRs could use. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:32:07 +00:00
m3taversal	f4dc6b39ce	leo: warn on NULL source_path in _terminate_pr (Ganymede nit) If source_path is NULL, the source requeue silently matches nothing. Log a warning so we catch orphaned terminations in monitoring. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:17:30 +00:00
m3taversal	e7c902bac8	leo: implement retry budget — stop infinite eval loops Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON) columns to prs table. Retry budget logic (Ganymede-approved design): - Increment eval_attempts on each evaluate_pr() call - Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human) - Attempt 1: normal — back to open, wait for fix - Attempt 2: classify issues as mechanical/substantive - Mechanical only (schema, wiki links, dedup): keep open for one more try - Substantive (factual, confidence, scope, title): close PR, requeue source - Issue tags parsed from reviewer comments, stored in eval_issues column - SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset - Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike - Cost recording updated: domain review → OpenRouter, Leo → tier-dependent Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:14:12 +00:00
m3taversal	c0a6adf9ed	leo: model diversity + calibrated review prompts - Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter), Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots. - Opus reserved for DEEP eval only — protects rate limit for overnight research. - Review prompts calibrated: require per-criterion evidence, blocking-vs-observation verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate. - OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid spurious 15-min Opus backoff. - merge.py: pre-check PR state before merge API call (prevents 405 on re-merge). Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:10:30 +00:00
m3taversal	85b86a918a	ganymede: extract lib/llm.py from evaluate.py (Phase 3c) - What: LLM transport (OpenRouter, Claude CLI), prompt templates (triage/domain/Leo), and review runner functions moved to lib/llm.py. evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo posting, rate limit backoff, and evaluate_cycle. - Why: evaluate.py was 734 lines mixing orchestration with LLM concerns. Now 455 lines orchestration + 250 lines LLM transport. Each module has a single responsibility. - Connections: completes Phase 3 structural refactor (forgejo.py + domains.py + llm.py). teleo-pipeline.py updated to import kill_active_subprocesses from lib.llm. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:40:18 +00:00
m3taversal	ff5162d5ba	ganymede: extract lib/domains.py — single domain→agent mapping - What: Unified DOMAIN_AGENT_MAP, VALID_DOMAINS, agent_for_domain(), detect_domain_from_diff(), detect_domain_from_branch() into lib/domains.py. Removed duplicated mappings from evaluate.py and merge.py. VALID_DOMAINS in validate.py now derives from DOMAIN_AGENT_MAP.keys() (single source of truth). - Why: Phase 3 structural refactor. Domain mapping was duplicated across evaluate.py (DOMAIN_AGENT_MAP) and merge.py (agent_domain dict). Adding a domain required editing 3 files; now it requires editing 1. - Connections: evaluate.py uses agent_for_domain() + detect_domain_from_diff(), merge.py uses detect_domain_from_branch(), validate.py uses VALID_DOMAINS. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:33:18 +00:00
m3taversal	9d69629893	ganymede: extract lib/forgejo.py — single Forgejo API client - What: Unified forgejo_api(), get_pr_diff(), get_agent_token(), repo_path() into lib/forgejo.py. Removed 3 duplicate _forgejo_api functions (evaluate.py, merge.py, validate.py), 2 duplicate _get_pr_diff functions (evaluate.py, validate.py), and 1 _agent_token function (evaluate.py). - Why: Phase 3 structural refactor. Single source of truth for all Forgejo HTTP calls. Eliminates ~90 lines of duplicated code across 3 modules. - Connections: All hardcoded repo paths now use repo_path() helper. Consumer modules no longer reference config.FORGEJO_URL/OWNER/REPO/TOKEN_FILE directly. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:29:34 +00:00
m3taversal	a7251d7529	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script Phase 2 of pipeline refactoring: - pyproject.toml: Python >=3.11, aiohttp dep, dev extras (pytest, pytest-asyncio, ruff). Ruff configured with sane defaults + ignore rules for existing code patterns (implicit Optional, timezone.utc). - .forgejo/workflows/ci.yml: Forgejo Actions CI — syntax check, ruff lint, ruff format, pytest on every PR and push to main. - deploy.sh: Pull + venv update + syntax check + optional restart. Replaces ad-hoc scp workflow. - tests/conftest.py: Shared fixture for in-memory SQLite with full schema. Ready for Phase 4 test suite. - .gitignore: Added venv, pytest cache, coverage, build artifacts. - Ruff auto-fixes: import sorting, unused imports removed across all modules. All files pass ruff check + ruff format. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 14:24:27 +00:00
m3taversal	f166db4f62	ganymede: fix 4 critical bugs before pipeline restart - Fix #12: domain_review undefined on resume path — initialize to None, guard _parse_issues() call. Prevents NameError on PRs resuming after partial eval (76 PRs in this state right now). - Fix #11: concurrent eval workers can duplicate reviews — add atomic UPDATE SET status='reviewing' WHERE status='open' at top of evaluate_pr(). Check rowcount, skip if already claimed. - Fix #8: subprocess tracking for graceful shutdown — _active_subprocesses set in evaluate module, tracked in _claude_cli_call, exposed via kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py. - Fix health.py divide-by-zero — guard all metabolic metric reads against None from NULLIF/empty result set. Prevents TypeError on /health when no PRs have been evaluated in 24h. Also includes Leo's existing hot-fixes: - Rate limit detection checks stdout regardless of exit code - 15-minute cycle-level backoff on rate limit Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 14:13:25 +00:00
m3taversal	799249d470	Initial commit: Pipeline v2 daemon + infrastructure docs - teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge) - lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules - INFRASTRUCTURE.md: comprehensive deep-dive for onboarding - teleo-pipeline.service: systemd unit file Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-12 14:11:18 +00:00
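
Fix #11 in commit f166db4f62 describes claiming a PR with an atomic UPDATE and a rowcount check so that concurrent eval workers cannot review the same PR twice. A minimal sketch of that pattern, assuming a `prs` table keyed by a hypothetical `number` column (the actual schema and the helper name `claim_pr` are not shown in the log and are illustrative only):

```python
import sqlite3


def claim_pr(conn: sqlite3.Connection, pr_number: int) -> bool:
    """Try to move a PR from 'open' to 'reviewing'; return True if this caller won.

    The WHERE clause makes the transition conditional, so only one worker's
    UPDATE can match the row. `number` is an assumed column name.
    """
    cur = conn.execute(
        "UPDATE prs SET status = 'reviewing' WHERE number = ? AND status = 'open'",
        (pr_number,),
    )
    conn.commit()
    # rowcount is 1 if we claimed the PR, 0 if another worker already had it.
    return cur.rowcount == 1


# Demonstration against an in-memory database (hypothetical minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prs (number INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO prs VALUES (1, 'open')")
conn.commit()

first = claim_pr(conn, 1)   # claims the PR
second = claim_pr(conn, 1)  # finds it already claimed and should be skipped
```

The second call matches no rows because the status is no longer 'open', which is the signal for a worker to skip the PR.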

15 commits
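
The retry budget described in commit e7c902bac8 can be read as a small decision function over the attempt count and the issue tags parsed from reviewer comments. The sketch below is an illustration under stated assumptions, not the repository's code: the function name, the return labels, and the exact tag spellings are hypothetical, and treating an untagged rejection on attempt 2 as substantive is an assumption the log does not confirm.

```python
MAX_EVAL_ATTEMPTS = 3  # hard cap from the commit message

# Tag spellings are assumed; the log only names the categories.
MECHANICAL = {"schema", "wiki_links", "dedup"}
SUBSTANTIVE = {"factual", "confidence", "scope", "title"}


def next_action(eval_attempts: int, issue_tags: set[str]) -> str:
    """Decide what happens to a rejected PR.

    eval_attempts is the count after it was incremented for this evaluation.
    """
    if eval_attempts >= MAX_EVAL_ATTEMPTS:
        # Terminal: close the PR and tag the source needs_human.
        return "terminal"
    if eval_attempts == 1:
        # First rejection: back to open, wait for a fix.
        return "keep_open"
    # Attempt 2: only purely mechanical issues earn one more try.
    if issue_tags and issue_tags <= MECHANICAL:
        return "keep_open"
    # Any substantive (or, by assumption, untagged) issue: close and requeue.
    return "close_and_requeue"


results = [
    next_action(1, {"factual"}),
    next_action(2, {"schema", "dedup"}),
    next_action(2, {"schema", "factual"}),
    next_action(3, {"schema"}),
]
```

Per the same commit, new commits on the PR branch reset `eval_attempts` to 0, so a real implementation would apply that reset before calling a function like this one.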