teleo/teleo-infrastructure

Author	SHA1	Message	Date
m3taversal	615af9b53d	leo: prioritize fresh PRs over re-evals in eval queue Unevaluated PRs (eval_attempts=0) now sort before re-evals in the eval cycle query. Fresh PRs have a higher chance of passing (~12%) vs re-evals of already-rejected PRs. Prevents migration-reset PRs from consuming eval slots that fresh PRs could use. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:32:07 +00:00
m3taversal	f4dc6b39ce	leo: warn on NULL source_path in _terminate_pr (Ganymede nit) If source_path is NULL, the source requeue silently matches nothing. Log a warning so we catch orphaned terminations in monitoring. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:17:30 +00:00
m3taversal	e7c902bac8	leo: implement retry budget — stop infinite eval loops Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON) columns to prs table. Retry budget logic (Ganymede-approved design): - Increment eval_attempts on each evaluate_pr() call - Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human) - Attempt 1: normal — back to open, wait for fix - Attempt 2: classify issues as mechanical/substantive - Mechanical only (schema, wiki links, dedup): keep open for one more try - Substantive (factual, confidence, scope, title): close PR, requeue source - Issue tags parsed from reviewer comments, stored in eval_issues column - SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset - Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike - Cost recording updated: domain review → OpenRouter, Leo → tier-dependent Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state. Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:14:12 +00:00
m3taversal	c0a6adf9ed	leo: model diversity + calibrated review prompts - Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter), Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots. - Opus reserved for DEEP eval only — protects rate limit for overnight research. - Review prompts calibrated: require per-criterion evidence, blocking-vs-observation verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate. - OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid spurious 15-min Opus backoff. - merge.py: pre-check PR state before merge API call (prevents 405 on re-merge). Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-13 17:10:30 +00:00
m3taversal	85b86a918a	ganymede: extract lib/llm.py from evaluate.py (Phase 3c) - What: LLM transport (OpenRouter, Claude CLI), prompt templates (triage/domain/Leo), and review runner functions moved to lib/llm.py. evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo posting, rate limit backoff, and evaluate_cycle. - Why: evaluate.py was 734 lines mixing orchestration with LLM concerns. Now 455 lines orchestration + 250 lines LLM transport. Each module has a single responsibility. - Connections: completes Phase 3 structural refactor (forgejo.py + domains.py + llm.py). teleo-pipeline.py updated to import kill_active_subprocesses from lib.llm. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:40:18 +00:00
m3taversal	ff5162d5ba	ganymede: extract lib/domains.py — single domain→agent mapping - What: Unified DOMAIN_AGENT_MAP, VALID_DOMAINS, agent_for_domain(), detect_domain_from_diff(), detect_domain_from_branch() into lib/domains.py. Removed duplicated mappings from evaluate.py and merge.py. VALID_DOMAINS in validate.py now derives from DOMAIN_AGENT_MAP.keys() (single source of truth). - Why: Phase 3 structural refactor. Domain mapping was duplicated across evaluate.py (DOMAIN_AGENT_MAP) and merge.py (agent_domain dict). Adding a domain required editing 3 files; now it requires editing 1. - Connections: evaluate.py uses agent_for_domain() + detect_domain_from_diff(), merge.py uses detect_domain_from_branch(), validate.py uses VALID_DOMAINS. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:33:18 +00:00
m3taversal	9d69629893	ganymede: extract lib/forgejo.py — single Forgejo API client - What: Unified forgejo_api(), get_pr_diff(), get_agent_token(), repo_path() into lib/forgejo.py. Removed 3 duplicate _forgejo_api functions (evaluate.py, merge.py, validate.py), 2 duplicate _get_pr_diff functions (evaluate.py, validate.py), and 1 _agent_token function (evaluate.py). - Why: Phase 3 structural refactor. Single source of truth for all Forgejo HTTP calls. Eliminates ~90 lines of duplicated code across 3 modules. - Connections: All hardcoded repo paths now use repo_path() helper. Consumer modules no longer reference config.FORGEJO_URL/OWNER/REPO/TOKEN_FILE directly. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 15:29:34 +00:00
m3taversal	a7251d7529	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script Phase 2 of pipeline refactoring: - pyproject.toml: Python >=3.11, aiohttp dep, dev extras (pytest, pytest-asyncio, ruff). Ruff configured with sane defaults + ignore rules for existing code patterns (implicit Optional, timezone.utc). - .forgejo/workflows/ci.yml: Forgejo Actions CI — syntax check, ruff lint, ruff format, pytest on every PR and push to main. - deploy.sh: Pull + venv update + syntax check + optional restart. Replaces ad-hoc scp workflow. - tests/conftest.py: Shared fixture for in-memory SQLite with full schema. Ready for Phase 4 test suite. - .gitignore: Added venv, pytest cache, coverage, build artifacts. - Ruff auto-fixes: import sorting, unused imports removed across all modules. All files pass ruff check + ruff format. Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 14:24:27 +00:00
m3taversal	f166db4f62	ganymede: fix 4 critical bugs before pipeline restart - Fix #12: domain_review undefined on resume path — initialize to None, guard _parse_issues() call. Prevents NameError on PRs resuming after partial eval (76 PRs in this state right now). - Fix #11: concurrent eval workers can duplicate reviews — add atomic UPDATE SET status='reviewing' WHERE status='open' at top of evaluate_pr(). Check rowcount, skip if already claimed. - Fix #8: subprocess tracking for graceful shutdown — _active_subprocesses set in evaluate module, tracked in _claude_cli_call, exposed via kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py. - Fix health.py divide-by-zero — guard all metabolic metric reads against None from NULLIF/empty result set. Prevents TypeError on /health when no PRs have been evaluated in 24h. Also includes Leo's existing hot-fixes: - Rate limit detection checks stdout regardless of exit code - 15-minute cycle-level backoff on rate limit Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>	2026-03-13 14:13:25 +00:00
m3taversal	799249d470	Initial commit: Pipeline v2 daemon + infrastructure docs - teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge) - lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules - INFRASTRUCTURE.md: comprehensive deep-dive for onboarding - teleo-pipeline.service: systemd unit file Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>	2026-03-12 14:11:18 +00:00
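
The atomic claim described in f166db4f62 (Fix #11) can be illustrated with a small sketch. This is not the repository's code: the commit only states that an `UPDATE SET status='reviewing' WHERE status='open'` is issued and the rowcount checked. The helper name `try_claim_pr`, the `number` column, and the two-column `prs` schema below are assumptions made for illustration.

```python
import sqlite3


def try_claim_pr(conn: sqlite3.Connection, pr_number: int) -> bool:
    """Claim a PR for review; return True only for the worker that wins.

    The WHERE clause makes the state change conditional, so a second
    worker running the same statement modifies zero rows.
    """
    cur = conn.execute(
        # `number` is a hypothetical key column; the real schema may differ.
        "UPDATE prs SET status = 'reviewing' "
        "WHERE number = ? AND status = 'open'",
        (pr_number,),
    )
    conn.commit()
    return cur.rowcount == 1


# Demo against an in-memory database with a minimal, assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prs (number INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO prs (number, status) VALUES (1, 'open')")
conn.commit()

first_claim = try_claim_pr(conn, 1)   # row was 'open', so it is claimed
second_claim = try_claim_pr(conn, 1)  # row is now 'reviewing', nothing to claim
final_status = conn.execute(
    "SELECT status FROM prs WHERE number = 1"
).fetchone()[0]
```

A worker that gets `False` back would skip the PR, which matches the "skip if already claimed" behaviour the commit message describes.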

10 commits
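
The retry budget from e7c902bac8 reads naturally as a small decision function. The sketch below is an illustration under stated assumptions, not the code in evaluate.py: the function name, the `Outcome` enum, and the exact tag spellings are hypothetical, and the commit does not say what happens on attempt 2 when no issue tags were parsed, so that case is treated here as substantive.

```python
from enum import Enum

MAX_EVAL_ATTEMPTS = 3  # hard cap stated in the commit message

# Tag spellings are assumptions; the commit lists the categories only.
MECHANICAL_TAGS = {"schema", "wiki_links", "dedup"}
SUBSTANTIVE_TAGS = {"factual", "confidence", "scope", "title"}


class Outcome(Enum):
    REOPEN = "reopen"                  # back to open, wait for a fix
    CLOSE_REQUEUE = "close_requeue"    # close PR, requeue the source
    TERMINAL = "terminal_needs_human"  # close PR, tag source needs_human


def decide_after_rejection(eval_attempts: int, issue_tags: set[str]) -> Outcome:
    """Pick the next state for a rejected PR.

    `eval_attempts` is the counter value after it has been incremented
    for the evaluation that just finished.
    """
    if eval_attempts >= MAX_EVAL_ATTEMPTS:
        return Outcome.TERMINAL
    if eval_attempts <= 1:
        return Outcome.REOPEN
    # Attempt 2: only purely mechanical issues earn one more try.
    # Assumption: an empty or unrecognised tag set counts as substantive.
    if issue_tags and issue_tags <= MECHANICAL_TAGS:
        return Outcome.REOPEN
    return Outcome.CLOSE_REQUEUE


# Example: a second rejection that mixes a schema problem with a factual one.
mixed = decide_after_rejection(2, {"schema", "factual"})
```

Keeping the decision pure (no database or API calls) makes each branch of the budget easy to unit-test; the SHA-based reset to `eval_attempts=0` on new commits would sit outside this function.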