Commit graph

7 commits

Author SHA1 Message Date
cde92d3db1 fix: wrap breaker calls in stage_loop to prevent permanent task death
Some checks are pending
CI / lint-and-test (push) Waiting to run
A transient DB lock in breaker.record_failure() inside an except handler
killed the asyncio coroutine permanently — snapshot_cycle died Apr 18 and
never recovered. All three breaker call sites now have their own try/except.

Also includes HTML injection fix for github_feedback review_text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:37:28 +01:00
681afad506 Consolidate pipeline code from teleo-codex + VPS into single repo
Some checks failed
CI / lint-and-test (push) Has been cancelled
Sources merged:
- teleo-codex/ops/pipeline-v2/ (11 newer lib files, 5 new lib modules)
- teleo-codex/ops/ (agent-state, diagnostics expansion, systemd units, ops scripts)
- VPS /opt/teleo-eval/telegram/ (10 new bot files, agent configs)
- VPS /opt/teleo-eval/pipeline/ops/ (vector-gc, backfill-descriptions)
- VPS /opt/teleo-eval/sync-mirror.sh (Bug 2 + Step 2.5 fixes)

Non-trivial merges:
- connect.py: kept codex threshold (0.65) + added infra domain parameter
- watchdog.py: kept infra version (stale_pr integration, superset of codex)
- deploy.sh: codex rsync version (interim, until VPS git clone migration)
- diagnostics/app.py: codex decomposed dashboard (14 new route modules)

81 files changed, +17105/-200 lines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:52:26 +01:00
d79ff60689 epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features
Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio):
1. Merge API recovery — pre-flight approval check, transient/permanent distinction, jitter
2. Ghost PR detection — ls-remote branch check in reconciliation, network guard
3. Source status contract — directory IS status, no code change needed
4. Batch-state markers eliminated — two-gate skip (archive-check + batched branch-check)
5. Branch SHA tracking — batched ls-remote, auto-reset verdicts, dismiss stale reviews
6. Mirror pre-flight permissions — chown check in sync-mirror.sh
7. Telegram archive commit-after-write — git add/commit/push with rebase --abort fallback
8. Post-merge source archiving — queue/ → archive/{domain}/ after merge

Pipeline fixes:
- merge_cycled flag — eval attempts preserved during merge-failure cycling (Ganymede+Rhea)
- merge_failures diagnostic counter
- Startup recovery preserves eval_attempts (was incorrectly resetting to 0)
- No-diff PRs auto-closed by eval (root cause of 17 zombie PRs)
- GC threshold aligned with substantive fixer budget (was 2, now 4)
- Conflict retry with 3-attempt budget + permanent conflict handler
- Local ff-merge fallback for Forgejo 405 errors

Telegram bot:
- KB retrieval: 3-layer (entity resolution → claim search → agent context)
- Reply-to-bot handler (context.bot.id check)
- Tag regex: @teleo|@futairdbot
- Prompt rewrite for natural analyst voice
- Market data API integration (Ben's token price endpoint)
- Conversation windows (5-message unanswered counter, per-user-per-chat)
- Conversation history in prompt (last 5 exchanges)
- Worktree file lock for archive writes

Infrastructure:
- worktree_lock.py — file-based lock (flock) for main worktree coordination
- backfill-sources.py — source DB registration for Argus funnel
- batch-extract-50.sh v3 — two-gate skip, batched ls-remote, network guard
- sync-mirror.sh — auto-PR creation for mirrored GitHub branches, permission pre-flight
- Argus dashboard — conflicts + reviewing in backlog, queue count in funnel
- Enrichment-inside-frontmatter bug fix (regex anchor, not --- split)

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-20 20:17:27 +00:00
85b86a918a ganymede: extract lib/llm.py from evaluate.py (Phase 3c)
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- What: LLM transport (OpenRouter, Claude CLI), prompt templates
  (triage/domain/Leo), and review runner functions moved to lib/llm.py.
  evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo
  posting, rate limit backoff, and evaluate_cycle.
- Why: evaluate.py was 734 lines mixing orchestration with LLM concerns.
  Now 455 lines orchestration + 250 lines LLM transport. Each module has
  a single responsibility.
- Connections: completes Phase 3 structural refactor (forgejo.py + domains.py
  + llm.py). teleo-pipeline.py updated to import kill_active_subprocesses
  from lib.llm.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 15:40:18 +00:00
a7251d7529 ganymede: add dev infrastructure — pyproject.toml, CI, deploy script
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
Phase 2 of pipeline refactoring:

- pyproject.toml: Python >=3.11, aiohttp dep, dev extras (pytest,
  pytest-asyncio, ruff). Ruff configured with sane defaults + ignore
  rules for existing code patterns (implicit Optional, timezone.utc).
- .forgejo/workflows/ci.yml: Forgejo Actions CI — syntax check, ruff
  lint, ruff format, pytest on every PR and push to main.
- deploy.sh: Pull + venv update + syntax check + optional restart.
  Replaces ad-hoc scp workflow.
- tests/conftest.py: Shared fixture for in-memory SQLite with full
  schema. Ready for Phase 4 test suite.
- .gitignore: Added venv, pytest cache, coverage, build artifacts.
- Ruff auto-fixes: import sorting, unused imports removed across all
  modules. All files pass ruff check + ruff format.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 14:24:27 +00:00
f166db4f62 ganymede: fix 4 critical bugs before pipeline restart
- Fix #12: domain_review undefined on resume path — initialize to None,
  guard _parse_issues() call. Prevents NameError on PRs resuming after
  partial eval (76 PRs in this state right now).
- Fix #11: concurrent eval workers can duplicate reviews — add atomic
  UPDATE SET status='reviewing' WHERE status='open' at top of
  evaluate_pr(). Check rowcount, skip if already claimed.
- Fix #8: subprocess tracking for graceful shutdown — _active_subprocesses
  set in evaluate module, tracked in _claude_cli_call, exposed via
  kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py.
- Fix health.py divide-by-zero — guard all metabolic metric reads against
  None from NULLIF/empty result set. Prevents TypeError on /health when
  no PRs have been evaluated in 24h.

Also includes Leo's existing hot-fixes:
- Rate limit detection checks stdout regardless of exit code
- 15-minute cycle-level backoff on rate limit

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 14:13:25 +00:00
799249d470 Initial commit: Pipeline v2 daemon + infrastructure docs
- teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge)
- lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules
- INFRASTRUCTURE.md: comprehensive deep-dive for onboarding
- teleo-pipeline.service: systemd unit file

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-12 14:11:18 +00:00