cde92d3db1
fix: wrap breaker calls in stage_loop to prevent permanent task death
...
CI / lint-and-test (push) Waiting to run
A transient DB lock in breaker.record_failure() inside an except handler
killed the asyncio coroutine permanently — snapshot_cycle died Apr 18 and
never recovered. All three breaker call sites now have their own try/except.
Also includes HTML injection fix for github_feedback review_text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:37:28 +01:00
681afad506
Consolidate pipeline code from teleo-codex + VPS into single repo
...
CI / lint-and-test (push) Has been cancelled
Sources merged:
- teleo-codex/ops/pipeline-v2/ (11 newer lib files, 5 new lib modules)
- teleo-codex/ops/ (agent-state, diagnostics expansion, systemd units, ops scripts)
- VPS /opt/teleo-eval/telegram/ (10 new bot files, agent configs)
- VPS /opt/teleo-eval/pipeline/ops/ (vector-gc, backfill-descriptions)
- VPS /opt/teleo-eval/sync-mirror.sh (Bug 2 + Step 2.5 fixes)
Non-trivial merges:
- connect.py: kept codex threshold (0.65) + added infra domain parameter
- watchdog.py: kept infra version (stale_pr integration, superset of codex)
- deploy.sh: codex rsync version (interim, until VPS git clone migration)
- diagnostics/app.py: codex decomposed dashboard (14 new route modules)
81 files changed, +17105/-200 lines
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:52:26 +01:00
d79ff60689
epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features
...
Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio):
1. Merge API recovery — pre-flight approval check, transient/permanent distinction, jitter
2. Ghost PR detection — ls-remote branch check in reconciliation, network guard
3. Source status contract — directory IS status, no code change needed
4. Batch-state markers eliminated — two-gate skip (archive-check + batched branch-check)
5. Branch SHA tracking — batched ls-remote, auto-reset verdicts, dismiss stale reviews
6. Mirror pre-flight permissions — chown check in sync-mirror.sh
7. Telegram archive commit-after-write — git add/commit/push with rebase --abort fallback
8. Post-merge source archiving — queue/ → archive/{domain}/ after merge
Pipeline fixes:
- merge_cycled flag — eval attempts preserved during merge-failure cycling (Ganymede+Rhea)
- merge_failures diagnostic counter
- Startup recovery preserves eval_attempts (was incorrectly resetting to 0)
- No-diff PRs auto-closed by eval (root cause of 17 zombie PRs)
- GC threshold aligned with substantive fixer budget (was 2, now 4)
- Conflict retry with 3-attempt budget + permanent conflict handler
- Local ff-merge fallback for Forgejo 405 errors
Telegram bot:
- KB retrieval: 3-layer (entity resolution → claim search → agent context)
- Reply-to-bot handler (context.bot.id check)
- Tag regex: @teleo|@futairdbot
- Prompt rewrite for natural analyst voice
- Market data API integration (Ben's token price endpoint)
- Conversation windows (5-message unanswered counter, per-user-per-chat)
- Conversation history in prompt (last 5 exchanges)
- Worktree file lock for archive writes
Infrastructure:
- worktree_lock.py — file-based lock (flock) for main worktree coordination
- backfill-sources.py — source DB registration for Argus funnel
- batch-extract-50.sh v3 — two-gate skip, batched ls-remote, network guard
- sync-mirror.sh — auto-PR creation for mirrored GitHub branches, permission pre-flight
- Argus dashboard — conflicts + reviewing in backlog, queue count in funnel
- Enrichment-inside-frontmatter bug fix (regex anchor, not --- split)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-20 20:17:27 +00:00
85b86a918a
ganymede: extract lib/llm.py from evaluate.py (Phase 3c)
...
CI / lint-and-test (pull_request) Has been cancelled
- What: LLM transport (OpenRouter, Claude CLI), prompt templates
(triage/domain/Leo), and review runner functions moved to lib/llm.py.
evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo
posting, rate limit backoff, and evaluate_cycle.
- Why: evaluate.py was 734 lines mixing orchestration with LLM concerns.
Now 455 lines orchestration + 250 lines LLM transport. Each module has
a single responsibility.
- Connections: completes Phase 3 structural refactor (forgejo.py + domains.py
+ llm.py). teleo-pipeline.py updated to import kill_active_subprocesses
from lib.llm.
Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 15:40:18 +00:00
a7251d7529
ganymede: add dev infrastructure — pyproject.toml, CI, deploy script
...
CI / lint-and-test (pull_request) Has been cancelled
Phase 2 of pipeline refactoring:
- pyproject.toml: Python >=3.11, aiohttp dep, dev extras (pytest,
pytest-asyncio, ruff). Ruff configured with sane defaults + ignore
rules for existing code patterns (implicit Optional, timezone.utc).
- .forgejo/workflows/ci.yml: Forgejo Actions CI — syntax check, ruff
lint, ruff format, pytest on every PR and push to main.
- deploy.sh: Pull + venv update + syntax check + optional restart.
Replaces ad-hoc scp workflow.
- tests/conftest.py: Shared fixture for in-memory SQLite with full
schema. Ready for Phase 4 test suite.
- .gitignore: Added venv, pytest cache, coverage, build artifacts.
- Ruff auto-fixes: import sorting, unused imports removed across all
modules. All files pass ruff check + ruff format.
Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 14:24:27 +00:00
f166db4f62
ganymede: fix 4 critical bugs before pipeline restart
...
- Fix #12 : domain_review undefined on resume path — initialize to None,
guard _parse_issues() call. Prevents NameError on PRs resuming after
partial eval (76 PRs in this state right now).
- Fix #11 : concurrent eval workers can duplicate reviews — add atomic
UPDATE SET status='reviewing' WHERE status='open' at top of
evaluate_pr(). Check rowcount, skip if already claimed.
- Fix #8 : subprocess tracking for graceful shutdown — _active_subprocesses
set in evaluate module, tracked in _claude_cli_call, exposed via
kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py.
- Fix health.py divide-by-zero — guard all metabolic metric reads against
None from NULLIF/empty result set. Prevents TypeError on /health when
no PRs have been evaluated in 24h.
Also includes Leo's existing hot-fixes:
- Rate limit detection checks stdout regardless of exit code
- 15-minute cycle-level backoff on rate limit
Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 14:13:25 +00:00
799249d470
Initial commit: Pipeline v2 daemon + infrastructure docs
...
- teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge)
- lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules
- INFRASTRUCTURE.md: comprehensive deep-dive for onboarding
- teleo-pipeline.service: systemd unit file
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-12 14:11:18 +00:00