Followup to f97dd15. Four fixes from review:
MUST-FIX #1 — Forgejo double-PATCH drift
The reaper closes the PR via a forgejo_api PATCH at line 689; close_pr() at
line 700 then issued a second PATCH (close_on_forgejo=True by default). On
transient failure of that second PATCH, close_pr returned False without
updating the DB → status='open' even though Forgejo had closed the PR. Now
passes close_on_forgejo=False so the DB close is unconditional once the
explicit Forgejo PATCH has succeeded.
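The fixed flow, as a minimal sketch — helper bodies and signatures are stand-ins; only close_pr and close_on_forgejo come from this commit:

```python
# Sketch of the fixed reaper close path. db is a dict standing in for the
# real PR table; forgejo_patch stands in for the Forgejo API call.
def close_pr(pr_id, db, forgejo_patch, close_on_forgejo=True):
    if close_on_forgejo:
        if not forgejo_patch(pr_id):  # transient PATCH failure
            return False              # old bug: DB left at status='open'
    db[pr_id] = "closed"              # DB close
    return True

def reaper_close(pr_id, db, forgejo_patch):
    if not forgejo_patch(pr_id):      # the explicit Forgejo PATCH (line 689)
        return False
    # No second PATCH: DB close is unconditional once the PATCH succeeded.
    return close_pr(pr_id, db, forgejo_patch, close_on_forgejo=False)
```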
MUST-FIX #2 — reaper exception trips fix breaker
An unhandled exception in verdict_deadlock_reaper_cycle propagated to
stage_loop, which recorded it as a fix-stage failure. After 5 reaper
failures the fix breaker would open and block the mechanical+substantive
fix paths for 15 min.
Wrap reaper call in try/except in fix_cycle (same exception-isolation
pattern as ingest_cycle's extract_cycle wrapper). Defense-in-depth
must never block primary paths.
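The wrapper shape, sketched with stand-in callables for the real stage functions:

```python
# Exception-isolation wrapper in fix_cycle, mirroring ingest_cycle's
# extract_cycle pattern. Names other than fix_cycle are illustrative.
import logging

def fix_cycle(reaper_cycle, mechanical, substantive):
    try:
        # Defense-in-depth only: a reaper crash must not count as a
        # fix-stage failure or feed the circuit breaker.
        reaper_cycle()
    except Exception:
        logging.exception("verdict_deadlock_reaper_cycle failed; continuing")
    # Primary fix paths always run.
    return [mechanical(), substantive()]
```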
WARNING #1 — throttle SQL full-scan
audit_log has only idx_audit_stage, so filtering on event alone caused
full-table scans every 60s. Added stage='reaper' so the planner uses the
existing index — the reaper already writes its audit rows under
stage='reaper', so the filter is correct.
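The planner difference is easy to reproduce in an in-memory SQLite DB (table, index, and stage value from the commit; other columns and the event value are illustrative):

```python
# Why the added stage filter helps: compare EXPLAIN QUERY PLAN output
# for the old event-only filter vs. the new stage+event filter.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audit_log (stage TEXT, event TEXT, detail TEXT)")
con.execute("CREATE INDEX idx_audit_stage ON audit_log (stage)")

# Old throttle query: event-only filter → full-table scan.
scan_plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM audit_log WHERE event = 'reaper_run'"
).fetchone()[3]

# New throttle query: stage='reaper' lets the planner use idx_audit_stage.
index_plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM audit_log "
    "WHERE stage = 'reaper' AND event = 'reaper_run'"
).fetchone()[3]
```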
WARNING #2 — REAPER_DRY_RUN as code constant
Flipping dry-run → live required edit + commit + push + deploy +
restart. Moved REAPER_DRY_RUN, REAPER_DEADLOCK_AGE_HOURS,
REAPER_INTERVAL_SECONDS, REAPER_MAX_PER_RUN to lib/config.py with
os.environ.get() overrides. Operator now flips via systemctl edit
teleo-pipeline.service (Environment=REAPER_DRY_RUN=false) + restart.
Defaults remain safe: dry-run, 24h age, hourly throttle, 50/run cap.
NIT — dry-run counter naming
Renamed the local `closed` counter in the dry-run path to `would_close` so
the heartbeat audit ("X closed, Y would-close") and the journal log are
unambiguous. The function still returns closed + would_close so callers
see total work done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
External GitHub fork PRs need their contributor commit SHA in main's history
for GitHub's "merged" badge to fire. Cherry-pick rewrites the SHA, breaking
that detection. New _merge_no_ff_external function preserves the SHA via a
true merge commit.
Mechanics (mirrors _cherry_pick_onto_main shape):
1. Fetch origin/main + origin/{branch}
2. Detached worktree at origin/main, git merge --no-ff origin/{branch}
with verbose message: "Merge external GitHub PR #{N}: {branch_slug}"
3. Force-push merge commit M as origin/{branch}, replacing branch tip
4. Dispatch's existing ff-push origin/{branch} → main propagates M to main
M has parents [main_sha, contributor_sha]. M is a fast-forward descendant
of main_sha (first-parent chain), so the ff-push to main is valid without
--force. Contributor SHA reachable from main → GitHub recognizes merged.
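The parent/ancestry invariant can be reproduced in a throwaway local repo (no remote; branch and PR numbers illustrative; requires git on PATH):

```python
# Build base commit on main, a contributor commit on a gh-pr-N/* branch,
# then --no-ff merge: M must have parents [main_sha, contrib_sha].
import os, subprocess, tempfile

def git(*args, cwd):
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

repo = tempfile.mkdtemp()
git("init", "-b", "main", cwd=repo)
git("config", "user.email", "ci@example.com", cwd=repo)
git("config", "user.name", "ci", cwd=repo)

open(os.path.join(repo, "base.txt"), "w").write("base\n")
git("add", ".", cwd=repo)
git("commit", "-m", "base", cwd=repo)
main_sha = git("rev-parse", "HEAD", cwd=repo)

git("checkout", "-b", "gh-pr-7/fix-typo", cwd=repo)
open(os.path.join(repo, "fix.txt"), "w").write("fix\n")
git("add", ".", cwd=repo)
git("commit", "-m", "contributor fix", cwd=repo)
contrib_sha = git("rev-parse", "HEAD", cwd=repo)

git("checkout", "main", cwd=repo)
git("merge", "--no-ff", "gh-pr-7/fix-typo",
    "-m", "Merge external GitHub PR #7: fix-typo", cwd=repo)
# rev-list --parents prints: M_sha first_parent second_parent
parents = git("rev-list", "--parents", "-n", "1", "HEAD", cwd=repo).split()[1:]
```

First parent is main_sha, so main fast-forwards to M without --force, and the contributor SHA is reachable from M.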
Conflict handling: same auto-resolve as cherry-pick — entity-only conflicts
take main's version (--ours = current worktree HEAD = main), other conflicts
abort with detail.
Backout: config.EXTERNAL_PR_NO_FF_MERGE = True (default). Set False to fall
back to cherry-pick if no-ff destabilizes throughput one week pre-Accelerate.
Branch dispatch in _merge_domain_queue:
- reweave/* → _merge_reweave_pr (existing)
- gh-pr-N/* AND config.EXTERNAL_PR_NO_FF_MERGE → _merge_no_ff_external (new)
- everything else → _cherry_pick_onto_main (existing default)
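The dispatch above, sketched with handlers returning their own names; the gh-pr-N/* pattern is approximated with a regex (the real parser lives in the pipeline):

```python
# Branch dispatch for _merge_domain_queue.
import re

EXTERNAL_PR_NO_FF_MERGE = True  # config backout flag, default True

def dispatch_branch(branch, no_ff=EXTERNAL_PR_NO_FF_MERGE):
    if branch.startswith("reweave/"):
        return "_merge_reweave_pr"
    if re.match(r"^gh-pr-\d+/", branch) and no_ff:
        return "_merge_no_ff_external"
    return "_cherry_pick_onto_main"
```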
Verified end-to-end in scratch repo:
- merge commit M has [main_sha, contributor_sha] as parents
- contributor SHA is ancestor of M
- after ff-push, contributor SHA is in main's history (GitHub badge fires)
- regex parses 8 cases correctly (real fork PR + edge cases reject cleanly)
Architecture per Ship Msg 3 / doc v3 (537cfd5 on epimetheus/external-merge-flow-design).
Phase 1 (sync-mirror self-heal) deployed yesterday. Phase 3 (FwazB PR #90 cleanup)
queued behind this deploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three targeted fixes from Ganymede's review of commit 469cb7f:
BUG #1 — Success path now updates sources.status='extracting' before PR
creation, so queue scan's DB-authoritative filter catches sources between
PR creation and merge. Previously the cooldown gate was load-bearing for
this window, not belt-and-suspenders as claimed.
BUG #2 — Second null-result path (line 573, triggered when enrichments
existed but all targets were missing in worktree) now updates DB. Without
this, that path created no PR, no DB mark, and would have re-entered the
runaway loop 4h later when the cooldown window expired.
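Both fixes in one sketch — a dict stands in for the sources table, and every name except sources.status='extracting' is illustrative:

```python
def extract_one(source_id, status, enrichments, targets_in_worktree, create_pr):
    # BUG #1 fix: mark before PR creation, so the DB-authoritative queue
    # filter covers the window between PR creation and merge.
    status[source_id] = "extracting"
    if enrichments and not targets_in_worktree:
        # BUG #2 fix: the null-result path also marks the DB instead of
        # leaving the row to re-enter the loop when the cooldown expires.
        # ("extracted" is an illustrative terminal status.)
        status[source_id] = "extracted"
        return None
    return create_pr(source_id)
```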
NIT #6 — 4h cooldown moved to config.EXTRACTION_COOLDOWN_HOURS. Tunable
without code change. Log format now shows the configured hours.
Also backfilled 59 pre-existing zombie queue-path rows where the file
was already archived but DB status said 'unprocessed' — these would have
leaked past the DB filter once the 4h cooldown expired.
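Shape of that backfill (schema and terminal status illustrative; the real run touched 59 rows):

```python
# Zombie rows: file already archived, but DB status still 'unprocessed'.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, "
            "status TEXT, file_archived INTEGER)")
con.executemany(
    "INSERT INTO sources (status, file_archived) VALUES (?, ?)",
    [("unprocessed", 1),   # zombie: archived but DB says unprocessed
     ("unprocessed", 0),   # genuinely pending — must not be touched
     ("extracted", 1)])
backfilled = con.execute(
    "UPDATE sources SET status = 'extracted' "
    "WHERE status = 'unprocessed' AND file_archived = 1"
).rowcount
```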
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
config.py had extractor-heavy weights (0.40) from initial bootstrap.
Correct weights per approved architecture: challenger 0.35, synthesizer
0.25, reviewer 0.20, sourcer 0.15, extractor 0.05. backfill-ci.py
already had correct weights; this fixes the live computation in health.py.
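The corrected weights as a table (dict name illustrative); they sum to 1.0:

```python
AGENT_WEIGHTS = {
    "challenger": 0.35,
    "synthesizer": 0.25,
    "reviewer": 0.20,
    "sourcer": 0.15,
    "extractor": 0.05,  # was 0.40 in the bootstrap config
}
```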
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Claim-shape detector: if YAML has type: claim, force STANDARD minimum (Theseus)
- Random pre-merge promotion: 15% of LIGHT → STANDARD before eval (Rio)
- LIGHT_SKIP_LLM config flag: skip domain+Leo review for LIGHT (Rhea: env var rollback)
- Updated both_approve: domain_verdict=skipped is valid for LIGHT auto-approve
- Cost recording: only charge for reviews that actually ran
- SAMPLE_AUDIT_RATE bumped 0.10 → 0.15, audit model = Opus (Leo: different family from Haiku)
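The tier rules above can be sketched as follows — function and verdict names are stand-ins for the real implementation:

```python
import random

LIGHT_SKIP_LLM = True           # env-flippable rollback flag
SAMPLE_AUDIT_RATE = 0.15        # bumped from 0.10; audit model = Opus

def pick_tier(yaml_type, rng=random.random):
    if yaml_type == "claim":    # claim-shape detector: STANDARD minimum
        return "STANDARD"
    if rng() < 0.15:            # random pre-merge promotion of LIGHT
        return "STANDARD"
    return "LIGHT"

def both_approve(leo_verdict, domain_verdict, tier):
    # domain_verdict='skipped' is valid only for LIGHT auto-approve
    if tier == "LIGHT" and domain_verdict == "skipped" and LIGHT_SKIP_LLM:
        return leo_verdict == "approve"
    return leo_verdict == "approve" and domain_verdict == "approve"
```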
Multi-agent design review: Rio (gaming vectors, model diversity), Theseus (correlated
blindspots, claim-shape guard), Rhea (shadow mode, config flag, deployment), Leo (approval).
Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON)
columns to prs table.
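Shape of the v3 migration, assuming a SQLite prs table (only the two new columns are from the commit):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prs (id INTEGER PRIMARY KEY, status TEXT)")
# v3: retry counter plus JSON-encoded issue tags from reviewer comments.
con.execute("ALTER TABLE prs ADD COLUMN eval_attempts INTEGER DEFAULT 0")
con.execute("ALTER TABLE prs ADD COLUMN eval_issues TEXT")
cols = [row[1] for row in con.execute("PRAGMA table_info(prs)")]
```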
Retry budget logic (Ganymede-approved design):
- Increment eval_attempts on each evaluate_pr() call
- Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human)
- Attempt 1: normal — back to open, wait for fix
- Attempt 2: classify issues as mechanical/substantive
- Mechanical only (schema, wiki links, dedup): keep open for one more try
- Substantive (factual, confidence, scope, title): close PR, requeue source
- Issue tags parsed from reviewer comments, stored in eval_issues column
- SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset
- Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike
- Cost recording updated: domain review → OpenRouter, Leo → tier-dependent
Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state.
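The retry budget above as a decision table — issue-tag names follow the commit; everything else is a stand-in:

```python
MECHANICAL = {"schema", "wiki_links", "dedup"}
SUBSTANTIVE = {"factual", "confidence", "scope", "title"}

def next_action(eval_attempts, issue_tags):
    if eval_attempts >= 3:
        return "close_pr_needs_human"        # hard cap → terminal
    if eval_attempts == 1:
        return "keep_open"                   # attempt 1: wait for fix
    if set(issue_tags) <= MECHANICAL:
        return "keep_open"                   # mechanical only: one more try
    return "close_pr_requeue_source"         # substantive: close + requeue

def reset_on_new_sha(pr, new_sha):
    # SHA-based reset: new commits restart the budget and clear verdicts.
    if pr["head_sha"] != new_sha:
        pr.update(head_sha=new_sha, eval_attempts=0, verdicts={})
    return pr
```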
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
- Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter),
Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots.
- Opus reserved for DEEP eval only — protects rate limit for overnight research.
- Review prompts calibrated: require per-criterion evidence, blocking-vs-observation
verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate.
- OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid
spurious 15-min Opus backoff.
- merge.py: pre-check PR state before merge API call (prevents 405 on re-merge).
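The routing and failure classification above, sketched with provider/model strings as labels rather than real API identifiers:

```python
def route_review(kind, tier="STANDARD"):
    if kind == "domain":
        return ("openrouter", "gpt-4o")
    if kind == "leo":
        # Opus only for DEEP, protecting the Claude Max rate limit.
        return ("claude_max", "opus") if tier == "DEEP" else ("openrouter", "sonnet")
    raise ValueError(f"unknown review kind: {kind}")

def classify_failure(provider, http_status):
    # OpenRouter errors must not look like Opus rate limits, or they
    # would trigger a spurious 15-min backoff.
    if provider == "openrouter":
        return "openrouter_failed"
    return "rate_limited" if http_status == 429 else "review_error"
```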
Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>