5 commits

Author SHA1 Message Date
0457c49094 fix: zombie retry loop + cost tracking
Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before
re-extracting. Sources with >=3 closed PRs are skipped (zombie protection).
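
The gate described above can be sketched in Python as follows. This is a minimal illustration, not the actual shell logic in batch-extract-50.sh: the `prs` table, its `source_path` and `status` columns, and the helper names are assumptions, since the commit does not show the pipeline.db schema. Only the threshold of 3 closed PRs comes from the commit message.

```python
import sqlite3

ZOMBIE_THRESHOLD = 3  # from the commit message: >=3 closed PRs means skip


def closed_pr_count(conn: sqlite3.Connection, source_path: str) -> int:
    """Count closed PRs for one source (hypothetical table and columns)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE source_path = ? AND status = 'closed'",
        (source_path,),
    ).fetchone()
    return row[0]


def should_skip(conn: sqlite3.Connection, source_path: str) -> bool:
    """Return True when the source has reached the zombie threshold."""
    return closed_pr_count(conn, source_path) >= ZOMBIE_THRESHOLD


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for pipeline.db with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prs (source_path TEXT, status TEXT)")
    rows = [("a.md", "closed")] * 3
    rows += [("b.md", "closed"), ("b.md", "closed"), ("b.md", "merged")]
    conn.executemany("INSERT INTO prs VALUES (?, ?)", rows)
    return conn
```

With the demo rows, `a.md` (three closed PRs) would be skipped, while `b.md` (two closed, one merged) would still be extracted.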

Cost tracking: openrouter_call() now returns (text, usage) tuple with
prompt_tokens and completion_tokens from the OpenRouter API response.
All callers updated to unpack and pass tokens to costs.record_usage().
Added missing triage cost recording. Fixed batch domain review recording
cost once per batch instead of once per PR.
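
The (text, usage) contract described above could look roughly like this. The sketch assumes the OpenAI-compatible response layout that OpenRouter returns (`choices[0].message.content` plus a `usage` object); `split_response` and `CostLedger` are hypothetical names, and the real signature of costs.record_usage() is not shown in the commit.

```python
from typing import Any


def split_response(payload: dict[str, Any]) -> tuple[str, dict[str, int]]:
    """Split a chat-completion payload into (text, usage)."""
    text = payload["choices"][0]["message"]["content"]
    usage = payload.get("usage") or {}
    return text, {
        # Default to 0 so callers can always unpack both counters.
        "prompt_tokens": int(usage.get("prompt_tokens", 0)),
        "completion_tokens": int(usage.get("completion_tokens", 0)),
    }


class CostLedger:
    """Hypothetical stand-in for the costs module."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, int, int]] = []

    def record_usage(self, stage: str, prompt_tokens: int,
                     completion_tokens: int) -> None:
        self.rows.append((stage, prompt_tokens, completion_tokens))


def review_with_costs(payload: dict[str, Any], stage: str,
                      ledger: CostLedger) -> str:
    """Caller pattern: unpack the tuple, then record the tokens."""
    text, usage = split_response(payload)
    ledger.record_usage(stage, usage["prompt_tokens"],
                        usage["completion_tokens"])
    return text
```

Returning usage alongside the text keeps cost recording in the caller, which is where the stage name (triage, domain review, and so on) is known.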

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:29:58 +00:00
090b1411fd epimetheus: source archive restructure — inbox/queue + inbox/archive/{domain} + inbox/null-result
- config.py: added INBOX_QUEUE, INBOX_NULL_RESULT constants
- evaluate.py: skip patterns + LIGHT tier cover all inbox/ subdirs
- llm.py: eval prompts reference inbox/ generically
- telegram/bot.py: archives to inbox/queue/
- telegram/teleo-telegram.service: ReadWritePaths expanded
- research-prompt-v2.md: paths updated to inbox/queue/
- research-prompt-leo-synthesis.md: paths updated
- migrate-source-archive.py: one-time migration script

Reviewed by: Ganymede, Rhea, Leo (all approved)

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
2026-03-18 11:50:04 +00:00
93e6f16144 leo: constrain issue tags — do not invent new tags
Opus was ignoring the valid tag list and generating custom tags like
schema-enrichment-slug-mismatch, which fall through to 'unknown' in
disposition logic. All three prompts (domain, Leo standard, Leo deep)
now explicitly say "do not invent new tags" alongside the valid tag list.
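
The fall-through this commit works around can be illustrated with a small sketch. The tag names in `VALID_TAGS` are placeholders, because the real valid tag list is not shown here; only schema-enrichment-slug-mismatch and the 'unknown' fallback come from the commit message. The commit itself changes the prompts, not the disposition code.

```python
# Placeholder tags; the actual valid tag list is not part of this commit text.
VALID_TAGS = frozenset({"example-tag-a", "example-tag-b"})


def normalize_tags(tags: list[str]) -> list[str]:
    """Map any tag outside the valid list to 'unknown'."""
    return [tag if tag in VALID_TAGS else "unknown" for tag in tags]


def invented_tags(tags: list[str]) -> list[str]:
    """Return the tags a model made up, useful for logging prompt drift."""
    return [tag for tag in tags if tag not in VALID_TAGS]
```

A check like `invented_tags` would make it visible when a model still ignores the instruction, rather than letting the tag silently become 'unknown'.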

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:27:40 +00:00
c0a6adf9ed leo: model diversity + calibrated review prompts
- Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter),
  Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots.
- Opus reserved for DEEP eval only — protects rate limit for overnight research.
- Review prompts calibrated: require per-criterion evidence, blocking-vs-observation
  verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate.
- OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid
  spurious 15-min Opus backoff.
- merge.py: pre-check PR state before merge API call (prevents 405 on re-merge).
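
The failure classification in the list above can be sketched as below. The two status strings and the 15-minute Opus backoff are taken from the commit message; the function name and the idea of returning the next allowed attempt time are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

OPUS_BACKOFF = timedelta(minutes=15)  # backoff length named in the commit


def next_opus_attempt(failure: str, now: datetime) -> Optional[datetime]:
    """Return when Opus may be retried, or None if no backoff applies.

    Only 'rate_limited' triggers the backoff. An 'openrouter_failed'
    result comes from a different transport, so it must not delay Opus.
    """
    if failure == "rate_limited":
        return now + OPUS_BACKOFF
    return None


def utc(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Small helper for building timezone-aware timestamps in examples."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
```

Keeping the two failure kinds distinct means an OpenRouter outage during domain review does not pause DEEP evaluations on a separate provider.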

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:10:30 +00:00
85b86a918a ganymede: extract lib/llm.py from evaluate.py (Phase 3c)
Some checks failed: CI / lint-and-test (pull_request) has been cancelled
- What: LLM transport (OpenRouter, Claude CLI), prompt templates
  (triage/domain/Leo), and review runner functions moved to lib/llm.py.
  evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo
  posting, rate limit backoff, and evaluate_cycle.
- Why: evaluate.py was 734 lines mixing orchestration with LLM concerns.
  Now 455 lines orchestration + 250 lines LLM transport. Each module has
  a single responsibility.
- Connections: completes Phase 3 structural refactor (forgejo.py + domains.py
  + llm.py). teleo-pipeline.py updated to import kill_active_subprocesses
  from lib.llm.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 15:40:18 +00:00