Teleo evaluation pipeline infrastructure — Python async daemon for claim extraction, validation, evaluation, and merge
Find a file
m3taversal c0a6adf9ed leo: model diversity + calibrated review prompts
- Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter),
  Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots.
- Opus reserved for DEEP eval only — protects rate limit for overnight research.
- Review prompts calibrated: require per-criterion evidence, blocking-vs-observation
  verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate.
- OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid
  spurious 15-min Opus backoff.
- merge.py: pre-check PR state before merge API call (prevents 405 on re-merge).

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:10:30 +00:00
.forgejo/workflows ganymede: add dev infrastructure — pyproject.toml, CI, deploy script 2026-03-13 14:24:27 +00:00
lib leo: model diversity + calibrated review prompts 2026-03-13 17:10:30 +00:00
tests ganymede: add dev infrastructure — pyproject.toml, CI, deploy script 2026-03-13 14:24:27 +00:00
.gitignore ganymede: add dev infrastructure — pyproject.toml, CI, deploy script 2026-03-13 14:24:27 +00:00
deploy.sh ganymede: add dev infrastructure — pyproject.toml, CI, deploy script 2026-03-13 14:24:27 +00:00
INFRASTRUCTURE.md Initial commit: Pipeline v2 daemon + infrastructure docs 2026-03-12 14:11:18 +00:00
pyproject.toml ganymede: add dev infrastructure — pyproject.toml, CI, deploy script 2026-03-13 14:24:27 +00:00
teleo-pipeline.py ganymede: extract lib/llm.py from evaluate.py (Phase 3c) 2026-03-13 15:40:18 +00:00
teleo-pipeline.service Initial commit: Pipeline v2 daemon + infrastructure docs 2026-03-12 14:11:18 +00:00