Teleo evaluation pipeline infrastructure — Python async daemon for claim extraction, validation, evaluation, and merge
- Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter), Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots.
- Opus reserved for DEEP eval only; protects rate limit for overnight research.
- Review prompts calibrated: require per-criterion evidence, blocking-vs-observation verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate.
- OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid spurious 15-min Opus backoff.
- merge.py: pre-check PR state before merge API call (prevents 405 on re-merge).

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>

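The routing and failure-classification notes above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the names `Tier`, `Route`, `ROUTES`, `classify_failure`, and `backoff_seconds`, the provider and model strings, the `eval_failed` label, and the use of HTTP 429 as the rate-limit signal are all assumptions made for the example. Only the tier-to-model mapping, the `openrouter_failed` and `rate_limited` labels, and the 15-minute Opus backoff come from the commit message.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Review tiers named in the commit message."""
    DOMAIN = "domain"
    STANDARD = "standard"
    DEEP = "deep"


@dataclass(frozen=True)
class Route:
    provider: str  # hypothetical provider key
    model: str     # hypothetical model label, not a real model id


# Tier-to-model mapping as described in the commit message.
ROUTES = {
    Tier.DOMAIN: Route("openrouter", "gpt-4o"),
    Tier.STANDARD: Route("openrouter", "sonnet"),
    Tier.DEEP: Route("claude_max", "opus"),
}

OPUS_BACKOFF_SECONDS = 15 * 60  # the 15-min backoff mentioned above


def classify_failure(route: Route, status_code: int) -> str:
    """Label a failed call so only genuine Opus rate limits trigger backoff."""
    if route.provider == "openrouter":
        # Any OpenRouter failure gets its own label, even a 429,
        # so it is never mistaken for an Opus rate limit.
        return "openrouter_failed"
    if status_code == 429:  # assumption: 429 signals a rate limit
        return "rate_limited"
    return "eval_failed"  # hypothetical catch-all label


def backoff_seconds(failure: str) -> int:
    """Only rate_limited pauses Opus work; other failures do not."""
    return OPUS_BACKOFF_SECONDS if failure == "rate_limited" else 0
```

Under these assumptions, `classify_failure(ROUTES[Tier.DOMAIN], 429)` yields `openrouter_failed`, so a throttled GPT-4o call does not pause DEEP evaluation.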
| | | |
|---|---|---|
| .forgejo/workflows | ||
| lib | ||
| tests | ||
| .gitignore | ||
| deploy.sh | ||
| INFRASTRUCTURE.md | ||
| pyproject.toml | ||
| teleo-pipeline.py | ||
| teleo-pipeline.service | ||
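
The commit message mentions that merge.py pre-checks PR state before calling the merge API, to avoid a 405 on re-merge. A minimal sketch of that idea follows. It assumes the PR lookup returns a mapping with `state` and `merged` fields; `merge_if_open`, `fetch_pr`, and `do_merge` are hypothetical names, and the callables are injected so that no specific HTTP client or endpoint is implied. The real daemon is async; this sketch is synchronous for brevity.

```python
from typing import Any, Callable, Mapping


def merge_if_open(
    fetch_pr: Callable[[int], Mapping[str, Any]],  # hypothetical PR lookup
    do_merge: Callable[[int], None],               # hypothetical merge call
    pr_number: int,
) -> str:
    """Merge a PR only if it is still open and unmerged.

    Returns a short status string instead of raising, so a caller
    can log the outcome and continue with the next PR.
    """
    pr = fetch_pr(pr_number)
    if pr.get("merged"):
        # Already merged: skip the merge call that would be rejected.
        return "already_merged"
    if pr.get("state") != "open":
        # Closed without merging: nothing to do.
        return "not_open"
    do_merge(pr_number)
    return "merged"
```

A check like this makes the merge step idempotent: re-running it on an already-merged PR returns a status instead of issuing a second merge request.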