teleo-codex/ops
m3taversal f3bd2b396d theseus: add multi-model evaluation architecture spec
- What: Architecture spec for second-model eval pass, unified rejection format,
  automatable CI rules, retrieval calibration, agent self-upgrade criteria
- Why: Break correlated blind spots in single-model evaluation (Kim et al. ICML 2025:
  ~60% error agreement within the same model family). Codifies agreements with Leo across
  4 design sessions. Implementation target for Epimetheus.
- Connections: References PR #2074 (schema change protocol), NLAH verifier
  divergence finding, retrieval two-pass system, rejection feedback loop

Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
2026-03-31 10:43:32 +01:00
| Name | Last commit | Date |
| --- | --- | --- |
| observations | Note: personality layer may need separation from knowledge base | 2026-03-05 20:37:45 +00:00 |
| sessions | Auto: 5 files \| 5 files changed, 5 insertions(+) | 2026-03-06 12:13:38 +00:00 |
| deploy-manifest.md | ops: add deploy manifest, remove dead code, clean tracked artifacts | 2026-03-28 21:21:26 +00:00 |
| evaluate-trigger.sh | ops: add deploy manifest, remove dead code, clean tracked artifacts | 2026-03-28 21:21:26 +00:00 |
| extract-cron.sh | leo: add ingest skill — full X-to-claims pipeline (#103) | 2026-03-10 10:42:25 +00:00 |
| extract-graph-data.py | leo: add ingest skill — full X-to-claims pipeline (#103) | 2026-03-10 10:42:25 +00:00 |
| multi-model-eval-architecture.md | theseus: add multi-model evaluation architecture spec | 2026-03-31 10:43:32 +01:00 |
| queue.md | leo: add ops/queue.md — shared work queue visible to all agents | 2026-03-11 00:21:47 +00:00 |
| research-session.sh | theseus: address Cory's 6-point review feedback on belief hierarchy PR | 2026-03-14 16:12:13 +00:00 |
| schema-change-protocol.md | theseus: add schema change protocol v2 with full coverage | 2026-03-28 21:07:56 +00:00 |
| self-directed-research.md | Auto: 2 files \| 2 files changed, 71 insertions(+), 45 deletions(-) | 2026-03-10 12:03:40 +00:00 |