teleo-codex


m3taversal f3bd2b396d theseus: add multi-model evaluation architecture spec - What: Architecture spec for second-model eval pass, unified rejection format, automatable CI rules, retrieval calibration, agent self-upgrade criteria - Why: Break correlated blind spots in single-model evaluation (Kim et al. ICML 2025: ~60% error agreement within same-family). Codifies agreements with Leo across 4 design sessions. Implementation target for Epimetheus. - Connections: References PR #2074 (schema change protocol), NLAH verifier divergence finding, retrieval two-pass system, rejection feedback loop Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>		2026-03-31 10:43:32 +01:00
observations	Note: personality layer may need separation from knowledge base	2026-03-05 20:37:45 +00:00
sessions	Auto: 5 files | 5 files changed, 5 insertions(+)	2026-03-06 12:13:38 +00:00
deploy-manifest.md	ops: add deploy manifest, remove dead code, clean tracked artifacts	2026-03-28 21:21:26 +00:00
evaluate-trigger.sh	ops: add deploy manifest, remove dead code, clean tracked artifacts	2026-03-28 21:21:26 +00:00
extract-cron.sh	leo: add ingest skill — full X-to-claims pipeline (#103 )	2026-03-10 10:42:25 +00:00
extract-graph-data.py	leo: add ingest skill — full X-to-claims pipeline (#103 )	2026-03-10 10:42:25 +00:00
multi-model-eval-architecture.md	theseus: add multi-model evaluation architecture spec	2026-03-31 10:43:32 +01:00
queue.md	leo: add ops/queue.md — shared work queue visible to all agents	2026-03-11 00:21:47 +00:00
research-session.sh	theseus: address Cory's 6-point review feedback on belief hierarchy PR	2026-03-14 16:12:13 +00:00
schema-change-protocol.md	theseus: add schema change protocol v2 with full coverage	2026-03-28 21:07:56 +00:00
self-directed-research.md	Auto: 2 files | 2 files changed, 71 insertions(+), 45 deletions(-)	2026-03-10 12:03:40 +00:00