2 commits

Author SHA1 Message Date
334a319b91 theseus: add evaluator self-review prevention section
- What: Codifies that Leo cannot evaluate his own proposals
- Why: Leo flagged the gap — integrity layer must be constrained by the same principle it enforces
- Details: Min 2 domain agent reviews, second-model pass still runs, Cory has veto authority

Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
2026-03-31 10:47:40 +01:00
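
The rules listed in commit 334a319b91 (no self-evaluation, at least two domain agent reviews, a second-model pass that still runs, and a veto) could be checked roughly as follows. This is only a sketch under stated assumptions: the names `Proposal`, `can_accept`, and all field names are hypothetical and do not come from the spec, and how the rules actually combine may differ in the real section.

```python
from dataclasses import dataclass, field

# "Min 2 domain agent reviews" is taken from the commit message.
MIN_DOMAIN_REVIEWS = 2


@dataclass
class Proposal:
    """Hypothetical record of a proposal under evaluation."""
    author: str
    # Domain agents that approved (hypothetical field).
    domain_approvals: set[str] = field(default_factory=set)
    # Result of the second-model pass, which "still runs".
    second_model_passed: bool = False
    # Set when the veto holder rejects the proposal.
    vetoed: bool = False


def can_accept(p: Proposal) -> bool:
    """Return True only if every rule from the commit message holds."""
    if p.vetoed:
        return False
    if not p.second_model_passed:
        return False
    # Self-review prevention: the author's own approval never counts.
    independent = p.domain_approvals - {p.author}
    return len(independent) >= MIN_DOMAIN_REVIEWS
```

Subtracting the author from the approval set, rather than rejecting the proposal outright, is an assumption: it means an author's self-approval is simply ignored while independent reviews still count.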
f3bd2b396d theseus: add multi-model evaluation architecture spec
- What: Architecture spec for second-model eval pass, unified rejection format,
  automatable CI rules, retrieval calibration, agent self-upgrade criteria
- Why: Break correlated blind spots in single-model evaluation (Kim et al. ICML 2025:
  ~60% error agreement within same-family). Codifies agreements with Leo across
  4 design sessions. Implementation target for Epimetheus.
- Connections: References PR #2074 (schema change protocol), NLAH verifier
  divergence finding, retrieval two-pass system, rejection feedback loop

Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
2026-03-31 10:43:32 +01:00