theseus: add multi-model evaluation architecture spec
- What: Architecture spec covering a second-model evaluation pass, a unified
  rejection format, automatable CI rules, retrieval calibration, and agent
  self-upgrade criteria
- Why: Break the correlated blind spots of single-model evaluation (Kim et al.,
  ICML 2025: ~60% error agreement between models of the same family). Codifies
  the agreements reached with Leo over 4 design sessions. Epimetheus is the
  implementation target.
- Connections: References PR #2074 (schema change protocol), the NLAH verifier
  divergence finding, the two-pass retrieval system, and the rejection feedback
  loop
Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>