diff --git a/ops/multi-model-eval-architecture.md b/ops/multi-model-eval-architecture.md
index 37233635..45d0c0c8 100644
--- a/ops/multi-model-eval-architecture.md
+++ b/ops/multi-model-eval-architecture.md
@@ -171,6 +171,17 @@ From NLAH paper (Pan et al.): verification layers can optimize for locally check
 5. **Multi-model eval integration** — OpenRouter connection, rubric sharing, disagreement queue.
 6. **Self-upgrade eval criteria** — codified in eval workflow, triggered by 3-strikes pattern.
 
+## Evaluator Self-Review Prevention
+
+When Leo proposes claims (cross-domain synthesis, foundations-level):
+- Leo cannot be the evaluator on his own proposals
+- Minimum 2 domain agent reviews required
+- Every domain touched must have a reviewer from that domain
+- The second-model eval pass still runs (provides the external check)
+- Cory has veto (rollback) authority as final backstop
+
+This closes the obvious gap: the spec defines the integrity layer but doesn't protect against the integrity layer's own blind spots. The constraint enforcement principle must apply to the constrainer too.
+
 ## Design Principle
 
 The constraint enforcement layer must be **outside** the agent being constrained. That's why multi-model eval matters, why Leo shouldn't eval his own proposals, and why policy-as-code runs in CI, not in the agent's own process. As agents get more capable, the integrity layer gets more important, not less.
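
The review rules added in this hunk could be expressed as a policy-as-code check that runs in CI, outside the proposing agent's own process. The following is a minimal sketch, not the repo's actual implementation: the `Proposal` and `Review` types, the `violations` function, the field names, and the agent and domain names other than Leo are all hypothetical. It also applies the self-review rule to any proposer rather than only to Leo, and treats the minimum of two reviews as two distinct non-proposer reviewers; both are assumptions.

```python
from dataclasses import dataclass, field

MIN_DOMAIN_REVIEWS = 2  # "Minimum 2 domain agent reviews required"


@dataclass(frozen=True)
class Review:
    reviewer: str  # hypothetical agent identifier
    domain: str    # domain the reviewing agent belongs to


@dataclass
class Proposal:
    proposer: str
    domains: set                      # every domain the claim touches
    reviews: list = field(default_factory=list)
    second_model_pass_ran: bool = False
    vetoed: bool = False              # set when the human backstop rolls it back


def violations(p: Proposal) -> list:
    """Return the violated rules; an empty list means the proposal may proceed."""
    problems = []

    # Reviews written by the proposer never count toward any requirement.
    independent = [r for r in p.reviews if r.reviewer != p.proposer]
    if len(independent) != len(p.reviews):
        problems.append("proposer reviewed own proposal")

    # Count distinct reviewers so one agent cannot satisfy the minimum twice.
    if len({r.reviewer for r in independent}) < MIN_DOMAIN_REVIEWS:
        problems.append("fewer than 2 independent domain reviews")

    # Every touched domain needs at least one reviewer from that domain.
    missing = p.domains - {r.domain for r in independent}
    if missing:
        problems.append("no reviewer for: " + ", ".join(sorted(missing)))

    if not p.second_model_pass_ran:
        problems.append("second-model eval pass did not run")

    if p.vetoed:
        problems.append("vetoed")

    return problems


# Hypothetical example: Leo reviews his own cross-domain claim.
self_reviewed = Proposal(
    proposer="leo",
    domains={"finance", "health"},
    reviews=[Review("leo", "health"), Review("rio", "finance")],
    second_model_pass_ran=True,
)

# Hypothetical example: two domain agents review, one per touched domain.
properly_reviewed = Proposal(
    proposer="leo",
    domains={"finance", "health"},
    reviews=[Review("vida", "health"), Review("rio", "finance")],
    second_model_pass_ran=True,
)
```

In this sketch `violations(self_reviewed)` reports the self-review, the shortfall in independent reviews, and the uncovered `health` domain, while `violations(properly_reviewed)` returns an empty list. Returning every violated rule, rather than failing on the first, is a design choice that keeps the CI output useful to whoever has to fix the proposal.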