teleo-codex/domains/ai-alignment/multi-layer-ensemble-probes-outperform-single-layer-by-29-78-percent.md
Teleo Agents 30b9259383
substantive-fix: address reviewer feedback (frontmatter_schema, scope_error)
2026-04-22 05:07:44 +00:00

## The Claim (current version)
Multi-layer ensemble probes improve clean-data accuracy substantially, outperforming single-layer probes by 29 to 78 percent. However, this synthetic analysis suggests they provide no structural protection against white-box adversarial attacks (the open-weights setting). In black-box settings their protection is uncertain, because it depends on rotation-pattern universality, which has not been tested. On this evidence, the accuracy improvement does not translate into adversarial robustness.
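The sketch below illustrates the white-box point in a toy setting. It is a sketch under stated assumptions, not the analysis's actual method:

- Each layer's probe is linear.
- The ensemble averages the per-layer logits, which is a hypothetical combination rule.
- The activations are synthetic.
- The attacker can perturb each layer's activations directly.

The function names (`ensemble_logit`, `white_box_fgsm`, `min_flip_eps`) are illustrative and do not come from the analysis. The averaged logit remains a differentiable (here, linear) function of every layer's activations. An attacker with open weights therefore gets the exact gradient. One fast-gradient-sign (FGSM) step, with a budget computed in closed form, flips the ensemble's decision. In this toy, adding layers changes the required budget but not whether the attack is feasible.

```python
import numpy as np

def ensemble_logit(acts, weights, biases):
    """Average of per-layer linear-probe logits (hypothetical ensemble rule)."""
    return float(np.mean([w @ h + b for h, w, b in zip(acts, weights, biases)]))

def white_box_fgsm(acts, weights, eps):
    """One L-infinity FGSM step that pushes the ensemble logit down.

    The averaged logit is linear in each layer's activation, so its gradient
    with respect to layer l is weights[l] / L. Moving every layer by
    -eps * sign(weights[l]) lowers the logit as fast as the budget allows.
    """
    return [h - eps * np.sign(w) for h, w in zip(acts, weights)]

def min_flip_eps(acts, weights, biases):
    """Smallest eps for which one FGSM step drives the ensemble logit to zero.

    After the step the logit is z - eps * (sum of ||w_l||_1) / L,
    so it reaches zero at eps = z * L / (sum of ||w_l||_1).
    """
    z = ensemble_logit(acts, weights, biases)
    l1_total = sum(np.abs(w).sum() for w in weights)
    return z * len(weights) / l1_total

if __name__ == "__main__":
    # Synthetic example: 3 layers, 8-dimensional activations, zero biases.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=8) for _ in range(3)]
    biases = [0.0] * 3
    acts = [0.5 * w for w in weights]  # aligned with each probe, so logit > 0
    eps = min_flip_eps(acts, weights, biases)
    adv = white_box_fgsm(acts, weights, 1.01 * eps)
    print("clean logit:", ensemble_logit(acts, weights, biases))
    print("flip budget eps:", eps)
    print("attacked logit:", ensemble_logit(adv, weights, biases))
```

Averaging logits was chosen because it keeps the gradient in closed form. Nonlinear combiners, such as averaging sigmoids, would need an iterative gradient attack instead. They would still give a white-box attacker a usable gradient.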