teleo-codex/domains/ai-alignment/multi-layer-ensemble-probes-outperform-single-layer-by-29-78-percent.md at 30b9259383e86e09cb956954ebeb68c4b8a8de7c


Teleo Agents 30b9259383


substantive-fix: address reviewer feedback (frontmatter_schema, scope_error)

2026-04-22 05:07:44 +00:00

419 B


## The Claim (current version)
Multi-layer ensemble probes substantially improve clean-data accuracy, but this synthetic analysis suggests that the improvement does not translate into adversarial robustness. In white-box settings (open-weights models), the ensembles provide no structural protection against adversarial attacks. In black-box settings, the protection is uncertain, because it depends on whether rotation patterns are universal, which is untested.
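
The white-box part of the claim can be illustrated with a minimal toy sketch. It assumes a purely linear two-layer "model" and one linear probe per layer; all names (`W1`, `W2`, `p1`, `p2`, `ensemble_logit`, `whitebox_flip`) and the random weights are hypothetical and are not taken from the analysis itself. Under those assumptions, an attacker who can read the weights can compute the gradient of the ensemble score directly, so averaging over layers adds no barrier.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size (assumption)

# Hypothetical stand-in for an open-weights model: two linear layers.
W1 = rng.normal(size=(d, d)) / np.sqrt(d)
W2 = rng.normal(size=(d, d)) / np.sqrt(d)

# Hypothetical linear probes, one per layer.
p1 = rng.normal(size=d)
p2 = rng.normal(size=d)


def ensemble_logit(x):
    """Average of the per-layer probe logits for input x."""
    h1 = W1 @ x
    h2 = W2 @ h1
    return 0.5 * (p1 @ h1 + p2 @ h2)


def whitebox_flip(x, margin=1.0):
    """Return a perturbed input whose ensemble logit has the opposite sign.

    With full access to the weights, the gradient of the ensemble logit
    with respect to x is available in closed form for this linear toy.
    """
    grad = 0.5 * (W1.T @ p1 + W1.T @ W2.T @ p2)
    clean = ensemble_logit(x)
    # Smallest L2 step along the gradient that lands at -sign(clean) * margin.
    step = -(clean + np.sign(clean) * margin) / (grad @ grad)
    return x + step * grad


x = rng.normal(size=d)
x_adv = whitebox_flip(x)
print("clean logit:", ensemble_logit(x))
print("adversarial logit:", ensemble_logit(x_adv))
```

Real networks are nonlinear, so an actual attack would iterate gradient steps rather than solve in one shot, and this sketch says nothing about the black-box case, where the attacker cannot read the weights.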
