| claim | ai-alignment | Anthropic's ICLR 2026 paper decomposes model errors into bias (systematic) and variance (random) and finds that longer reasoning traces and harder tasks produce increasingly incoherent failures | experimental | Anthropic Research, ICLR 2026, tested on Claude Sonnet 4, o3-mini, o4-mini | 2026-03-30 |
| extractor | sourcer |
|
|
| handle | context |
| anthropic-research | Anthropic Research, ICLR 2026, tested on Claude Sonnet 4, o3-mini, o4-mini |
|
|
|
| capability scaling increases error incoherence on difficult tasks, inverting the expected relationship between model size and behavioral predictability |
|
| capability scaling increases error incoherence on difficult tasks, inverting the expected relationship between model size and behavioral predictability | supports | 2026-04-03 |
| Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations, but regime-blind training interventions achieve only limited and inconsistent protection | related | 2026-04-17 |
| Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features | related | 2026-04-17 |
|
|