---
type: claim
domain: ai-alignment
description: Anthropic's ICLR 2026 paper decomposes model errors into bias (systematic) and variance (random) and finds that longer reasoning traces and harder tasks produce increasingly incoherent failures
confidence: experimental
source: Anthropic Research, ICLR 2026, tested on Claude Sonnet 4, o3-mini, o4-mini
created: 2026-03-30
attribution:
extractor:
- handle: "theseus"
sourcer:
- handle: "anthropic-research"
context: "Anthropic Research, ICLR 2026, tested on Claude Sonnet 4, o3-mini, o4-mini"
supports:
- "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
reweave_edges:
- "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|supports|2026-04-03"
---
# Frontier AI failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase, making behavioral auditing harder on precisely the tasks where it matters most
The paper measures the bias/variance decomposition of model errors across reasoning length (tokens), agent actions, and optimizer steps. Key empirical findings:

1. As reasoning length increases, the variance component of errors grows while bias remains relatively stable, so failures become less systematic and more unpredictable.
2. On hard tasks, larger, more capable models show *higher* incoherence than smaller models, directly contradicting the intuition that capability improvements make behavior more predictable.
3. On easy tasks, the pattern reverses: larger models are less incoherent.

This creates a troubling dynamic: the tasks that most need reliable behavior (hard, long-horizon problems) are precisely those where capable models become most unpredictable. The proposed mechanism is that transformers are natively dynamical systems, not optimizers, and must be trained into optimization behavior; that training breaks down on longer reasoning traces. For alignment, this means behavioral auditing faces a moving target: defenses cannot be built against consistent misalignment patterns, because the failures are random rather than systematic. This compounds the verification degradation problem: not only does human capability fall behind AI capability, but AI failure modes themselves become harder to predict and detect.
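The decomposition above is the standard bias/variance split applied to repeated model outputs on the same task. As a minimal sketch (the function name and scalar-answer framing are illustrative assumptions, not the paper's actual evaluation code): given several sampled answers to one prompt and a ground-truth value, mean squared error splits exactly into a systematic component (squared bias of the mean answer) and a random component (variance across samples). Rising variance at a fixed bias is what the claim calls increasing incoherence.

```python
import statistics

def decompose_error(samples: list[float], truth: float) -> dict[str, float]:
    """Split mean squared error of repeated model outputs on one task into
    a systematic part (bias^2) and a random part (variance).

    Identity used: MSE = bias^2 + variance, with
      bias     = mean(samples) - truth
      variance = population variance of samples around their own mean
    """
    mean_answer = statistics.fmean(samples)
    bias = mean_answer - truth
    variance = statistics.pvariance(samples)          # spread around the mean answer
    mse = statistics.fmean((s - truth) ** 2 for s in samples)
    return {"bias2": bias ** 2, "variance": variance, "mse": mse}
```

On this view, a model that always gives the same wrong answer has high `bias2` and zero `variance` (systematic, auditable failure), while a model whose answers scatter widely around the truth has low `bias2` and high `variance` (incoherent failure), even when their total `mse` is identical.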
---
Relevant Notes:
- [[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]]
- [[instrumental convergence risks may be less imminent than originally argued because current AI architectures do not exhibit systematic power-seeking behavior]]
Topics:
- [[_map]]