---
type: claim
domain: ai-alignment
description: Larger more capable models show MORE random unpredictable failures on hard tasks than smaller models, suggesting capability gains worsen alignment auditability in the relevant regime
confidence: experimental
source: Anthropic Research, ICLR 2026, empirical measurements across model scales
created: 2026-03-30
attribution:
  extractor:
  - handle: "theseus"
  sourcer:
  - handle: "anthropic-research"
    context: "Anthropic Research, ICLR 2026, empirical measurements across model scales"
---

# Capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability

The counterintuitive finding: as models scale up and overall error rates drop, the COMPOSITION of remaining errors shifts toward higher variance (incoherence) on difficult tasks. This means that the marginal errors that persist in larger models are less systematic and harder to predict than the errors in smaller models. The mechanism appears to be that harder tasks require longer reasoning traces, and longer traces amplify the dynamical-system nature of transformers rather than their optimizer-like behavior.

This has direct implications for alignment strategy: you cannot assume that scaling to more capable models will make behavioral auditing easier or more reliable. In fact, on the hardest tasks, where alignment matters most, scaling may make auditing HARDER because failures become less patterned.

This challenges the implicit assumption in much alignment work that capability improvements and alignment improvements move together. The data suggests they may diverge: more capable models may be simultaneously better at solving problems AND worse at failing predictably.
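
One way to make "composition of errors" concrete is a bias-variance split over repeated runs on the same task. The sketch below is an illustration under stated assumptions, not the measurement procedure from the source: the function name `incoherence_share`, the squared-error scoring, and the toy numbers are all hypothetical. It treats the variance share of total squared error as a stand-in for incoherence, so 0.0 means every error is systematic and 1.0 means every error is run-to-run scatter.

```python
from statistics import fmean


def incoherence_share(samples_per_task, targets):
    """Fraction of mean squared error that comes from run-to-run variance.

    Hypothetical metric for illustration only:
    samples_per_task[i] holds repeated numeric outputs for task i,
    targets[i] is the correct answer for task i.
    """
    bias_sq_total = 0.0
    variance_total = 0.0
    for samples, target in zip(samples_per_task, targets):
        mean_output = fmean(samples)
        # Systematic part: how far the average answer is from the truth.
        bias_sq_total += (mean_output - target) ** 2
        # Incoherent part: how much individual runs scatter around their mean.
        variance_total += sum((s - mean_output) ** 2 for s in samples) / len(samples)
    total = bias_sq_total + variance_total
    return variance_total / total if total > 0 else 0.0


# Toy data (invented, not from the source): the correct answer is 10.
systematic = incoherence_share([[8, 8, 8, 8]], [10])    # always wrong the same way
scattered = incoherence_share([[7, 13, 10, 10]], [10])  # right on average, erratic per run
mixed = incoherence_share([[8, 10]], [10])              # half bias, half scatter

print(systematic, scattered, mixed)  # 0.0 1.0 0.5
```

Under this framing, the claim above would correspond to total error shrinking with scale while this share rises on hard tasks. An auditor who only looks for repeatable failure patterns would catch the `systematic` case and miss the `scattered` one.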
---

Relevant Notes:
- [[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]]
- scalable oversight degrades rapidly as capability gaps grow

Topics:
- [[_map]]