---
type: claim
domain: ai-alignment
description: Larger, more capable models show MORE random, unpredictable failures on hard tasks than smaller models, suggesting capability gains worsen alignment auditability in the relevant regime
confidence: experimental
source: Anthropic Research, ICLR 2026, empirical measurements across model scales
created: 2026-03-30
attribution:
  extractor:
    handle: theseus
  sourcer:
    handle: anthropic-research
    context: Anthropic Research, ICLR 2026, empirical measurements across model scales
supports:
  - frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase
reweave_edges:
  - frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase|supports|2026-04-03
sourced_from: inbox/archive/ai-alignment/2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence.md
---

# Capability scaling increases error incoherence on difficult tasks, inverting the expected relationship between model size and behavioral predictability

The counterintuitive finding: as models scale up and overall error rates drop, the COMPOSITION of the remaining errors shifts toward higher variance (incoherence) on difficult tasks. The marginal errors that persist in larger models are less systematic and harder to predict than the errors of smaller models. The proposed mechanism is that harder tasks require longer reasoning traces, and longer traces amplify the dynamical-system character of transformers rather than their optimizer-like behavior.

This has direct implications for alignment strategy: you cannot assume that scaling to more capable models will make behavioral auditing easier or more reliable. On the hardest tasks, where alignment matters most, scaling may make auditing HARDER because failures become less patterned. That challenges the implicit assumption in much alignment work that capability improvements and alignment improvements move together. The data suggest they may diverge: more capable models may simultaneously be better at solving problems AND worse at failing predictably.
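To make the "composition of errors" claim concrete, here is a minimal sketch of the bias/variance decomposition it implies. Everything below is an illustrative assumption, not a measurement or method from the source: we imagine repeated scored attempts by each model at the same hard tasks, treat the per-task mean error as the systematic (predictable) component, and treat the cross-attempt variance as the incoherent component.

```python
import numpy as np

def decompose_errors(attempts: np.ndarray) -> dict:
    """attempts: (n_tasks, n_samples) array of per-attempt error scores in [0, 1]."""
    per_task_mean = attempts.mean(axis=1)   # systematic, predictable component
    per_task_var = attempts.var(axis=1)     # incoherent, run-to-run component
    bias_sq = float((per_task_mean ** 2).mean())
    variance = float(per_task_var.mean())
    total = bias_sq + variance              # per-task mean squared error, averaged
    return {
        "total_error": total,
        "bias_share": bias_sq / total,      # how patterned the failures are
        "variance_share": variance / total, # the "incoherence" share
    }

# Toy numbers (assumptions, not data): the small model is worse overall but
# consistent; the large model has lower error but is noisy on hard tasks.
rng = np.random.default_rng(0)
small = rng.normal(loc=0.6, scale=0.1, size=(50, 20)).clip(0, 1)
large = rng.normal(loc=0.2, scale=0.3, size=(50, 20)).clip(0, 1)
print("small:", decompose_errors(small))
print("large:", decompose_errors(large))
```

Under the claim, `variance_share` should grow with scale on hard tasks even as `total_error` falls, which is exactly the regime where an auditor's model of "typical failure" loses traction.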


Relevant Notes:

Topics: