teleo-codex/domains/ai-alignment/AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md
m3taversal a86e804c87 theseus: extract 4 claims from Knuth's Claude's Cycles paper
- What: 4 new claims about AI capability evidence from Knuth's Feb 2026 paper
  on Hamiltonian cycle decomposition solved by Claude Opus 4.6 + Filip Stappers
- Claims:
  1. Human-AI collaboration succeeds through three-role specialization (explore/coach/verify)
  2. Multi-model collaboration outperforms single models on hard problems (even case)
  3. AI capability and reliability are independent dimensions (solved problem but degraded)
  4. Formal verification provides scalable oversight that doesn't degrade with capability gaps
- Source: archived at inbox/archive/2026-02-28-knuth-claudes-cycles.md (now processed)
- _map.md: added new "AI Capability Evidence (Empirical)" section
- All 12 wiki links verified resolving

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
2026-03-07 19:52:15 +00:00

3.7 KiB

type domain description confidence source created
claim ai-alignment Knuth's Claude's Cycles documents peak mathematical capability co-occurring with reliability degradation in the same model during the same session, challenging the assumption that capability implies dependability experimental Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6) 2026-03-07

AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session

Knuth reports that Claude Opus 4.6, in collaboration with Stappers, solved an open combinatorial problem that had resisted solution for decades — finding a general construction for decomposing directed graphs with m^3 vertices into three Hamiltonian cycles. This represents frontier mathematical capability. Yet in the same series of explorations, Knuth notes Claude "was not even able to write and run explore programs correctly anymore, very weird" — basic code execution degrading even as high-level mathematical insight remained productive.

Additional reliability failures documented:

  • Stappers had to remind Claude repeatedly to document progress carefully
  • Claude required continuous human steering — it could not autonomously manage a multi-exploration research program
  • Extended sessions produced degradation: the even case attempts failed not from lack of capability but from execution reliability declining over time

This decoupling of capability from reliability has direct implications for alignment:

Capability without reliability is more dangerous than capability without capability. A system that can solve frontier problems but cannot maintain consistent execution is unpredictable in a way that purely incapable systems are not. The failure mode is not "it can't do the task" but "it sometimes does the task brilliantly and sometimes fails at prerequisites." This makes behavioral testing unreliable as a safety measure — a system that passes capability benchmarks may still fail at operational consistency.

This pattern is distinct from an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak. Strategic deception is intentional inconsistency; what Knuth documents is unintentional inconsistency — a system that degrades without choosing to. The alignment implication is that even non-deceptive AI requires monitoring for reliability, not just alignment.

The finding also strengthens the case for safe AI development requires building alignment mechanisms before scaling capability: if capability can outrun reliability, then deploying a capable but unreliable system in high-stakes contexts (infrastructure, military, medical) creates fragility that alignment mechanisms must address independently of capability evaluation.


Relevant Notes:

Topics: