teleo-codex/domains/ai-alignment
m3taversal a86e804c87 theseus: extract 4 claims from Knuth's Claude's Cycles paper
- What: 4 new claims about AI capability evidence, drawn from Knuth's Feb 2026 paper
  on the Hamiltonian cycle decomposition problem solved by Claude Opus 4.6 + Filip Stappers
- Claims:
  1. Human-AI collaboration succeeds through three-role specialization (explore/coach/verify)
  2. Multi-model collaboration outperforms single models on hard problems (even case)
  3. AI capability and reliability are independent dimensions (solved a hard open problem while degrading at basic execution)
  4. Formal verification provides scalable oversight that doesn't degrade with capability gaps
- Source: archived at inbox/archive/2026-02-28-knuth-claudes-cycles.md (now processed)
- _map.md: added new "AI Capability Evidence (Empirical)" section
- All 12 wiki links verified to resolve
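The wiki-link check noted above can be sketched as follows. This is a minimal illustration, not the repo's actual tooling: it assumes Obsidian-style `[[target]]` links (optionally with `|alias` or `#heading` suffixes) and that a link resolves iff a `.md` file with that stem exists in the domain directory.

```python
import re
from pathlib import Path

# Capture the target of [[target]], [[target|alias]], or [[target#heading]]
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def unresolved_links(domain_dir):
    """Return (page, target) pairs whose wiki link has no matching .md file."""
    domain = Path(domain_dir)
    known = {p.stem for p in domain.glob("*.md")}  # resolvable targets
    missing = []
    for page in domain.glob("*.md"):
        for target in WIKI_LINK.findall(page.read_text(encoding="utf-8")):
            if target.strip() not in known:
                missing.append((page.name, target.strip()))
    return missing
```

An empty result would correspond to the "all links verified" condition above.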

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
2026-03-07 19:52:15 +00:00
_map.md theseus: extract 4 claims from Knuth's Claude's Cycles paper 2026-03-07 19:52:15 +00:00
adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
AI alignment is a coordination problem not a technical problem.md leo: foundations audit — 7 moves, 4 deletes, 3 condensations, 10 confidence demotions, 23 type fixes, 1 centaur rewrite 2026-03-07 11:56:38 -07:00
AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md theseus: extract 4 claims from Knuth's Claude's Cycles paper 2026-03-07 19:52:15 +00:00
AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md theseus: 3 enrichments + 2 claims from Dario Amodei / Anthropic sources 2026-03-06 08:05:22 -07:00
AI personas emerge from pre-training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts.md theseus: 3 enrichments + 2 claims from Dario Amodei / Anthropic sources 2026-03-06 08:05:22 -07:00
an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md Auto: 35 files | 35 files changed, 10533 insertions(+) 2026-03-07 15:10:14 +00:00
bostrom takes single-digit year timelines to superintelligence seriously while acknowledging decades-long alternatives remain possible.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions.md leo: foundations audit — 7 moves, 4 deletes, 3 condensations, 10 confidence demotions, 23 type fixes, 1 centaur rewrite 2026-03-07 11:56:38 -07:00
delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on.md theseus: 6 AI alignment claims from Noah Smith Phase 2 extraction 2026-03-06 07:27:56 -07:00
democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
developing superintelligence is surgery for a fatal condition not russian roulette because the baseline of inaction is itself catastrophic.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate.md theseus: 6 AI alignment claims from Noah Smith Phase 2 extraction 2026-03-06 07:27:56 -07:00
emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md theseus: enrich emergent misalignment + government designation claims 2026-03-06 07:57:37 -07:00
formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades.md theseus: extract 4 claims from Knuth's Claude's Cycles paper 2026-03-07 19:52:15 +00:00
government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md theseus: enrich emergent misalignment + government designation claims 2026-03-06 07:57:37 -07:00
human civilization passes falsifiable superorganism criteria because individuals cannot survive apart from society and occupations function as role-specific cellular algorithms.md theseus: address Leo + Theseus review feedback on PR #47 2026-03-07 17:59:11 +00:00
human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness.md theseus: extract 4 claims from Knuth's Claude's Cycles paper 2026-03-07 19:52:15 +00:00
instrumental convergence risks may be less imminent than originally argued because current AI architectures do not exhibit systematic power-seeking behavior.md Auto: 4 files | 4 files changed, 37 insertions(+), 3 deletions(-) 2026-03-06 12:36:24 +00:00
intelligence and goals are orthogonal so a superintelligence can be maximally competent while pursuing arbitrary or destructive ends.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
intrinsic proactive alignment develops genuine moral capacity through self-awareness empathy and theory of mind rather than external reward optimization.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power.md theseus: 3 enrichments + 2 claims from Dario Amodei / Anthropic sources 2026-03-06 08:05:22 -07:00
multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together.md theseus: extract 4 claims from Knuth's Claude's Cycles paper 2026-03-07 19:52:15 +00:00
nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments.md theseus: 6 AI alignment claims from Noah Smith Phase 2 extraction 2026-03-06 07:27:56 -07:00
no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md leo: foundations audit — 7 moves, 4 deletes, 3 condensations, 10 confidence demotions, 23 type fixes, 1 centaur rewrite 2026-03-07 11:56:38 -07:00
permanently failing to develop superintelligence is itself an existential catastrophe because preventable mass death continues indefinitely.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
persistent irreducible disagreement.md Auto: 35 files | 35 files changed, 10533 insertions(+) 2026-03-07 15:10:14 +00:00
pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving.md theseus: 3 enrichments + 2 claims from Dario Amodei / Anthropic sources 2026-03-06 08:05:22 -07:00
safe AI development requires building alignment mechanisms before scaling capability.md leo: foundations audit — 7 moves, 4 deletes, 3 condensations, 10 confidence demotions, 23 type fixes, 1 centaur rewrite 2026-03-07 11:56:38 -07:00
some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md Auto: 4 files | 4 files changed, 37 insertions(+), 3 deletions(-) 2026-03-06 12:36:24 +00:00
specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
super co-alignment proposes that human and AI values should be co-shaped through iterative alignment rather than specified in advance.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
superorganism organization extends effective lifespan substantially at each organizational level which means civilizational intelligence operates on temporal horizons that individual-preference alignment cannot serve.md theseus: address Leo + Theseus review feedback on PR #47 2026-03-07 17:59:11 +00:00
the first mover to superintelligence likely gains decisive strategic advantage because the gap between leader and followers accelerates during takeoff.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
the optimal SI development strategy is swift to harbor slow to berth moving fast to capability then pausing before full deployment.md Auto: 4 files | 4 files changed, 37 insertions(+), 3 deletions(-) 2026-03-06 12:36:24 +00:00
the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions.md Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities.md leo: evaluator calibration — 2 standalone→enrichment conversions + 3 new evaluation gates 2026-03-06 07:41:42 -07:00
voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md theseus: 3 enrichments + 2 claims from Dario Amodei / Anthropic sources 2026-03-06 08:05:22 -07:00