---
source: collected
author: "Eliezer Yudkowsky"
title: "Yudkowsky Core Arguments — Collected Works"
date: 2025-09-26
url: null
status: processing
domain: ai-alignment
format: collected
tags: [alignment, existential-risk, intelligence-explosion, corrigibility, takeoff]
notes: "Compound source covering Yudkowsky's core body of work: 'AGI Ruin: A List of Lethalities' (2022), 'Intelligence Explosion Microeconomics' (2013), 'There's No Fire Alarm for AGI' (2017), Sequences/Rationality: A-Z (2006-2009), TIME op-ed 'Shut It Down' (2023), 'If Anyone Builds It, Everyone Dies' with Nate Soares (2025), various LessWrong posts on corrigibility and mesa-optimization. Yudkowsky is the foundational figure in AI alignment — co-founder of MIRI, originator of instrumental convergence, orthogonality thesis, and the intelligence explosion framework. Most alignment discourse either builds on or reacts against his arguments."
---

# Yudkowsky Core Arguments — Collected Works

Eliezer Yudkowsky's foundational contributions to AI alignment, synthesized across his major works from 2006-2025. This is a compound source because his arguments form a coherent system: individual papers express facets of a unified worldview rather than standalone claims.

## Key Works

1. **Sequences / Rationality: A-Z (2006-2009)** — Epistemic foundations. Beliefs must "pay rent" in anticipated experiences. Bayesian updating as the epistemic substrate. Map-territory distinction. (See the first sketch after this list.)

2. **"Intelligence Explosion Microeconomics" (2013)** — Formalizes returns on cognitive reinvestment. If reinvesting cognitive output into capability yields constant or increasing returns, recursive self-improvement produces a discontinuous capability gain. (See the second sketch after this list.)

3. **"There's No Fire Alarm for AGI" (2017)** — The absence of a warning signal is structural: capability scaling is gradual and ambiguous, so collective action requires anticipation, not reaction.

4. **"AGI Ruin: A List of Lethalities" (2022)** — Concentrated doom argument. Alignment techniques that work at low capability catastrophically fail at superintelligence; there is no iteration on the critical try, and only a ~2 year proliferation window. (See the third sketch after this list.)

5. **TIME op-ed: "Shut It Down" (2023)** — Calls for an indefinite worldwide moratorium: decreasing compute caps, GPU tracking, military enforcement. His most aggressive mainstream policy position.

6. **"If Anyone Builds It, Everyone Dies" with Nate Soares (2025)** — Book-length treatment. Fast takeoff → near-certain extinction. The link between training reward and the desires a model acquires is chaotic. Multipolar AI outcomes are unstable. International treaty enforcement is needed.
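
The reinvestment claim in "Intelligence Explosion Microeconomics" reduces to a recurrence on capability. A minimal sketch assuming a toy recurrence of my own; the parameterization (r, k) is not from the paper:

```python
# Toy model of returns on cognitive reinvestment: C <- C + r * C**k.
#   k < 1: diminishing returns -> sub-exponential growth
#   k = 1: constant returns    -> exponential growth
#   k > 1: increasing returns  -> super-exponential "foom"
#          (finite-time blowup in the continuous limit)

def simulate(k: float, r: float = 0.1, c0: float = 1.0, steps: int = 50) -> float:
    """Iterate the reinvestment recurrence; return final capability."""
    c = c0
    for _ in range(steps):
        c = c + r * c**k
    return c

for k in (0.5, 1.0, 1.2):
    print(f"k={k}: capability after 50 steps = {simulate(k):,.1f}")
```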
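
The "works at low capability, fails at high capability" claim from "AGI Ruin" has a standard toy illustration: a misspecified proxy that only binds out of distribution. This sketch is my own assumption, not from the essay, and the objective functions are invented:

```python
# A proxy reward fit on a narrow region matches the true objective there
# but diverges outside it. A weak optimizer confined in-distribution looks
# aligned; a strong optimizer that can reach out-of-distribution states
# maximizes the proxy exactly where it stops tracking the true objective.

def true_objective(x: float) -> float:
    return -abs(x - 1.0)              # true goal: stay near x = 1

def proxy_reward(x: float) -> float:
    if -2.0 <= x <= 2.0:              # in-distribution: proxy matches truth
        return true_objective(x)
    return abs(x)                     # out-of-distribution: proxy rewards extremes

def grid_argmax(f, lo: float, hi: float, steps: int = 10_001) -> float:
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(xs, key=f)

weak = grid_argmax(proxy_reward, -2.0, 2.0)        # low capability: narrow reach
strong = grid_argmax(proxy_reward, -100.0, 100.0)  # high capability: wide reach
print(f"weak:   x={weak:+.2f}, true value={true_objective(weak):+.2f}")
print(f"strong: x={strong:+.2f}, true value={true_objective(strong):+.2f}")
```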

## Cross-Referencing Debates

- **vs. Robin Hanson** (AI-Foom Debate, 2008-2013): Takeoff speed. Yudkowsky: recursive self-improvement → hard takeoff. Hanson: gradual, economy-driven growth.
- **vs. Paul Christiano** (ongoing): Is prosaic alignment sufficient? Christiano: yes, empirical iteration works. Yudkowsky: no, the sharp left turn makes it fundamentally inadequate.
- **vs. Richard Ngo**: Can we build AI that is intelligent but less agentic? Ngo: yes. Yudkowsky: no, agency is instrumentally convergent.
- **vs. Shard Theory (Pope & Turner) and Rohin Shah**: Value formation complexity. Critics argue gradient descent is less analogous to evolution than Yudkowsky claims; Shah's ~5% doom estimate contrasts with Yudkowsky's far higher one.