---
type: source
title: "There's No Fire Alarm for Artificial General Intelligence"
author: "Eliezer Yudkowsky"
url: https://www.lesswrong.com/posts/BEtzRE2M5m9YEAQpX/there-s-no-fire-alarm-for-artificial-general-intelligence
date: 2017-10-13
domain: ai-alignment
intake_tier: research-task
rationale: "Foundational argument about coordination failure in AI safety. Explains why collective action on existential AI risk requires anticipation rather than reaction."
proposed_by: Theseus
format: essay
status: processed
processed_by: theseus
processed_date: 2026-04-05
claims_extracted:
- "there is no fire alarm for AGI because the absence of a consensus societal warning signal means collective action requires unprecedented anticipation rather than reaction"
enrichments: []
tags: [alignment, coordination, collective-action, fire-alarm, social-epistemology]
---
# There's No Fire Alarm for Artificial General Intelligence
Published on LessWrong in October 2017. One of Yudkowsky's most cited essays, arguing that the structure of AGI development precludes the kind of clear warning signal that would trigger coordinated societal response.
## Core Argument
Yudkowsky draws on the Darley and Latané (1968) smoke-filled room experiment: a lone participant quickly leaves to report smoke, while groups of three sit passively in haze. The function of a fire alarm is not primarily to alert individuals to danger — it's to create **common knowledge** that action is socially acceptable.
For AGI, there will be no equivalent signal. The argument:
1. **No clear capability threshold**: AI capability develops gradually and ambiguously. There's no single demonstration that makes risk undeniable.
2. **Social epistemology blocks individual action**: Even people who believe AGI is dangerous face social pressure to wait for consensus. Without common knowledge that "now is the time," pluralistic ignorance (each person reading everyone else's inaction as evidence that inaction is appropriate) keeps everyone waiting.
3. **Expert disagreement is stable**: AI researchers disagree about timelines and risk levels, and this disagreement won't resolve before the critical moment. There's no experiment that settles it in advance.
4. **No historical precedent**: Humanity has never faced a similar challenge (a technology that, once created, immediately and permanently changes the power landscape). There's nothing to pattern-match against.
5. **The fire alarm would need to come from AGI itself**: The only event that would create consensus is a demonstration of dangerous AGI capability — but by then, the window for preventive action has closed.
## Structural Implication
The essay's deepest point is about **the structure of collective action problems**: even if individuals correctly perceive the risk, the absence of a coordination mechanism (the "fire alarm") means rational individuals will under-invest in safety. This is structurally identical to Moloch — competitive dynamics preventing the collectively optimal response.
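To make the coordination structure concrete, here is a minimal sketch (not from the essay; the model and names like `threshold` and `alarm` are illustrative assumptions) of a threshold game under best-response dynamics: every agent privately believes the risk is real, but acting alone is costly, so each acts only if enough others are expected to act. Without a public signal, expectations start at zero and the all-wait outcome is a stable equilibrium.

```python
# Illustrative threshold-coordination model (an assumption for this note,
# not Yudkowsky's formalism): n agents each privately believe the risk is
# real, but each acts only if they expect at least `threshold` others to
# act as well.

def equilibrium_actors(n_agents: int, threshold: int, alarm: bool) -> int:
    """Return how many agents act once best-response dynamics settle.

    alarm=True models a common-knowledge signal: every agent knows every
    other agent has seen it, so each expects the others to act.
    alarm=False models the status quo: each agent can only infer others'
    intentions from observed action, which starts at zero.
    """
    expected_others = (n_agents - 1) if alarm else 0
    acting = 0
    while True:
        # Best response: act iff enough others are (expected to be) acting.
        new_acting = n_agents if expected_others >= threshold else 0
        if new_acting == acting:
            return acting  # fixed point reached
        acting = new_acting
        expected_others = acting  # agents update on what they observe

print(equilibrium_actors(10, threshold=3, alarm=False))  # 0  -> everyone waits
print(equilibrium_actors(10, threshold=3, alarm=True))   # 10 -> coordinated action
```

Both all-act and all-wait are equilibria of the same game; the alarm's only job is to move expectations, which is exactly the common-knowledge function of the fire alarm described above.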
## Key Quotes
"I think the single most important conclusion for people who want to work on AI safety is: the time to start working is not later. It's earlier. It was already earlier."
"The very last moment before the intelligence explosion, nobody will be expecting the intelligence explosion."
## Connection to Other Sources
- Extends the coordination failure theme in Scott Alexander's "Meditations on Moloch"
- The "no fire alarm" framing was absorbed into Yudkowsky's "AGI Ruin" (2022) as a numbered lethality
- Bostrom's "Vulnerable World Hypothesis" (2019) addresses the same coordination failure from a governance perspective
- Christiano's gradual takeoff thesis implicitly responds: if takeoff is slow, the fire alarm is simply "AI getting progressively more dangerous in observable ways"