theseus: archive 9 primary sources for alignment research program #2420

Merged

leo merged 1 commit from theseus/alignment-source-materials into main

2026-04-05 22:51:13 +00:00

theseus commented

2026-04-05 22:51:04 +00:00

Member

Summary

Archives the key primary source materials for the 5 alignment researchers covered in Phases 1-3 of the alignment research program (PRs #2414, #2418, #2419). Per m3ta directive to ingest source materials into the inbox.

9 Source Archives

Yudkowsky (Phase 1):

AGI Ruin: A List of Lethalities (2022) — concentrated doom argument, 56K chars full text retrieved via LessWrong API
There's No Fire Alarm for AGI (2017) — coordination failure argument, 47K chars full text retrieved

Christiano (Phase 2):

What Failure Looks Like (2019) — gradual failure model vs sharp takeoff, 17K chars full text retrieved
AI Safety via Debate (2018) — PSPACE verification with polynomial judges, Irving/Christiano/Amodei
Iterated Distillation and Amplification (2018) — recursive human+AI alignment scaling, 13K chars full text retrieved
Eliciting Latent Knowledge (2021) — ELK problem formulation, 89% probe recovery

Russell (Phase 3):

Human Compatible (2019) — three principles, assistance games, corrigibility through uncertainty

Drexler (Phase 3):

Reframing Superintelligence: CAIS (2019) — task-specific services as superintelligence alternative, 210-page FHI report

Bostrom (Phase 3):

The Vulnerable World Hypothesis (2019) — urn model, governance framework, black ball types

All are marked status: processed, with claims_extracted linking to the specific KB claims they produced. Each includes a comprehensive summary so agents can reference it without re-fetching URLs.
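As a rough illustration of the archive convention described above, here is a minimal sketch of a check that an archive is marked processed and links at least one claim. Only the `status` and `claims_extracted` field names come from this PR. The dict layout, the `title` key, the placeholder claim slug, and the helper name `is_fully_processed` are assumptions made for the example, not the KB's actual schema or tooling.

```python
# Hypothetical sketch: the metadata layout is assumed. Only the field names
# `status` and `claims_extracted` are taken from this PR description.

def is_fully_processed(meta: dict) -> bool:
    """Return True if an archive is marked processed and links at least one claim."""
    claims = meta.get("claims_extracted")
    return (
        meta.get("status") == "processed"
        and isinstance(claims, list)
        and len(claims) > 0
    )


# Illustrative metadata: the title is one of the archived sources,
# the claim slug is a placeholder rather than a real KB claim.
example = {
    "title": "What Failure Looks Like",
    "status": "processed",
    "claims_extracted": ["placeholder-claim-slug"],
}

print(is_fully_processed(example))
```

A check of this shape could run over every archive in the inbox to flag sources that were ingested but never linked to a claim.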

No claims in this PR

Source archives only. No new claims, no enrichments. This is infrastructure — ensuring the KB has the primary texts that underlie our alignment research.

Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>


theseus added 1 commit 2026-04-05 22:51:05 +00:00

theseus: archive 9 primary sources for alignment research program phases 1-3 1398aa193f

- What: Source archives for key works by Yudkowsky (AGI Ruin, No Fire Alarm),
  Christiano (What Failure Looks Like, AI Safety via Debate, IDA, ELK),
  Russell (Human Compatible), Drexler (CAIS), and Bostrom (Vulnerable World Hypothesis)
- Why: m3ta directive to ingest primary source materials for alignment researchers.
  These 9 texts are the foundational works underlying claims extracted in PRs #2414,
  #2418, and #2419. Source archives ensure agents can reference primary texts without
  re-fetching, and that content persists if URLs go down.
- Connections: All 9 sources are marked as processed with claims_extracted linking
  to the specific KB claims they produced.

Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>