Phase 2 of 5-phase AI alignment research program. Christiano's prosaic
alignment counter-position to Yudkowsky. Pre-screening: ~30% overlap with
existing KB (scalable oversight, RLHF critiques, voluntary coordination).
NEW claims:
1. Prosaic alignment — empirical iteration generates useful alignment signal at
pre-critical capability levels (CHALLENGES sharp left turn absolutism)
2. Verification easier than generation — holds at current scale, narrows as
capability gaps widen, creating a time-limited alignment window (IN TENSION
with Yudkowsky's verification asymmetry)
3. ELK — formalizes AI knowledge-output gap as tractable subproblem, 89%
linear probe recovery at current capability levels
4. IDA — recursive human+AI amplification preserves alignment through
distillation iterations but compounding errors make guarantee probabilistic
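Claim 3's linear-probe result can be illustrated with a toy sketch: assume a latent binary "fact" is linearly encoded (plus noise) in a model's hidden activations, and check that a simple logistic-regression probe recovers it. The dimensions, noise level, and data here are invented for illustration — this is not ARC's setup and does not reproduce the 89% figure.

```python
# Toy linear-probe sketch: a hidden binary label is linearly encoded in
# synthetic "activations"; a logistic-regression probe recovers it.
# All numbers here are illustrative assumptions, not ARC's experiment.
import math
import random

random.seed(0)
DIM = 8

# Hypothetical "truth direction": activation = (+/-1 per label)*w_true + noise.
w_true = [random.gauss(0, 1) for _ in range(DIM)]

def make_example():
    label = random.choice([0, 1])
    sign = 1.0 if label == 1 else -1.0
    x = [sign * wt + random.gauss(0, 0.5) for wt in w_true]
    return x, label

train = [make_example() for _ in range(400)]
test = [make_example() for _ in range(200)]

# Linear probe = logistic regression trained with plain SGD.
w = [0.0] * DIM
b = 0.0
lr = 0.1
for _ in range(20):
    for x, y in train:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
        g = p - y  # gradient of log-loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

correct = sum(
    (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
    for x, y in test
)
accuracy = correct / len(test)
print(f"probe accuracy: {accuracy:.2f}")
```

The point of the sketch: when the knowledge-output gap reduces to a linear direction in activation space, recovery is cheap — the open ELK question is whether that holds past current capability levels.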
ENRICHMENT:
- Scalable oversight claim: added Christiano's debate theory (PSPACE
amplification with poly-time judges) as theoretical basis that empirical
data challenges
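The PSPACE-amplification intuition behind debate can be sketched minimally: two debaters who disagree about a long computation bisect their disagreement until a polynomial-time judge can settle it by checking a single step. The "computation" below is just a running sum; the function names and setup are illustrative assumptions, not the protocol of arXiv:1805.00899.

```python
# Toy debate sketch: the judge inspects O(log n) claimed values plus ONE
# element of the underlying computation, yet the honest debater wins.
# Setup and names are hypothetical illustrations of the complexity argument.
def first_divergence(a, b):
    """Binary-search for the step where two claimed transcripts diverge.
    Invariant: transcripts agree at index lo and disagree at index hi."""
    lo, hi = 0, len(a) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if a[mid] == b[mid]:
            lo = mid
        else:
            hi = mid
    return lo, hi

def run_debate(xs, honest, dishonest):
    """Judge's ruling after the debaters narrow the disagreement."""
    lo, hi = first_divergence(honest, dishonest)
    # Both debaters agree on the value at lo; the judge re-executes the
    # single disputed step xs[lo] and rules for whoever is consistent.
    expected = honest[lo] + xs[lo]  # honest[lo] == dishonest[lo] here
    return "honest" if honest[hi] == expected else "dishonest"

xs = list(range(1, 101))
honest = [0]
for v in xs:
    honest.append(honest[-1] + v)            # true prefix sums, 0..5050
dishonest = honest[:60] + [v + 7 for v in honest[60:]]  # lie from step 60 on
print(run_debate(xs, honest, dishonest))
```

This is the theoretical basis the enrichment note flags: a cheap judge amplified to verify long computations, which the cited empirical data then challenges at scale.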
Sources: Paul Christiano, Alignment Forum (2016-2022), arXiv:1805.00899,
arXiv:1706.03741, ARC ELK report (2021), Yudkowsky-Christiano takeoff debate
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>