- What: Phase 3 of alignment research program. 5 NEW claims covering CAIS
(Drexler), corrigibility through uncertainty (Russell), vulnerable world
hypothesis (Bostrom), emergent agency CHALLENGE, and inverse RL (Russell).
- Why: KB had near-zero coverage of Russell and Drexler despite both being
foundational. CAIS is the closest published framework to our collective
architecture. Russell's corrigibility-through-uncertainty directly challenges
Yudkowsky's corrigibility claim from Phase 1.
- Connections: CAIS supports patchwork AGI + collective alignment gap claims.
Emergent agency challenges both CAIS and our collective thesis. Russell's
off-switch challenges Yudkowsky's corrigibility framing.
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
Phase 2 of 5-phase AI alignment research program. Christiano's prosaic
alignment counter-position to Yudkowsky. Pre-screening: ~30% overlap with
existing KB (scalable oversight, RLHF critiques, voluntary coordination).
NEW claims:
1. Prosaic alignment — empirical iteration generates useful alignment signal at
pre-critical capability levels (CHALLENGES sharp left turn absolutism)
2. Verification easier than generation — holds at current scale, but the
asymmetry narrows as capability gaps widen, creating a time-limited alignment
window (IN TENSION with Yudkowsky's verification asymmetry)
3. ELK (Eliciting Latent Knowledge) — formalizes the AI knowledge-output gap as
a tractable subproblem; 89% linear probe recovery at current capability levels
4. IDA (Iterated Distillation and Amplification) — recursive human+AI
amplification preserves alignment through distillation iterations, but
compounding errors make the guarantee probabilistic
ENRICHMENT:
- Scalable oversight claim: added Christiano's debate theory (debate with
polynomial-time judges can decide PSPACE-complete questions) as the theoretical
basis that empirical data challenges
Source: Paul Christiano, Alignment Forum (2016-2022), arXiv:1805.00899,
arXiv:1706.03741, ARC ELK report (2021), Yudkowsky-Christiano takeoff debate
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
- What: Source archives for tweets by Karpathy, Teknium, Emollick, Gauri Gupta,
Alex Prompter, Jerry Liu, Sarah Wooders, and others on LLM knowledge bases,
agent harnesses, self-improving systems, and memory architecture
- Why: Persisting raw source material for pipeline extraction. 4 sources already
processed by Rio's batch (karpathy-gist, kevin-gu, mintlify, hyunjin-kim)
were excluded as duplicates.
- Status: all unprocessed, ready for overnight extraction pipeline
Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
- What: Renamed claim title and all references from "defenders" to "arbitrageurs"
- Why: The mechanism works through self-interested profit-seeking, not altruistic defense. Arbitrageurs correct price distortions because it is profitable, requiring no intentional defense.
- Scope: 2 claim files renamed; 87 files updated across domains, core, maps, agents, entities, and sources
- Cascade test: foundational claim with 70+ downstream references
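A rename cascade of this shape can be sketched with standard tools. This is
illustrative only, not the actual pipeline: the directory names echo the scope
line above, and the sample files are synthetic.

```shell
# Sketch: bulk-rename a claim term across a KB checkout (synthetic data).
set -eu
kb=$(mktemp -d)
mkdir -p "$kb/domains" "$kb/core"
echo "the defenders correct mispricing" > "$kb/domains/claim.md"
echo "see defenders claim" > "$kb/core/map.md"

# Find every file containing the old term and rewrite it in place
# (tmp-file-and-move avoids GNU/BSD differences in sed -i).
grep -rl 'defenders' "$kb" | while read -r f; do
  sed 's/defenders/arbitrageurs/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

A real cascade would also rename the claim files themselves and re-verify the
70+ downstream references, which grep alone does not cover.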
Pentagon-Agent: Theseus <A7E04531-985A-4DA2-B8E7-6479A13513E8>