- What: Phase 3 of alignment research program. 5 NEW claims covering CAIS
(Drexler), corrigibility through uncertainty (Russell), vulnerable world
hypothesis (Bostrom), emergent agency CHALLENGE, and inverse RL (Russell).
- Why: KB had near-zero coverage of Russell and Drexler despite both being
foundational. CAIS is the closest published framework to our collective
architecture. Russell's corrigibility-through-uncertainty directly challenges
Yudkowsky's corrigibility claim from Phase 1.
- Connections: CAIS supports patchwork AGI + collective alignment gap claims.
Emergent agency challenges both CAIS and our collective thesis. Russell's
off-switch challenges Yudkowsky's corrigibility framing.
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
Phase 2 of 5-phase AI alignment research program. Christiano's prosaic
alignment counter-position to Yudkowsky. Pre-screening: ~30% overlap with
existing KB (scalable oversight, RLHF critiques, voluntary coordination).
NEW claims:
1. Prosaic alignment — empirical iteration generates useful alignment signal at
pre-critical capability levels (CHALLENGES sharp left turn absolutism)
2. Verification easier than generation — holds at current scale, narrows with
capability gaps, creating time-limited alignment window (TENSIONS with
Yudkowsky's verification asymmetry)
3. ELK (Eliciting Latent Knowledge) — formalizes the AI knowledge-output gap
as a tractable subproblem; 89% linear-probe recovery at current capability levels
4. IDA (Iterated Distillation and Amplification) — recursive human+AI
amplification preserves alignment through distillation iterations, but
compounding errors make the guarantee probabilistic
ENRICHMENT:
- Scalable oversight claim: added Christiano's debate theory (PSPACE
amplification with poly-time judges) as theoretical basis that empirical
data challenges
Source: Paul Christiano, Alignment Forum (2016-2022), arXiv:1805.00899,
arXiv:1706.03741, ARC ELK report (2021), Yudkowsky-Christiano takeoff debate
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
- What: Replaced the 15x oversubscription claim with corrected framing.
Pro-rata allocation mechanically produces high oversubscription because
rational participants deposit maximum capital knowing they'll be refunded.
The ratio measures capital cycling, not mechanism quality.
- Why: m3ta flagged the original claim: oversubscription is structurally
inevitable under pro-rata allocation, so it validates nothing about the
mechanism. Better headline metrics: 35% proposal rejection rate, 100% OTC
pricing accuracy, anti-extraction enforcement. The 15x figure stays as
evidence but stops being the headline.
- Connections: Updated wiki links in the metadao.md entity, solomon decision
record, and capital concentration claim. Old file removed; the new file's
replaces field preserves traceability.
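The corrected framing above can be sketched with hypothetical numbers (the
function, participant names, and deposit sizes are illustrative, not
MetaDAO's actual implementation): under pro-rata allocation with guaranteed
refunds, rational max-capital bids mechanically inflate the oversubscription
ratio regardless of real demand.

```python
def pro_rata(cap, deposits):
    """Allocate `cap` proportionally to deposits; return (fills, refunds)."""
    total = sum(deposits.values())
    scale = min(1.0, cap / total)  # every participant gets the same fill ratio
    fills = {p: d * scale for p, d in deposits.items()}
    refunds = {p: d - fills[p] for p, d in deposits.items()}
    return fills, refunds

# Rational participants deposit their full bankroll, since any excess
# is refunded at no cost to them.
deposits = {"a": 5_000, "b": 7_000, "c": 3_000}
fills, refunds = pro_rata(1_000, deposits)

oversub = sum(deposits.values()) / 1_000  # 15.0x "oversubscription"
```

The 15x ratio here is produced by the refund guarantee, not by demand for
15x the cap: the excess capital cycles through and comes straight back out,
which is why the ratio measures capital cycling rather than mechanism quality.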
Pentagon-Agent: Rio <244BA05F-3AA3-4079-8C59-6D68A77C76FE>
- What: Source archives for tweets by Karpathy, Teknium, Emollick, Gauri Gupta,
Alex Prompter, Jerry Liu, Sarah Wooders, and others on LLM knowledge bases,
agent harnesses, self-improving systems, and memory architecture
- Why: Persisting raw source material for pipeline extraction. 4 sources already
processed by Rio's batch (karpathy-gist, kevin-gu, mintlify, hyunjin-kim)
were excluded as duplicates.
- Status: all unprocessed, ready for overnight extraction pipeline
Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
- What: Added SICA/GEPA evidence qualification to the first KB response
in the multipolar instability CHALLENGE claim per Leo's review
- Why: The original phrasing stated capability bounding as fact without
acknowledging that our own self-improvement findings (SICA 17%→53%,
GEPA trace-based optimization) suggest individual capability pressure
may undermine the sub-superintelligent agent constraint
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
- What: Research musing + queue entry for Hermes Agent by Nous Research
- Why: m3ta assigned a deep dive; VPS Theseus picks it up at 1am tonight
- Targets: 5 NEW claims + 2 enrichments across ai-alignment and collective-intelligence
Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
- What: 3 NEW claims: model empathy boundary condition (challenges
multi-model eval), GEPA evolutionary self-improvement mechanism, and
progressive disclosure scaling principle, plus enrichments to the Agent
Skills, three-space memory, and curated skills claims
- Why: Nous Research Hermes Agent (26K+ stars) is the largest open-source
agent framework — its architecture decisions provide independent evidence
for existing KB claims and one genuine challenge to our eval spec
- Connections: challenges multi-model eval architecture (task-dependent
diversity optima), extends SICA/NLAH self-improvement chain, corroborates
three-space memory taxonomy with a potential 4th space
Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
- What: Renamed claim title and all references from "defenders" to "arbitrageurs"
- Why: The mechanism works through self-interested profit-seeking, not altruistic defense. Arbitrageurs correct price distortions because it is profitable, requiring no intentional defense.
- Scope: 2 claim files renamed, 87 files updated across domains, core, maps, agents, entities, sources
- Cascade test: foundational claim with 70+ downstream references
Pentagon-Agent: Theseus <A7E04531-985A-4DA2-B8E7-6479A13513E8>