- What: Source archives for key works by Yudkowsky (AGI Ruin, No Fire Alarm),
Christiano (What Failure Looks Like, AI Safety via Debate, IDA, ELK),
Russell (Human Compatible), Drexler (CAIS), and Bostrom (Vulnerable World Hypothesis)
- Why: m3ta directive to ingest primary source material from key alignment
researchers. These 9 texts are the foundational works underlying claims
extracted in PRs #2414, #2418, and #2419. Archiving the sources lets agents
reference the primary texts without re-fetching and preserves the content if
the URLs go down.
- Connections: All 9 sources are marked as processed, with claims_extracted
linking to the specific KB claims they produced.
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
Phase 2 of 5-phase AI alignment research program. Christiano's prosaic
alignment counter-position to Yudkowsky. Pre-screening: ~30% overlap with
existing KB (scalable oversight, RLHF critiques, voluntary coordination).
NEW claims:
1. Prosaic alignment — empirical iteration generates useful alignment signal at
pre-critical capability levels (CHALLENGES sharp left turn absolutism)
2. Verification easier than generation — holds at current scale, narrows with
capability gaps, creating a time-limited alignment window (TENSIONS with
Yudkowsky's verification asymmetry)
3. ELK — formalizes AI knowledge-output gap as tractable subproblem, 89%
linear probe recovery at current capability levels
4. IDA — recursive human+AI amplification preserves alignment through
distillation iterations, but compounding errors make the guarantee
probabilistic (see the sketch after this list)
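A back-of-envelope sketch of the compounding point; the per-round retention
probability p and the independence assumption are illustrative choices here,
not Christiano's model:

```python
# If each amplify-then-distill round independently preserves alignment with
# probability p, the end-to-end guarantee decays geometrically with depth n.
def retained(p: float, n: int) -> float:
    return p ** n

for n in (1, 10, 50):
    print(f"p=0.99, n={n}: {retained(0.99, n):.2f}")  # 0.99, 0.90, 0.61
```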
ENRICHMENT:
- Scalable oversight claim: added Christiano's debate theory (PSPACE
amplification with poly-time judges) as the theoretical basis that the
empirical data challenges (bisection intuition sketched below)
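The amplification intuition in one toy: a judge who can verify only a single
step of a long computation can still referee the whole thing, because two
disagreeing debaters can be bisected to their first point of disagreement.
This is our own illustration (the Collatz-style computation and all names are
made up), not the protocol from arXiv:1805.00899:

```python
# Toy debate: debaters submit full computation traces; the judge bisects to
# their first disagreement and checks just that one step.
def run_step(state: int) -> int:
    """One cheap step of a long toy computation (Collatz-style update)."""
    return 3 * state + 1 if state % 2 else state // 2

def honest_trace(x0: int, n: int) -> list[int]:
    trace = [x0]
    for _ in range(n):
        trace.append(run_step(trace[-1]))
    return trace

def judge(trace_a: list[int], trace_b: list[int]) -> str:
    """O(log n) comparisons plus one run_step call, never O(n)."""
    lo, hi = 0, len(trace_a) - 1       # traces agree at 0, differ at the end
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if trace_a[mid] == trace_b[mid]:
            lo = mid                   # disagreement lies later
        else:
            hi = mid                   # disagreement is at or before mid
    a_ok = trace_a[hi] == run_step(trace_a[lo])
    return "A wins" if a_ok else "B wins"

real = honest_trace(7, 1000)
fake = real[:500] + [x + 1 for x in real[500:]]  # tampered from step 500 on
print(judge(real, fake))  # -> "A wins" after ~10 checks, not 1000
```

The honest debater wins whichever side it argues, which is the sense in which
a weak judge gets amplified by strong debaters.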
Source: Paul Christiano, Alignment Forum (2016-2022), arXiv:1805.00899,
arXiv:1706.03741, ARC ELK report (2021), Yudkowsky-Christiano takeoff debate
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
- What: Replaced the 15x oversubscription claim with a corrected framing.
Pro-rata allocation mechanically produces high oversubscription because
rational participants deposit maximum capital knowing the excess will be
refunded (toy model below). The ratio measures capital cycling, not
mechanism quality.
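A toy model of that mechanic; the numbers are chosen to reproduce a 15x
ratio for illustration, and this is not MetaDAO's actual allocation code:

```python
# Pro-rata: a fixed cap is split proportionally to deposits and the excess
# is refunded, so over-depositing is costless for rational participants and
# the "oversubscription" ratio only measures how much capital cycled through.
def pro_rata(cap: float, deposits: list[float]) -> list[float]:
    total = sum(deposits)
    return [cap * d / total for d in deposits]

cap = 1_000_000
deposits = [3e6, 7e6, 5e6]                       # rational max-size deposits
fills = pro_rata(cap, deposits)
print(sum(deposits) / cap)                       # 15.0x, mechanically
print([d - f for d, f in zip(deposits, fills)])  # everything else refunded
```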
- Why: m3ta flagged the original claim: oversubscription is structurally
inevitable under pro-rata allocation, so it validates nothing. Better
headline metrics: 35% proposal rejection rate, 100% OTC pricing accuracy,
anti-extraction enforcement. The 15x figure stays as evidence but stops
being the headline.
- Connections: Updated wiki links in the metadao.md entity, the solomon
decision record, and the capital concentration claim. The old file was
removed, with a replaces field in the new file preserving traceability.
Pentagon-Agent: Rio <244BA05F-3AA3-4079-8C59-6D68A77C76FE>
- What: Source archives for tweets by Karpathy, Teknium, Emollick, Gauri Gupta,
Alex Prompter, Jerry Liu, Sarah Wooders, and others on LLM knowledge bases,
agent harnesses, self-improving systems, and memory architecture
- Why: Persisting raw source material for pipeline extraction. Four sources
already processed in Rio's batch (karpathy-gist, kevin-gu, mintlify,
hyunjin-kim) were excluded as duplicates.
- Status: all unprocessed, ready for overnight extraction pipeline
Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
- What: Added SICA/GEPA evidence qualification to the first KB response
in the multipolar instability CHALLENGE claim per Leo's review
- Why: The original phrasing stated capability bounding as fact without
acknowledging that our own self-improvement findings (SICA 17%→53%,
GEPA trace-based optimization) suggest individual capability pressure
may undermine the sub-superintelligent agent constraint
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
- What: Research musing + queue entry for Hermes Agent by Nous Research
- Why: m3ta assigned a deep dive; VPS Theseus picks it up at 1am tonight
- Targets: 5 NEW claims + 2 enrichments across ai-alignment and collective-intelligence
Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
- What: model empathy boundary condition (challenges multi-model eval),
GEPA evolutionary self-improvement mechanism, progressive disclosure
scaling principle, plus enrichments to Agent Skills, three-space memory,
and curated skills claims
- Why: Nous Research Hermes Agent (26K+ stars) is the largest open-source
agent framework — its architecture decisions provide independent evidence
for existing KB claims and one genuine challenge to our eval spec
- Connections: challenges multi-model eval architecture (task-dependent
diversity optima), extends SICA/NLAH self-improvement chain, corroborates
three-space memory taxonomy with a potential 4th space
Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
- What: Renamed claim title and all references from "defenders" to "arbitrageurs"
- Why: The mechanism works through self-interested profit-seeking, not altruistic defense. Arbitrageurs correct price distortions because doing so is profitable; no intentional defense is required.
- Scope: 2 claim files renamed, 87 files updated across domains, core, maps, agents, entities, sources
- Cascade test: foundational claim with 70+ downstream references
Pentagon-Agent: Theseus <A7E04531-985A-4DA2-B8E7-6479A13513E8>