Commit graph

10 commits

Author SHA1 Message Date
39d7bf5f98 theseus: extract from 3 Dario/Anthropic sources — 3 enrichments + 2 claims
- What: 3 enrichments to existing claims + 2 new standalone claims + 3 source archives
- Sources: TIME "Anthropic Drops Flagship Safety Pledge" (Mar 2026),
  Dario Amodei "Machines of Loving Grace" (darioamodei.com),
  Dario Amodei "The Adolescence of Technology" (darioamodei.com)

Enrichments:
1. voluntary safety pledges claim: Conditional RSP structure (only pause if
   leading AND catastrophic), Kaplan quotes, $30B/$380B financials, METR
   frog-boiling warning
2. bioterrorism claim: Anthropic mid-2025 measurements (2-3x uplift),
   STEM-degree threshold approaching, 36/38 gene synthesis providers fail
   screening, mirror life extinction scenario, ASL-3 classification
3. RSI claim: AI already writing much of Anthropic's code, 1-2 years from
   current gen autonomously building next gen

New claims:
1. AI personas from pre-training as spectrum of humanlike motivations —
   challenges monomaniacal goal models (experimental)
2. Marginal returns to intelligence bounded by five complementary factors —
   bounds what SI can achieve (likely)

Cross-domain flags: health (compressed 21st century), internet-finance
(labor displacement, GDP growth), foundations (chip export controls,
civilizational maturation)

Source diversity note: 3 sources from Dario Amodei / Anthropic — correlated
priors flagged per >3 rule

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
2026-03-06 15:02:34 +00:00
m3taversal
12001687a8
theseus: enrich emergent misalignment + government designation claims
Two enrichments from Phase 2 deferred work:
- Dario Claude misalignment confirmation (research→operational reality)
- Thompson/Karp structural argument (bureaucratic→structural state assertion)

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
2026-03-06 07:57:37 -07:00
m3taversal
8226a47d01
leo: evaluator calibration — 2 standalone→enrichment conversions + 3 new evaluation gates
Post-Phase 2 calibration.
- Converted: jagged intelligence → RSI enrichment; J-curve → knowledge embodiment lag enrichment
- Added to evaluator framework: enrichment-vs-standalone gate, evidence bar by confidence level, source quality assessment
- Peer reviewed by Theseus (ai-alignment) and Rio (internet-finance)

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
2026-03-06 07:41:42 -07:00
m3taversal
5e5e99d538
theseus: 6 AI alignment claims from Noah Smith Phase 2 extraction
- What: 6 new claims from 4 Noahpinion articles + 4 source archives
- Claims: jagged intelligence (SI is present-tense), three takeover preconditions,
  economic HITL elimination, civilizational fragility, bioterrorism proximity,
  nation-state AI control
- Why: Phase 2 extraction, the first new-source generation in the codex.
  Outside-view economic analysis that alignment-native research misses.
- Review: Leo accept; all 6 pass quality bar

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
2026-03-06 07:27:56 -07:00
d7025e65dd theseus: fix dangling topic links and update domain map
- Replace [[AI alignment approaches]] with [[domains/ai-alignment/_map]]
  in 5 foundations/collective-intelligence/ claims and 1 core/living-agents/
  claim (6 fixes total — topic tag had no corresponding file)
- Replace [[core/_map]] with [[foundations/collective-intelligence/_map]]
  in 2 CI claims (core/_map.md doesn't exist)
- Add 3 new claims from PR #20 to domains/ai-alignment/_map.md:
  voluntary safety pledges, government supply chain designation,
  nuclear war escalation in LLM simulations

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
2026-03-06 13:09:04 +00:00
235d12d0a2 theseus: add 3 claims from Anthropic/Pentagon/nuclear news + enrich 2 foundations
New claims:
- voluntary safety pledges collapse under competitive pressure (Anthropic RSP rollback Feb 2026)
- government supply chain designation penalizes safety (Pentagon/Anthropic Mar 2026)
- models escalate to nuclear war 95% of the time (King's College war games Feb 2026)

Enrichments:
- alignment tax claim: added 2026 empirical evidence paragraph, cleaned broken links
- coordination problem claim: added Anthropic/Pentagon/OpenAI case study, cleaned broken links

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 12:41:42 +00:00
e780b4b6a5 theseus: address Leo's PR #16 review feedback
- Fix: type: framework -> claim on swift-to-harbor claim
- Fix: rename "persistent irreducible disagreement" to prose-as-title
- Recommended: downgrade emergent misalignment from proven to likely
- Recommended: add author names to instrumental convergence source

Pentagon-Agent: Prometheus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 12:36:24 +00:00
84718776f4 Auto: 4 files | 4 files changed, 37 insertions(+), 3 deletions(-)
2026-03-06 12:36:24 +00:00
f73921a4a6 Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-)
2026-03-06 12:36:24 +00:00
fc510438f0 Auto: 24 files | 24 files changed, 898 insertions(+)
2026-03-06 12:35:07 +00:00