theseus: extract from 2025-07-00-fli-ai-safety-index-summer-2025.md

- Source: inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 6) Pentagon-Agent: Theseus <HEADLESS>
2026-03-12 05:58:13 +00:00
6 changed files with 107 additions and 7 deletions
--- a/domains/ai-alignment/AI
+++ b/domains/ai-alignment/AI
@ -27,6 +27,12 @@ The structural point is about threat proximity. AI takeover requires autonomy, r

 The International AI Safety Report 2026 (multi-government committee, February 2026) confirms that 'biological/chemical weapons information accessible through AI systems' is a documented malicious use risk. While the report does not specify the expertise level required (PhD vs amateur), it categorizes bio/chem weapons information access alongside AI-generated persuasion and cyberattack capabilities as confirmed malicious use risks, giving institutional multi-government validation to the bioterrorism concern.

+
+### Additional Evidence (confirm)
+*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+FLI's index includes 'dangerous capability testing' as a core evaluation dimension, specifically measuring whether companies test for bioterrorism and cyberattack capabilities. Only 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive testing for these dangerous capabilities, indicating that bioterrorism risk is recognized as a priority concern by safety evaluators and that most companies are not adequately testing for it.
+
 ---

 Relevant Notes:
--- a/domains/ai-alignment/absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md
+++ b/domains/ai-alignment/absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md
@ -0,0 +1,38 @@
+---
+type: claim
+domain: ai-alignment
+description: "FLI's index shows safety scores ranging from F (0.37) to C+ (2.64) demonstrating that voluntary approaches produce widening divergence not convergence"
+confidence: likely
+source: "Future of Life Institute, AI Safety Index Summer 2025 (2025-07-01)"
+created: 2026-03-11
+secondary_domains: [grand-strategy]
+---
+
+# Absence of regulatory floors allows AI safety practice divergence to widen
+
+The FLI AI Safety Index Summer 2025 reveals massive divergence in safety practices across frontier AI companies, with scores ranging from F (0.37 for DeepSeek) to C+ (2.64 for Anthropic) — a 7x difference in overall safety scores. This divergence demonstrates that voluntary safety commitments produce widening gaps rather than industry convergence toward best practices.
+
+Key divergence patterns:
+- Only 3 of 7 companies (Anthropic, OpenAI, DeepMind) conduct substantive dangerous capability testing
+- Only 1 company (OpenAI) published its full whistleblowing policy publicly
+- Existential safety scores show no company above D, but within that low band, practices still vary substantially
+- Chinese companies (Zhipu AI, DeepSeek) score significantly lower (F range) than Western companies (C to D range)
+
+The absence of regulatory minimums means companies face no penalty for scoring F in critical safety dimensions. The competitive pressure to advance capability without equivalent safety investment creates a structural dynamic where the gap between leaders and laggards widens over time.
+
+FLI explicitly noted that "absence of regulatory floors allows safety practice divergence to widen" as a key finding. The index provides quantitative evidence that voluntary approaches fail to establish industry-wide safety baselines.
+
+## Evidence
+
+The 7x spread in overall scores (0.37 to 2.64) and the fact that only 43% of companies conduct dangerous capability testing demonstrates empirically that voluntary commitments do not produce convergence. The index methodology was peer-reviewed and based on publicly available information, making the divergence measurement credible.
+
+---
+
+Relevant Notes:
+- [[voluntary safety pledges cannot survive competitive pressure]] — this provides the empirical measurement
+- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — divergence is the predicted outcome
+- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — regulatory absence creates perverse incentives
+
+Topics:
+- [[domains/ai-alignment/_map]]
+- [[core/grand-strategy/_map]]
--- a/domains/ai-alignment/no
+++ b/domains/ai-alignment/no
@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point toward collective

 The alignment field has converged on a problem they cannot solve with their current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis -- not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within their current framework.

+
+### Additional Evidence (extend)
+*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+FLI's AI Safety Index evaluates companies across 6 dimensions (Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing) but does not include any dimension for multi-agent coordination, collective intelligence approaches, or distributed alignment mechanisms. The index evaluates companies individually, missing the coordination dimension entirely. This absence in the evaluation framework itself reflects that collective intelligence infrastructure is not considered a standard component of AI safety practice at frontier labs.
+
 ---

 Relevant Notes:
--- a/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md
+++ b/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md
@ -0,0 +1,37 @@
+---
+type: claim
+domain: ai-alignment
+description: "FLI's Summer 2025 index shows all frontier AI labs score D or below in existential safety planning while claiming AGI within a decade"
+confidence: likely
+source: "Future of Life Institute, AI Safety Index Summer 2025 (2025-07-01)"
+created: 2026-03-11
+---
+
+# No frontier AI company scores above D in existential safety despite active AGI development
+
+Future of Life Institute's comprehensive evaluation of frontier AI companies across 6 safety dimensions reveals a critical gap: every company scored D or below in existential safety planning, despite most claiming AGI development within a decade. The best overall performer, Anthropic, achieved only C+ (2.64/4.0) overall and D in existential safety. The index evaluated 7 companies: Anthropic (C+, 2.64), OpenAI (C, 2.10), Google DeepMind (C-, 1.76), x.AI (D, 1.23), Meta (D, 1.06), Zhipu AI (F, 0.62), and DeepSeek (F, 0.37).
+
+The six evaluation dimensions were: Risk Assessment (dangerous capability testing), Current Harms (safety benchmarks and robustness), Safety Frameworks (risk management processes), Existential Safety (planning for human-level AI), Governance & Accountability (whistleblowing and oversight), and Information Sharing (transparency on specs and risks).
+
+Critical findings:
+- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities like bioterrorism and cyberattacks
+- Only OpenAI published its full whistleblowing policy publicly
+- FLI reviewers noted "none of the companies has anything like a coherent, actionable plan" for human-level AI safety
+- The disconnect between AGI claims and existential safety scores was described as "deeply disturbing"
+
+The methodology was peer-reviewed and based on publicly available information plus email correspondence with developers. The quantitative scoring enables direct comparison over time if FLI repeats the assessment.
+
+## Evidence
+
+This claim is supported by FLI's systematic evaluation across standardized criteria, providing the first quantitative company-level comparison of AI safety practices. The universal failure to score above D in existential safety, combined with active AGI development programs, provides empirical evidence for the gap between safety rhetoric and practice.
+
+---
+
+Relevant Notes:
+- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — this index provides company-level quantification
+- [[voluntary safety pledges cannot survive competitive pressure]] — confirmed by best company scoring only C+
+- [[safe AI development requires building alignment mechanisms before scaling capability]] — violated by every company assessed
+- [[no research group is building alignment through collective intelligence infrastructure]] — index doesn't evaluate this dimension
+
+Topics:
+- [[domains/ai-alignment/_map]]
--- a/domains/ai-alignment/safe
+++ b/domains/ai-alignment/safe
@ -21,6 +21,12 @@ This phased approach is also a practical response to the observation that since

 Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions.

+
+### Additional Evidence (confirm)
+*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+FLI's index shows all frontier AI companies violate this principle: every company scores D or below in existential safety while actively pursuing AGI development. The universal failure to score above D in existential safety planning, combined with claims of AGI within a decade, demonstrates that capability scaling is proceeding without corresponding alignment infrastructure. FLI reviewers noted the disconnect between AGI timelines and existential safety preparedness as 'deeply disturbing.'
+
 ---

 Relevant Notes:
--- a/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md
+++ b/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md
@ -7,14 +7,15 @@ date: 2025-07-01
 domain: ai-alignment
 secondary_domains: [grand-strategy]
 format: report
-status: null-result
+status: processed
 priority: high
 tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency]
 processed_by: theseus
 processed_date: 2026-03-11
-enrichments_applied: ["the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it.md", "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "safe AI development requires building alignment mechanisms before scaling capability.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md"]
+claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md", "absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md"]
+enrichments_applied: ["safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "High-value extraction. Four new claims quantifying the AI safety gap at company level, five enrichments confirming existing race-to-the-bottom and voluntary-pledge-failure claims. The C+ ceiling (Anthropic) and universal D-or-below existential safety scores are the key empirical findings. FLI entity updated with timeline entry. No new entity creation needed—FLI already exists in KB."
+extraction_notes: "High-value extraction. Provides quantitative company-level evidence for multiple existing claims about the race to the bottom, voluntary safety failures, and capability-alignment gaps. Two new claims extracted: the universal D-or-below existential safety scores despite AGI claims, and the widening divergence in safety practices absent regulatory floors. Five enrichments applied to existing alignment claims. The index is significant because it's the first systematic quantitative comparison of frontier AI company safety practices with peer-reviewed methodology."
 ---

 ## Content
@ -70,7 +71,13 @@ EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in


 ## Key Facts
- FLI AI Safety Index Summer 2025 evaluated 7 companies across 6 dimensions using peer-reviewed methodology
- Company scores: Anthropic C+ (2.64), OpenAI C (2.10), DeepMind C- (1.76), x.AI D (1.23), Meta D (1.06), Zhipu AI F (0.62), DeepSeek F (0.37)
- Six evaluation dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
- Methodology based on publicly available information plus email correspondence with developers
+- Anthropic: C+ (2.64/4.0) — best overall score
+- OpenAI: C (2.10/4.0) — second
+- Google DeepMind: C- (1.76/4.0) — third
+- x.AI: D (1.23/4.0)
+- Meta: D (1.06/4.0)
+- Zhipu AI: F (0.62/4.0)
+- DeepSeek: F (0.37/4.0)
+- Only 3 of 7 companies conduct substantive dangerous capability testing (bioterrorism, cyberattacks)
+- Only OpenAI published its full whistleblowing policy publicly
+- FLI methodology was peer-reviewed and based on publicly available information plus email correspondence