Compare commits: 828dda016c...5113e174b5 (1 commit)

9 changed files with 73 additions and 120 deletions
@@ -27,6 +27,12 @@ The structural point is about threat proximity. AI takeover requires autonomy, r

The International AI Safety Report 2026 (multi-government committee, February 2026) confirms that 'biological/chemical weapons information accessible through AI systems' is a documented malicious use risk. While the report does not specify the expertise level required (PhD vs amateur), it categorizes bio/chem weapons information access alongside AI-generated persuasion and cyberattack capabilities as confirmed malicious use risks, lending multi-government institutional validation to the bioterrorism concern.

### Additional Evidence (confirm)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

FLI's index includes 'dangerous capability testing' as a core evaluation dimension, specifically measuring whether companies test for bioterrorism and cyberattack capabilities. Only 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive testing for these dangerous capabilities, indicating that bioterrorism risk is recognized as a priority concern by safety evaluators and that most companies are not adequately testing for it.

---

Relevant Notes:
@@ -0,0 +1,38 @@

---
type: claim
domain: ai-alignment
description: "FLI's index shows safety scores ranging from F (0.37) to C+ (2.64), demonstrating that voluntary approaches produce widening divergence, not convergence"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025 (2025-07-01)"
created: 2026-03-11
secondary_domains: ["grand-strategy"]
---

# Absence of regulatory floors allows AI safety practice divergence to widen

The FLI AI Safety Index Summer 2025 reveals massive divergence in safety practices across frontier AI companies, with scores ranging from F (0.37 for DeepSeek) to C+ (2.64 for Anthropic) — a 7x difference in overall safety scores. This divergence demonstrates that voluntary safety commitments produce widening gaps rather than industry convergence toward best practices.

Key divergence patterns:

- Only 3 of 7 companies (Anthropic, OpenAI, DeepMind) conduct substantive dangerous capability testing
- Only 1 company (OpenAI) published its full whistleblowing policy publicly
- Existential safety scores show no company above D, but within that low band, practices still vary substantially
- Chinese companies (Zhipu AI, DeepSeek) score significantly lower (F range) than Western companies (C to D range)

The absence of regulatory minimums means companies face no penalty for scoring F in critical safety dimensions. The competitive pressure to advance capability without equivalent safety investment creates a structural dynamic where the gap between leaders and laggards widens over time.

FLI explicitly noted that "absence of regulatory floors allows safety practice divergence to widen" as a key finding. The index provides quantitative evidence that voluntary approaches fail to establish industry-wide safety baselines.

## Evidence
The 7x spread in overall scores (0.37 to 2.64) and the fact that only 43% of companies conduct dangerous capability testing demonstrate empirically that voluntary commitments do not produce convergence. The index methodology was peer-reviewed and based on publicly available information, making the divergence measurement credible.
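
Both headline figures follow directly from the published scores; a minimal sketch (assuming only the seven overall scores reported in this note) reproduces them:

```python
# Overall FLI Summer 2025 scores on the index's 4.0 scale.
scores = {
    "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
    "x.AI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
}

# Leader-to-laggard spread: 2.64 / 0.37 is roughly 7.1x.
print(f"spread: {max(scores.values()) / min(scores.values()):.1f}x")

# Share conducting substantive dangerous capability testing: 3/7, about 43%.
testers = {"Anthropic", "OpenAI", "Google DeepMind"}
print(f"testing share: {len(testers) / len(scores):.0%}")
```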

---

Relevant Notes:

- [[voluntary safety pledges cannot survive competitive pressure]] — this provides the empirical measurement
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — divergence is the predicted outcome
- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — regulatory absence creates perverse incentives

Topics:

- [[domains/ai-alignment/_map]]
- [[core/grand-strategy/_map]]
@@ -1,40 +0,0 @@

---
type: claim
domain: ai-alignment
description: "Only OpenAI has published its full whistleblowing policy publicly among frontier AI companies, per FLI Summer 2025 index"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025, July 2025"
created: 2026-03-11
secondary_domains: ["grand-strategy"]
---

# AI whistleblowing policy transparency is limited to a single company

Future of Life Institute's Summer 2025 evaluation found that only OpenAI has published its complete whistleblowing policy publicly among the seven frontier AI companies assessed. This creates a significant accountability infrastructure gap, as whistleblowing mechanisms are critical for surfacing safety concerns that internal processes fail to address.

The absence of public whistleblowing policies at Anthropic, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek means that:

1. Employees and contractors cannot evaluate protections before raising concerns
2. External observers cannot assess whether adequate channels exist
3. There is no public commitment that can be held accountable

This finding is particularly striking given that Anthropic positions itself as a safety-focused lab yet does not publicly disclose its whistleblowing procedures. The FLI index evaluated "Governance & Accountability" as one of six dimensions, with whistleblowing policy transparency as a key component.

**Why this matters:** The pattern suggests that accountability infrastructure remains underdeveloped across the frontier AI industry, with even basic transparency measures like public whistleblowing policies not yet adopted as industry norms. This reinforces the structural failure of voluntary safety commitments: even the most safety-conscious companies do not adopt basic accountability mechanisms that would impose reputational costs if violated.

## Evidence

From FLI's assessment:

- 1 of 7 frontier AI companies (OpenAI) has published its full whistleblowing policy publicly
- 6 of 7 companies do not publicly disclose whistleblowing procedures
- This includes Anthropic (C+ overall, best performer), which does not publish its policy
- Governance & Accountability was one of six evaluated dimensions

---

Relevant Notes:

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]

Topics:

- [[domains/ai-alignment/_map]]
- [[core/grand-strategy/_map]]
@@ -1,38 +0,0 @@

---
type: claim
domain: ai-alignment
description: "Only Anthropic, OpenAI, and DeepMind conduct substantive testing for dangerous capabilities among frontier AI developers, per FLI Summer 2025 index"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025, July 2025"
created: 2026-03-11
secondary_domains: ["grand-strategy"]
---

# Frontier AI dangerous capability testing is limited to three companies

Future of Life Institute's Summer 2025 evaluation found that only three frontier AI companies—Anthropic, OpenAI, and Google DeepMind—conduct substantive testing for dangerous capabilities such as bioterrorism potential and cyberattack vectors. The remaining four assessed companies (x.AI, Meta, Zhipu AI, DeepSeek) either do not test for these risks or do not disclose such testing.

This concentration of risk assessment creates a bifurcated landscape where the majority of frontier AI development proceeds without systematic evaluation of catastrophic misuse potential. The gap is particularly concerning given that all companies are developing increasingly capable systems, but only a minority are actively probing for dangerous emergent capabilities.

The FLI index evaluated companies across six dimensions, with "Risk Assessment" specifically measuring dangerous capability testing. The three companies conducting such testing still received overall grades of C+ to C-, indicating that even substantive risk assessment does not guarantee comprehensive safety practices.

**Why this matters:** Dangerous capability testing remains a voluntary practice adopted by safety-conscious labs rather than an industry norm, leaving significant capability development unmonitored for catastrophic risk vectors. This pattern reinforces the structural race-to-the-bottom dynamic: companies that invest in expensive risk assessment (bioterrorism, cyberattack modeling) gain no competitive advantage and may face capability delays, while competitors skip these costs entirely.

## Evidence

From FLI's assessment:

- 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive dangerous capability testing
- 4 of 7 companies (x.AI, Meta, Zhipu AI, DeepSeek) do not conduct or disclose such testing
- Testing focuses on bioterrorism potential and cyberattack capabilities
- Even companies conducting testing scored C+ or below overall

---

Relevant Notes:

- [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]

Topics:

- [[domains/ai-alignment/_map]]
- [[core/grand-strategy/_map]]
@@ -21,7 +21,7 @@ The alignment field has converged on a problem they cannot solve with their curr

### Additional Evidence (extend)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

FLI's Summer 2025 AI Safety Index does not evaluate collective intelligence approaches to alignment, confirming this gap. The index assesses six dimensions (Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing) but none capture multi-agent or collective coordination mechanisms. The evaluation framework itself reflects the field's focus on individual company practices rather than coordination infrastructure, missing the collective intelligence dimension entirely. This suggests that even comprehensive industry assessments do not yet recognize collective intelligence infrastructure as a measurable safety dimension.

FLI's AI Safety Index evaluates companies across 6 dimensions (Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing) but does not include any dimension for multi-agent coordination, collective intelligence approaches, or distributed alignment mechanisms. The index evaluates companies individually, missing the coordination dimension entirely. This absence in the evaluation framework itself reflects that collective intelligence infrastructure is not considered a standard component of AI safety practice at frontier labs.

---
@@ -1,49 +1,37 @@

---
type: claim
domain: ai-alignment
description: "All frontier AI companies score D or below in existential safety planning while claiming AGI within a decade, per FLI's Summer 2025 index"
description: "FLI's Summer 2025 index shows all frontier AI labs score D or below in existential safety planning while claiming AGI within a decade"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025, July 2025"
source: "Future of Life Institute, AI Safety Index Summer 2025 (2025-07-01)"
created: 2026-03-11
depends_on: ["the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it", "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"]
---

# No frontier AI company scores above D in existential safety despite active AGI development

Future of Life Institute's comprehensive Summer 2025 evaluation of frontier AI companies reveals a stark gap between AGI development claims and existential safety preparation. All seven companies assessed—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—scored D or below in the "Existential Safety" dimension, despite most claiming AGI timelines within a decade.

Future of Life Institute's comprehensive evaluation of frontier AI companies across 6 safety dimensions reveals a critical gap: every company scored D or below in existential safety planning, despite most claiming AGI development within a decade. The best overall performer, Anthropic, achieved only C+ (2.64/4.0) overall and D in existential safety. The index evaluated 7 companies: Anthropic (C+, 2.64), OpenAI (C, 2.10), Google DeepMind (C-, 1.76), x.AI (D, 1.23), Meta (D, 1.06), Zhipu AI (F, 0.62), and DeepSeek (F, 0.37).

The best overall performer, Anthropic (C+, 2.64/4.0), still received only a D in existential safety planning. OpenAI scored C overall (2.10) but similarly failed to demonstrate coherent planning for human-level AI safety. The index evaluated six dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, and Information Sharing.

The six evaluation dimensions were: Risk Assessment (dangerous capability testing), Current Harms (safety benchmarks and robustness), Safety Frameworks (risk management processes), Existential Safety (planning for human-level AI), Governance & Accountability (whistleblowing and oversight), and Information Sharing (transparency on specs and risks).

**Critical findings:**

Critical findings:

- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities like bioterrorism and cyberattacks
- Only OpenAI published its full whistleblowing policy publicly
- The reviewer noted: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
- The disconnect between AGI claims and existential safety scores is "deeply disturbing"
- FLI reviewers noted "none of the companies has anything like a coherent, actionable plan" for human-level AI safety
- The disconnect between AGI claims and existential safety scores was described as "deeply disturbing"

This quantifies the race-to-the-bottom dynamic: even the most safety-conscious labs cannot maintain robust existential risk planning while competing on capability development. The absence of regulatory floors allows safety practice divergence to widen as competitive pressure intensifies.
The methodology was peer-reviewed and based on publicly available information plus email correspondence with developers. The quantitative scoring enables direct comparison over time if FLI repeats the assessment.
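
Because the scoring is numeric, a repeat assessment could be compared mechanically; a minimal sketch of what that comparison might look like (the helper functions are illustrative assumptions, not FLI tooling):

```python
# FLI Summer 2025 overall scores (4.0 scale), as reported in the index.
summer_2025 = {
    "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
    "x.AI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
}

def score_deltas(earlier, later):
    """Per-company change between two rounds (companies present in both)."""
    return {c: round(later[c] - earlier[c], 2) for c in earlier.keys() & later.keys()}

def spread(scores):
    """Leader-to-laggard ratio; growth between rounds indicates widening divergence."""
    return max(scores.values()) / min(scores.values())
```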

## Company scores (overall letter grade, numeric score)

## Evidence

- Anthropic: C+ (2.64)
- OpenAI: C (2.10)
- Google DeepMind: C- (1.76)
- x.AI: D (1.23)
- Meta: D (1.06)
- Zhipu AI: F (0.62)
- DeepSeek: F (0.37)

All companies scored D or below specifically in the Existential Safety dimension, which evaluates planning for human-level AI risks.

## Why this matters

The C+ best score and universal D-or-below existential safety scores provide empirical evidence for the structural race-to-the-bottom claim. Even Anthropic, positioned as a safety-focused lab, cannot escape the competitive pressure that prevents coherent existential risk planning across the industry.

This claim is supported by FLI's systematic evaluation across standardized criteria, providing the first quantitative company-level comparison of AI safety practices. The universal failure to score above D in existential safety, combined with active AGI development programs, provides empirical evidence for the gap between safety rhetoric and practice.

---

Relevant Notes:

- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[safe AI development requires building alignment mechanisms before scaling capability]]
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — this index provides company-level quantification
- [[voluntary safety pledges cannot survive competitive pressure]] — confirmed by best company scoring only C+
- [[safe AI development requires building alignment mechanisms before scaling capability]] — violated by every company assessed
- [[no research group is building alignment through collective intelligence infrastructure]] — index doesn't evaluate this dimension

Topics:
- [[domains/ai-alignment/_map]]
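
The `Relevant Notes` and `Topics` entries above use wiki-link syntax; a minimal sketch (the regex and function name are illustrative assumptions, not part of the note tooling shown here) of extracting link targets to build the cross-reference graph:

```python
import re

# Capture a wiki-link target up to a closing bracket, alias pipe, or heading anchor.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def extract_links(note_body: str) -> list[str]:
    """Return wiki-link targets found in a note body."""
    return [m.strip() for m in WIKI_LINK.findall(note_body)]

assert extract_links("- [[core/grand-strategy/_map]]") == ["core/grand-strategy/_map"]
```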
@@ -25,7 +25,7 @@ Anthropic's RSP rollback demonstrates the opposite pattern in practice: the comp

### Additional Evidence (confirm)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

FLI's Summer 2025 evaluation confirms this principle is violated by every frontier AI company assessed. All seven companies (Anthropic, OpenAI, DeepMind, x.AI, Meta, Zhipu AI, DeepSeek) scored D or below in existential safety planning while actively developing toward AGI. The reviewer stated: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety. This demonstrates that capability scaling is proceeding without corresponding alignment mechanism development across the industry, with even the best-performing company (Anthropic, C+ overall) failing to develop coherent existential safety plans.

FLI's index shows all frontier AI companies violate this principle: every company scores D or below in existential safety while actively pursuing AGI development. The universal failure to score above D in existential safety planning, combined with claims of AGI within a decade, demonstrates that capability scaling is proceeding without corresponding alignment infrastructure. FLI reviewers noted the disconnect between AGI timelines and existential safety preparedness as 'deeply disturbing.'

---
@@ -33,12 +33,6 @@ Anthropic, widely considered the most safety-focused frontier AI lab, rolled bac

The International AI Safety Report 2026 (multi-government committee, February 2026) confirms that risk management remains 'largely voluntary' as of early 2026. While 12 companies published Frontier AI Safety Frameworks in 2025, these remain voluntary commitments without binding legal requirements. The report notes 'a small number of regulatory regimes beginning to formalize risk management as legal requirements,' but the dominant governance mode is still voluntary pledges. This provides multi-government institutional confirmation that the structural race-to-the-bottom predicted by the alignment tax is actually occurring—voluntary frameworks are not transitioning to binding requirements at the pace needed to prevent competitive pressure from eroding safety commitments.

### Additional Evidence (confirm)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

FLI's Summer 2025 index shows the structural failure of voluntary safety commitments across frontier AI companies. Best overall score: C+ (Anthropic). All companies scored D or below in existential safety. Only 3 of 7 companies conduct dangerous capability testing. Only 1 of 7 (OpenAI) publicly discloses whistleblowing policies. The absence of regulatory floors allows safety practice divergence to widen, with competitive pressure preventing even safety-focused labs from maintaining comprehensive safety infrastructure. The reviewer described the disconnect between AGI claims and existential safety scores as "deeply disturbing." This demonstrates that companies claiming safety commitments (Anthropic's positioning as a safety lab) do not adopt basic accountability measures (public whistleblowing policies) that would impose reputational costs, confirming that unilateral safety commitments cannot survive competitive pressure.

---

Relevant Notes:
@@ -12,10 +12,10 @@ priority: high

tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md", "frontier-ai-dangerous-capability-testing-limited-to-three-companies.md", "ai-whistleblowing-policy-transparency-limited-to-single-company.md"]
enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md"]
claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md", "absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md"]
enrichments_applied: ["safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "High-value source providing quantitative company-level evidence for race-to-the-bottom dynamics in AI safety. Three new claims extracted focusing on existential safety gap, dangerous capability testing concentration, and whistleblowing policy transparency. Four enrichments applied to existing claims about alignment tax, voluntary pledges, and collective intelligence gaps. The C+ best score and universal D-or-below existential safety scores provide concrete evidence for structural safety failures under competitive pressure."
extraction_notes: "High-value extraction. Provides quantitative company-level evidence for multiple existing claims about the race to the bottom, voluntary safety failures, and capability-alignment gaps. Two new claims extracted: the universal D-or-below existential safety scores despite AGI claims, and the widening divergence in safety practices absent regulatory floors. Five enrichments applied to existing alignment claims. The index is significant because it's the first systematic quantitative comparison of frontier AI company safety practices with peer-reviewed methodology."
---
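
The tracking block above follows the same `---`-delimited YAML frontmatter used by every note in this diff; a minimal sketch (assuming PyYAML; `parse_note` is an illustrative name, not the actual pipeline's API) of splitting a note into metadata and body:

```python
import yaml  # PyYAML

def parse_note(text: str):
    """Split a '---'-delimited note into (frontmatter dict, markdown body).

    maxsplit=2 keeps any later '---' horizontal rules inside the body.
    """
    _, front, body = text.split("---", 2)
    return yaml.safe_load(front), body.strip()
```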
## Content
@@ -71,8 +71,13 @@ EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in

## Key Facts

- FLI AI Safety Index Summer 2025 evaluated 7 companies: Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, DeepSeek
- Six evaluation dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
- Company scores: Anthropic C+ (2.64), OpenAI C (2.10), DeepMind C- (1.76), x.AI D (1.23), Meta D (1.06), Zhipu AI F (0.62), DeepSeek F (0.37)
- Methodology: peer-reviewed, based on publicly available information plus email correspondence with developers
- Published: July 1, 2025
- Anthropic: C+ (2.64/4.0) — best overall score
- OpenAI: C (2.10/4.0) — second
- Google DeepMind: C- (1.76/4.0) — third
- x.AI: D (1.23/4.0)
- Meta: D (1.06/4.0)
- Zhipu AI: F (0.62/4.0)
- DeepSeek: F (0.37/4.0)
- Only 3 of 7 companies conduct substantive dangerous capability testing (bioterrorism, cyberattacks)
- Only OpenAI published its full whistleblowing policy publicly
- FLI methodology was peer-reviewed and based on publicly available information plus email correspondence