theseus: extract from 2025-07-00-fli-ai-safety-index-summer-2025.md

- Source: inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 6)

Pentagon-Agent: Theseus <HEADLESS>
Teleo Agents 2026-03-12 04:52:27 +00:00
parent ba4ac4a73e
commit 828dda016c
7 changed files with 160 additions and 1 deletion

ai-whistleblowing-policy-transparency-limited-to-single-company.md

@@ -0,0 +1,40 @@
---
type: claim
domain: ai-alignment
description: "Only OpenAI has published its full whistleblowing policy publicly among frontier AI companies, per FLI Summer 2025 index"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025, July 2025"
created: 2026-03-11
secondary_domains: ["grand-strategy"]
---
# AI whistleblowing policy transparency is limited to a single company
Future of Life Institute's Summer 2025 evaluation found that, of the seven frontier AI companies assessed, only OpenAI has published its complete whistleblowing policy publicly. This creates a significant gap in accountability infrastructure, as whistleblowing mechanisms are critical for surfacing safety concerns that internal processes fail to address.
The absence of public whistleblowing policies at Anthropic, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek means that:
1. Employees and contractors cannot evaluate protections before raising concerns
2. External observers cannot assess whether adequate channels exist
3. There is no public commitment against which the companies can be held accountable
This finding is particularly striking given that Anthropic positions itself as a safety-focused lab yet does not publicly disclose its whistleblowing procedures. The FLI index evaluated "Governance & Accountability" as one of six dimensions, with whistleblowing policy transparency as a key component.
**Why this matters:** The pattern suggests that accountability infrastructure remains underdeveloped across the frontier AI industry, with even basic transparency measures like public whistleblowing policies not yet adopted as industry norms. This reinforces the structural failure of voluntary safety commitments: even the most safety-conscious companies do not adopt basic accountability mechanisms that would impose reputational costs if violated.
## Evidence
From FLI's assessment:
- 1 of 7 frontier AI companies (OpenAI) has published its full whistleblowing policy publicly
- 6 of 7 companies do not publicly disclose whistleblowing procedures
- This includes Anthropic (C+ overall, best performer) which does not publish its policy
- Governance & Accountability was one of six evaluated dimensions
---
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
Topics:
- [[domains/ai-alignment/_map]]
- [[core/grand-strategy/_map]]

frontier-ai-dangerous-capability-testing-limited-to-three-companies.md

@@ -0,0 +1,38 @@
---
type: claim
domain: ai-alignment
description: "Only Anthropic, OpenAI, and DeepMind conduct substantive testing for dangerous capabilities among frontier AI developers, per FLI Summer 2025 index"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025, July 2025"
created: 2026-03-11
secondary_domains: ["grand-strategy"]
---
# Frontier AI dangerous capability testing is limited to three companies
Future of Life Institute's Summer 2025 evaluation found that only three frontier AI companies—Anthropic, OpenAI, and Google DeepMind—conduct substantive testing for dangerous capabilities such as bioterrorism potential and cyberattack vectors. The remaining four assessed companies (x.AI, Meta, Zhipu AI, DeepSeek) either do not test for these risks or do not disclose such testing.
This concentration of risk assessment creates a bifurcated landscape where the majority of frontier AI development proceeds without systematic evaluation of catastrophic misuse potential. The gap is particularly concerning given that all companies are developing increasingly capable systems, but only a minority are actively probing for dangerous emergent capabilities.
The FLI index evaluated companies across six dimensions, with "Risk Assessment" specifically measuring dangerous capability testing. The three companies conducting such testing still received overall grades of C+ to C-, indicating that even substantive risk assessment does not guarantee comprehensive safety practices.
**Why this matters:** Dangerous capability testing remains a voluntary practice adopted by safety-conscious labs rather than an industry norm, leaving significant capability development unmonitored for catastrophic risk vectors. This pattern reinforces the structural race-to-the-bottom dynamic: companies that invest in expensive risk assessment (bioterrorism, cyberattack modeling) gain no competitive advantage and may face capability delays, while competitors skip these costs entirely.
## Evidence
From FLI's assessment:
- 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive dangerous capability testing
- 4 of 7 companies (x.AI, Meta, Zhipu AI, DeepSeek) do not conduct or disclose such testing
- Testing focuses on bioterrorism potential and cyberattack capabilities
- Even companies conducting testing scored C+ or below overall
---
Relevant Notes:
- [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
Topics:
- [[domains/ai-alignment/_map]]
- [[core/grand-strategy/_map]]

no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md

@@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point toward collective
The alignment field has converged on a problem they cannot solve with their current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis -- not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within their current framework.
### Additional Evidence (extend)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
FLI's Summer 2025 AI Safety Index does not evaluate collective intelligence approaches to alignment, confirming this gap. The index assesses six dimensions (Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing) but none capture multi-agent or collective coordination mechanisms. The evaluation framework itself reflects the field's focus on individual company practices rather than coordination infrastructure, missing the collective intelligence dimension entirely. This suggests that even comprehensive industry assessments do not yet recognize collective intelligence infrastructure as a measurable safety dimension.
---
Relevant Notes:

no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md

@@ -0,0 +1,49 @@
---
type: claim
domain: ai-alignment
description: "All frontier AI companies score D or below in existential safety planning while claiming AGI within a decade, per FLI's Summer 2025 index"
confidence: likely
source: "Future of Life Institute, AI Safety Index Summer 2025, July 2025"
created: 2026-03-11
depends_on: ["the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it", "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"]
---
# No frontier AI company scores above D in existential safety despite active AGI development
Future of Life Institute's comprehensive Summer 2025 evaluation of frontier AI companies reveals a stark gap between AGI development claims and existential safety preparation. All seven companies assessed—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—scored D or below in the "Existential Safety" dimension, despite most claiming AGI timelines within a decade.
The best overall performer, Anthropic (C+, 2.64/4.0), still received only a D in existential safety planning. OpenAI scored C overall (2.10) but similarly failed to demonstrate coherent planning for human-level AI safety. The index evaluated six dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, and Information Sharing.
**Critical findings:**
- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities such as bioterrorism potential and cyberattack vectors
- Only OpenAI published its full whistleblowing policy publicly
- The reviewer noted: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
- The disconnect between AGI claims and existential safety scores is "deeply disturbing"
This quantifies the race-to-the-bottom dynamic: even the most safety-conscious labs cannot maintain robust existential risk planning while competing on capability development. The absence of regulatory floors allows safety practice divergence to widen as competitive pressure intensifies.
## Company scores (overall letter grade, numeric score)
- Anthropic: C+ (2.64)
- OpenAI: C (2.10)
- Google DeepMind: C- (1.76)
- x.AI: D (1.23)
- Meta: D (1.06)
- Zhipu AI: F (0.62)
- DeepSeek: F (0.37)
All companies scored D or below specifically in the Existential Safety dimension, which evaluates planning for human-level AI risks.
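The aggregation rubric behind these grades is not reproduced in this note, but the quoted figures are enough to make the headline facts checkable. Below is a minimal Python sketch, hedged accordingly: the dictionary layout and variable names are illustrative assumptions, not FLI's data format or methodology; the scores are copied verbatim from the list above.

```python
# Minimal sketch, not FLI's code or data format: encode the figures quoted in
# this note so the headline claims can be checked mechanically.

OVERALL = {  # company -> (letter grade, numeric score on FLI's 4.0-style scale)
    "Anthropic":       ("C+", 2.64),
    "OpenAI":          ("C",  2.10),
    "Google DeepMind": ("C-", 1.76),
    "x.AI":            ("D",  1.23),
    "Meta":            ("D",  1.06),
    "Zhipu AI":        ("F",  0.62),
    "DeepSeek":        ("F",  0.37),
}

# Per this note, every company's Existential Safety grade is D or below.
existential_safety_at_or_below_d = set(OVERALL)

best_company, (best_grade, best_score) = max(OVERALL.items(), key=lambda kv: kv[1][1])
assert (best_company, best_grade) == ("Anthropic", "C+")  # best overall is only a C+
assert existential_safety_at_or_below_d == set(OVERALL)   # 7 of 7 at or below D
print(f"Best overall: {best_company} {best_grade} ({best_score:.2f})")
```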
## Why this matters
The C+ best score and universal D-or-below existential safety scores provide empirical evidence for the structural race-to-the-bottom claim. Even Anthropic, positioned as a safety-focused lab, cannot escape the competitive pressure that prevents coherent existential risk planning across the industry.
---
Relevant Notes:
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[safe AI development requires building alignment mechanisms before scaling capability]]
Topics:
- [[domains/ai-alignment/_map]]

safe AI development requires building alignment mechanisms before scaling capability.md

@@ -21,6 +21,12 @@ This phased approach is also a practical response to the observation that since
Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions.
### Additional Evidence (confirm)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
FLI's Summer 2025 evaluation confirms this principle is violated by every frontier AI company assessed. All seven companies (Anthropic, OpenAI, DeepMind, x.AI, Meta, Zhipu AI, DeepSeek) scored D or below in existential safety planning while actively developing toward AGI. The reviewer stated: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety. This demonstrates that capability scaling is proceeding without corresponding alignment mechanism development across the industry, with even the best-performing company (Anthropic, C+ overall) failing to develop coherent existential safety plans.
---
Relevant Notes:

voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md

@@ -33,6 +33,12 @@ Anthropic, widely considered the most safety-focused frontier AI lab, rolled bac
The International AI Safety Report 2026 (multi-government committee, February 2026) confirms that risk management remains 'largely voluntary' as of early 2026. While 12 companies published Frontier AI Safety Frameworks in 2025, these remain voluntary commitments without binding legal requirements. The report notes 'a small number of regulatory regimes beginning to formalize risk management as legal requirements,' but the dominant governance mode is still voluntary pledges. This provides multi-government institutional confirmation that the structural race to the bottom predicted by the alignment tax is actually occurring: voluntary frameworks are not transitioning to binding requirements at the pace needed to prevent competitive pressure from eroding safety commitments.
### Additional Evidence (confirm)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
FLI's Summer 2025 index shows the structural failure of voluntary safety commitments across frontier AI companies. Best overall score: C+ (Anthropic). All companies scored D or below in existential safety. Only 3 of 7 companies conduct dangerous capability testing. Only 1 of 7 (OpenAI) publicly discloses whistleblowing policies. The absence of regulatory floors allows safety practice divergence to widen, with competitive pressure preventing even safety-focused labs from maintaining comprehensive safety infrastructure. The reviewer described the disconnect between AGI claims and existential safety scores as "deeply disturbing." This demonstrates that companies claiming safety commitments (Anthropic's positioning as a safety lab) do not adopt basic accountability measures (public whistleblowing policies) that would impose reputational costs, confirming that unilateral safety commitments cannot survive competitive pressure.
---
Relevant Notes:

inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md

@@ -7,9 +7,15 @@ date: 2025-07-01
domain: ai-alignment
secondary_domains: [grand-strategy]
format: report
-status: unprocessed
+status: processed
priority: high
tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md", "frontier-ai-dangerous-capability-testing-limited-to-three-companies.md", "ai-whistleblowing-policy-transparency-limited-to-single-company.md"]
enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "High-value source providing quantitative company-level evidence for race-to-the-bottom dynamics in AI safety. Three new claims extracted focusing on the existential safety gap, dangerous capability testing concentration, and whistleblowing policy transparency. Three enrichments applied to existing claims about voluntary safety pledges, alignment-first development, and collective intelligence gaps. The C+ best score and universal D-or-below existential safety scores provide concrete evidence for structural safety failures under competitive pressure."
---
## Content
@@ -62,3 +68,11 @@ FLI's comprehensive evaluation of frontier AI companies across 6 safety dimensio
PRIMARY CONNECTION: [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety
EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.
## Key Facts
- FLI AI Safety Index Summer 2025 evaluated 7 companies: Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, DeepSeek
- Six evaluation dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
- Company scores: Anthropic C+ (2.64), OpenAI C (2.10), DeepMind C- (1.76), x.AI D (1.23), Meta D (1.06), Zhipu AI F (0.62), DeepSeek F (0.37)
- Methodology: peer-reviewed, based on publicly available information plus email correspondence with developers
- Published: July 1, 2025