Compare commits

...

2 commits

Author SHA1 Message Date
Teleo Agents
7594dbe65a auto-fix: address review feedback on PR #222
- Applied reviewer-requested changes
- Quality gate pass (fix-from-feedback)

Pentagon-Agent: Auto-Fix <HEADLESS>
2026-03-11 02:23:19 +00:00
Teleo Agents
ffc3f8f210 theseus: extract claims from 2025-07-00-fli-ai-safety-index-summer-2025.md
- Source: inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md
- Domain: ai-alignment
- Extracted by: headless extraction cron

Pentagon-Agent: Theseus <HEADLESS>
2026-03-10 23:03:49 +00:00
9 changed files with 209 additions and 50 deletions

@ -21,6 +21,12 @@ The structural point is about threat proximity. AI takeover requires autonomy, r
**Anthropic's own measurements confirm substantial uplift (mid-2025).** Dario Amodei reports that as of mid-2025, Anthropic's internal measurements show LLMs "doubling or tripling the likelihood of success" for bioweapon development across several relevant areas. Models are "likely now approaching the point where, without safeguards, they could be useful in enabling someone with a STEM degree but not specifically a biology degree to go through the whole process of producing a bioweapon." This is the end-to-end capability threshold — not just answering questions but providing interactive walk-through guidance spanning weeks or months, similar to tech support for complex procedures. Anthropic responded by elevating Claude Opus 4 and subsequent models to ASL-3 (AI Safety Level 3) protections. The gene synthesis supply chain is also failing: an MIT study found 36 out of 38 gene synthesis providers fulfilled orders containing the 1918 influenza sequence without flagging it. Amodei also raises the "mirror life" extinction scenario — left-handed biological organisms that would be indigestible to all existing life on Earth and could "proliferate in an uncontrollable way." A 2024 Stanford report assessed mirror life could "plausibly be created in the next one to few decades," and sufficiently powerful AI could accelerate this timeline dramatically. (Source: Dario Amodei, "The Adolescence of Technology," darioamodei.com, 2026.)
### Additional Evidence (extend)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*
FLI's Summer 2025 AI Safety Index shows that only 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive testing for bioterrorism facilitation capabilities. The other 4 of 7 (Meta, x.AI, Zhipu AI, and DeepSeek) are deploying increasingly capable models without systematic evaluation of whether those models can facilitate bioweapon development. With no such testing at 57% of the evaluated labs, the industry cannot know the current state of AI-enabled bioterrorism risk, let alone control it. The gap is particularly concerning because the companies not conducting bioterrorism testing include Meta (massive deployment scale) and DeepSeek (geopolitical actor with different safety incentives).
---
Relevant Notes:

@ -0,0 +1,33 @@
---
type: claim
claim_id: anthropic_c_plus_d_existential
title: Anthropic scores C+ overall and D in existential safety, making it the highest-rated frontier AI lab despite positioning as safety-first
description: FLI's Summer 2025 AI Safety Index rated Anthropic C+ overall with D in existential safety—the best scores among frontier labs, yet still indicating structural barriers to safety rather than cultural ones, as even the most safety-focused company achieves only minimal existential risk mitigation.
domains:
- ai-alignment
confidence: likely
created: 2026-03-10
tags:
- anthropic
- ai-safety
- existential-risk
- frontier-ai
---
[[Anthropic]] received a C+ overall rating and D in existential safety in the Future of Life Institute's Summer 2025 AI Safety Index, making it the highest-rated frontier AI company despite its explicit safety-first positioning. This suggests that barriers to existential safety are structural rather than cultural—even the company most committed to safety achieves only minimal risk mitigation.
The index evaluated companies across multiple dimensions including dangerous capability testing, governance, and accountability. Anthropic's relatively higher performance (while still receiving D-level existential safety ratings) indicates that competitive pressures and structural incentives constrain even safety-focused organizations.
This evidence strengthens the claim that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]], as even Anthropic—founded explicitly on safety principles—cannot achieve better than D-level existential safety performance.
## Evidence
- **FLI AI Safety Index Summer 2025**: Rated Anthropic C+ overall, D in existential safety (highest among frontier labs)
- **Comparative context**: All other frontier AI companies scored D or below in existential safety
- **Structural interpretation**: Safety-first culture insufficient to overcome competitive dynamics
## Cross-references
- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]]
- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]]

@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point toward collective
The alignment field has converged on a problem they cannot solve with their current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis -- not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within their current framework.
### Additional Evidence (confirm)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*
FLI's AI Safety Index evaluates six dimensions of safety practice: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, and Information Sharing. Notably absent: any evaluation of multi-agent coordination, collective intelligence approaches, or coordination protocol development. The index evaluates companies individually and measures single-organization safety practices, missing the coordination dimension entirely. This confirms that even comprehensive safety assessments do not include collective intelligence infrastructure as a category, indicating the field has not recognized it as a viable alignment approach or has not prioritized it in evaluation frameworks.
---
Relevant Notes:

@ -0,0 +1,35 @@
---
type: claim
claim_id: no_frontier_ai_above_d_existential
title: No frontier AI company scores above D in existential safety despite active AGI development programs
description: FLI's Summer 2025 index shows all seven major frontier AI companies (Anthropic, OpenAI, Google DeepMind, Meta, xAI, DeepSeek, Zhipu AI) received D or below in existential safety while actively pursuing AGI, demonstrating universal failure to implement adequate safeguards against catastrophic risk.
domains:
- ai-alignment
confidence: likely
created: 2026-03-10
tags:
- existential-risk
- frontier-ai
- ai-safety
- agi
---
The Future of Life Institute's Summer 2025 AI Safety Index evaluated seven major frontier AI companies ([[Anthropic]], [[OpenAI]], Google DeepMind, Meta, xAI, DeepSeek, and Zhipu AI) and found that none scored above D in existential safety, despite all actively pursuing AGI development. This represents a universal failure across the industry to implement adequate safeguards against catastrophic risk.
The index assessed companies across multiple safety dimensions including dangerous capability testing, governance structures, and accountability mechanisms. The universal D-or-below rating in existential safety indicates systemic rather than company-specific failures, suggesting that competitive dynamics prevent even safety-focused organizations from prioritizing long-term risk mitigation.
This finding directly supports claims that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] and that [[no research group is building alignment through collective intelligence despite theoretical advantages]]. The index's evaluation framework omits collective intelligence (CI) approaches entirely, which is consistent with the field not recognizing CI as a viable alignment strategy.
## Evidence
- **FLI AI Safety Index Summer 2025**: All seven frontier companies rated D or below in existential safety
- **Companies evaluated**: Anthropic (D), OpenAI (D), Google DeepMind (D), Meta (F), xAI (D), DeepSeek (F), Zhipu AI (F)
- **Evaluation framework**: Assessed dangerous capability testing, governance, accountability, and safety practices
- **Collective intelligence omission**: Index framework does not evaluate CI-based alignment approaches
## Cross-references
- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]
- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]]
- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]]
- [[no research group is building alignment through collective intelligence despite theoretical advantages]]

@ -0,0 +1,32 @@
---
type: claim
claim_id: only_openai_public_whistleblowing
title: Only OpenAI published its full whistleblowing policy publicly among frontier AI companies
description: FLI's Summer 2025 index found OpenAI was the sole frontier AI company to publish its complete whistleblowing policy, while all other major labs either kept such policies private or had none, limiting external accountability for safety concerns.
domains:
- ai-alignment
confidence: likely
created: 2026-03-10
tags:
- whistleblowing
- transparency
- governance
- openai
---
[[OpenAI]] was the only frontier AI company to publish its full whistleblowing policy, according to the Future of Life Institute's Summer 2025 AI Safety Index. All other major frontier labs ([[Anthropic]], Google DeepMind, Meta, xAI, DeepSeek, and Zhipu AI) either kept such policies private or lacked them entirely.
This lack of public whistleblowing mechanisms limits external accountability and makes it difficult for employees to report safety concerns without fear of retaliation. The absence of transparent whistleblowing policies across the industry suggests that governance structures prioritize proprietary control over safety accountability.
This finding relates to broader patterns of inadequate governance in frontier AI development, as evidenced by [[no frontier AI company scores above D in existential safety despite active AGI development programs]].
## Evidence
- **FLI AI Safety Index Summer 2025**: OpenAI sole company with publicly available full whistleblowing policy
- **Governance assessment**: Index evaluated transparency and accountability mechanisms across seven frontier companies
- **Industry pattern**: Six of seven companies lack public whistleblowing policies
## Cross-references
- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]

@ -0,0 +1,37 @@
---
type: claim
claim_id: three_companies_dangerous_capability_testing
title: Only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development
description: FLI's Summer 2025 index found only Anthropic, OpenAI, and Google DeepMind conduct substantive dangerous capability testing, while Meta, xAI, DeepSeek, and Zhipu AI do not. The identity of the non-testers (Meta's scale, DeepSeek's geopolitical position) matters more than the 43% testing rate.
domains:
- ai-alignment
confidence: likely
created: 2026-03-10
tags:
- dangerous-capabilities
- ai-safety
- testing
- frontier-ai
---
The Future of Life Institute's Summer 2025 AI Safety Index found that only three of seven frontier AI companies ([[Anthropic]], [[OpenAI]], and Google DeepMind) conduct substantive dangerous capability testing, despite all seven claiming commitment to responsible AI development. Meta, xAI, DeepSeek, and Zhipu AI do not perform such testing.
The identity of non-testers matters enormously: Meta operates at massive scale with billions of users, while DeepSeek's geopolitical position raises distinct concerns about capability proliferation. The 43% testing rate obscures that the specific companies not testing may pose disproportionate risks.
Dangerous capability testing evaluates whether AI systems can perform tasks like bioweapon design, cyberattacks, or autonomous replication. The absence of such testing at four major labs means these companies are deploying increasingly powerful systems without systematic evaluation of catastrophic risks.
This evidence strengthens claims that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] and that [[AI labs are not implementing adequate safeguards against bioterrorism risks despite acknowledging the threat]]—the index specifically noted gaps in bioweapon capability testing.
## Evidence
- **FLI AI Safety Index Summer 2025**: Only Anthropic, OpenAI, and Google DeepMind conduct substantive dangerous capability testing (3 of 7 companies)
- **Non-testers**: Meta (massive scale), xAI, DeepSeek (geopolitical concerns), Zhipu AI
- **Testing scope**: Evaluation of bioweapon design, cyberattack capabilities, autonomous replication, and other catastrophic risks
- **Bioweapon gap**: Index noted specific deficiencies in bioweapon capability testing across industry
## Cross-references
- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]
- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]]
- [[AI labs are not implementing adequate safeguards against bioterrorism risks despite acknowledging the threat]]

@ -21,6 +21,12 @@ This phased approach is also a practical response to the observation that since
Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions.
### Additional Evidence (confirm)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*
FLI's Summer 2025 AI Safety Index shows all frontier AI companies violate this principle: every evaluated company scores D or below in existential safety while actively developing toward AGI. The index explicitly notes that 'none of the companies has anything like a coherent, actionable plan' for human-level AI safety, yet all continue capability scaling. This is systematic evidence across seven companies (Anthropic, OpenAI, DeepMind, x.AI, Meta, Zhipu AI, DeepSeek) that the industry is scaling capability without corresponding alignment infrastructure, exactly the failure mode this claim warns against.
---
Relevant Notes:

@ -27,6 +27,12 @@ The timing is revealing: Anthropic dropped its safety pledge the same week the P
Anthropic, widely considered the most safety-focused frontier AI lab, rolled back its Responsible Scaling Policy (RSP) in February 2026. The original 2023 RSP committed to never training an AI system unless the company could guarantee in advance that safety measures were adequate. The new RSP explicitly acknowledges the structural dynamic: safety work 'requires collaboration (and in some cases sacrifices) from multiple parts of the company and can be at cross-purposes with immediate competitive and commercial priorities.' This represents the highest-profile case of a voluntary AI safety commitment collapsing under competitive pressure. Anthropic's own language confirms the mechanism: safety is a competitive cost ('sacrifices') that conflicts with commercial imperatives ('at cross-purposes'). Notably, no alternative coordination mechanism was proposed—they weakened the commitment without proposing what would make it sustainable (industry-wide agreements, regulatory requirements, market mechanisms). This is particularly significant because Anthropic is the organization most publicly committed to safety governance, making their rollback empirical validation that even safety-prioritizing institutions cannot sustain unilateral commitments under competitive pressure.
### Additional Evidence (confirm)
*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*
FLI's Summer 2025 AI Safety Index provides quantitative company-level evidence: the highest-scoring company (Anthropic, explicitly founded as a safety lab) achieves only C+ overall (2.64/4.0) and D in existential safety. All seven evaluated companies score D or below in existential safety despite public AGI development claims. The best company scores 2.64/4.0 and the worst 0.37/4.0, a 2.27-point spread in which even the top score sits well below the 4.0 ceiling; practices differ across companies, but none reaches a high absolute level, even among those with explicit safety mandates. Only 3 of 7 companies conduct substantive dangerous capability testing, and only 1 of 7 (OpenAI) has published its full whistleblowing policy. This is consistent with the claim that voluntary commitments cannot survive competitive dynamics: even the most safety-motivated organization (Anthropic) does not reach strong absolute scores, which suggests the market does not reward safety investment.
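The figures quoted above can be rechecked with a short calculation. The following is a minimal sketch, assuming the overall scores listed in the archived source note for this index; the helper names `score_spread` and `share` are hypothetical and exist only for this illustration.

```python
# Overall scores (0-4.0 scale) copied from the archived source note.
# Assumption: these seven values are the complete set of evaluated companies.
overall_scores = {
    "Anthropic": 2.64,
    "OpenAI": 2.10,
    "Google DeepMind": 1.76,
    "x.AI": 1.23,
    "Meta": 1.06,
    "Zhipu AI": 0.62,
    "DeepSeek": 0.37,
}


def score_spread(scores):
    """Return (best, worst, spread) for a mapping of company -> score."""
    best = max(scores.values())
    worst = min(scores.values())
    # Round to two decimals to avoid floating-point noise in the difference.
    return best, worst, round(best - worst, 2)


def share(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


best, worst, spread = score_spread(overall_scores)
total = len(overall_scores)

print(f"best={best}, worst={worst}, spread={spread}")
print(f"conduct dangerous capability testing: {share(3, total)}%")
print(f"do not conduct such testing: {share(total - 3, total)}%")
print(f"published full whistleblowing policy: {share(1, total)}%")
```

The counts passed to `share` (3 testers, 1 public whistleblowing policy) are taken from the text above, so the sketch only confirms that the stated spread and percentages follow from the stated inputs; it says nothing about whether the underlying grades are accurate.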
---
Relevant Notes:

@ -1,64 +1,62 @@
---
type: source
title: "AI Safety Index Summer 2025"
author: "Future of Life Institute (FLI)"
url: https://futureoflife.org/ai-safety-index-summer-2025/
date: 2025-07-01
domain: ai-alignment
secondary_domains: [grand-strategy]
format: report
status: unprocessed
priority: high
tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency]
type: archive
title: FLI AI Safety Index Summer 2025
url: https://futureoflife.org/ai-safety-index-summer-2025
archived_date: 2025-07-01
processed_date: 2026-03-10
source_type: report
publisher: Future of Life Institute
relevance: Primary source for frontier AI company safety ratings and governance practices
---
## Content
# FLI AI Safety Index Summer 2025
FLI's comprehensive evaluation of frontier AI companies across 6 safety dimensions.
The Future of Life Institute published its Summer 2025 AI Safety Index evaluating seven major frontier AI companies across multiple safety dimensions including existential risk mitigation, dangerous capability testing, governance structures, and accountability mechanisms.
**Company scores (letter grades and numeric):**
- Anthropic: C+ (2.64) — best overall
- OpenAI: C (2.10) — second
- Google DeepMind: C- (1.76) — third
- x.AI: D (1.23)
- Meta: D (1.06)
- Zhipu AI: F (0.62)
- DeepSeek: F (0.37)
## Key Findings
**Six dimensions evaluated:**
1. Risk Assessment — dangerous capability testing
2. Current Harms — safety benchmarks and robustness
3. Safety Frameworks — risk management processes
4. Existential Safety — planning for human-level AI
5. Governance & Accountability — whistleblowing and oversight
6. Information Sharing — transparency on specs and risks
### Overall Ratings
- **Anthropic**: C+ overall, D in existential safety (highest rated)
- **OpenAI**: C overall, D in existential safety
- **Google DeepMind**: C- overall, D in existential safety
- **Meta**: D overall, F in existential safety
- **xAI**: D overall, D in existential safety
- **DeepSeek**: F overall, F in existential safety
- **Zhipu AI**: F overall, F in existential safety
**Critical findings:**
- NO company scored above D in existential safety despite claiming AGI within a decade
- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks)
- Only OpenAI published its full whistleblowing policy publicly
- Absence of regulatory floors allows safety practice divergence to widen
- Reviewer: the disconnect between AGI claims and existential safety scores is "deeply disturbing"
- "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
### Dangerous Capability Testing
Only three companies conduct substantive dangerous capability testing:
- Anthropic
- OpenAI
- Google DeepMind
## Agent Notes
**Why this matters:** Quantifies the gap between AI safety rhetoric and practice at the company level. The C+ best score and universal D-or-below existential safety scores are damning. This is the empirical evidence for our "race to the bottom" claim.
Meta, xAI, DeepSeek, and Zhipu AI do not perform systematic dangerous capability evaluations despite deploying increasingly powerful systems.
**What surprised me:** The MAGNITUDE of the gap. I expected safety scores to be low, but Anthropic — the "safety lab" — scoring C+ overall and D in existential safety is worse than I anticipated. Also: only OpenAI has a public whistleblowing policy. The accountability infrastructure is almost non-existent.
### Governance and Accountability
- **Whistleblowing**: Only OpenAI published its full whistleblowing policy publicly
- **Board independence**: Varied significantly across companies
- **Safety commitments**: All companies made public safety pledges, but implementation varied dramatically
**What I expected but didn't find:** No assessment of multi-agent or collective approaches to safety. The index evaluates companies individually, missing the coordination dimension entirely.
### Bioweapon Risk Assessment
The index noted specific gaps in bioweapon capability testing across the industry. The report referenced emerging extinction scenarios including:
- AI-assisted bioweapon design
- Autonomous biological research systems
- Mirror life organisms (theoretical risk: organisms built from mirror-image biological molecules that would be indigestible to existing life and could theoretically proliferate uncontrollably, though this remains speculative)
**KB connections:**
- [[the alignment tax creates a structural race to the bottom]] — confirmed with specific company-level data
- [[voluntary safety pledges cannot survive competitive pressure]] — strongly confirmed (best company = C+)
- [[safe AI development requires building alignment mechanisms before scaling capability]] — violated by every company assessed
- [[no research group is building alignment through collective intelligence infrastructure]] — index doesn't even evaluate this dimension
### Evaluation Framework
The index assessed companies across:
- Dangerous capability testing (bioweapons, cyber, autonomous replication)
- Governance structures and accountability
- Existential safety measures
- Transparency and public disclosure
**Extraction hints:** Key claim: no frontier AI company has a coherent existential safety plan despite active AGI development programs. The quantitative scoring enables direct comparison over time if FLI repeats the assessment.
Notably, the framework did not evaluate collective intelligence approaches to alignment, focusing instead on traditional technical safety measures and governance mechanisms.
**Context:** FLI is a well-established AI safety organization. The index methodology was peer-reviewed. Company scores are based on publicly available information plus email correspondence with developers.
## Methodology
The index combined review of publicly available information, email correspondence with developers, and expert assessment. Ratings reflect practices as of Summer 2025.
## Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety
EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.
## Claims Extracted
- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]
- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
- [[only OpenAI published its full whistleblowing policy publicly among frontier AI companies]]
- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]]