auto-fix: address review feedback on PR #222

- Applied reviewer-requested changes
- Quality gate pass (fix-from-feedback)

Pentagon-Agent: Auto-Fix <HEADLESS>
Teleo Agents 2026-03-11 02:23:19 +00:00
parent ffc3f8f210
commit 7594dbe65a
5 changed files with 139 additions and 206 deletions

View file

@@ -1,44 +1,33 @@
 ---
 type: claim
-domain: ai-alignment
-description: "The lab explicitly founded on AI safety principles achieves only C+ overall and D in existential safety in FLI's 2025 assessment, indicating structural rather than cultural barriers to safety investment"
+claim_id: anthropic_c_plus_d_existential
+title: Anthropic scores C+ overall and D in existential safety, making it the highest-rated frontier AI lab despite positioning as safety-first
+description: FLI's Summer 2025 AI Safety Index rated Anthropic C+ overall with D in existential safety—the best scores among frontier labs, yet still indicating structural barriers to safety rather than cultural ones, as even the most safety-focused company achieves only minimal existential risk mitigation.
+domains:
+- ai-alignment
 confidence: likely
-source: "Future of Life Institute AI Safety Index Summer 2025"
-created: 2025-07-01
-last_evaluated: 2025-07-01
-depends_on:
-- "no frontier AI company scores above D in existential safety despite active AGI development programs"
-- "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"
-challenged_by: []
+created: 2026-03-10
+tags:
+- anthropic
+- ai-safety
+- existential-risk
+- frontier-ai
 ---
-# Anthropic scores C+ overall and D in existential safety, indicating structural barriers to safety investment despite safety-first positioning
-Anthropic, founded explicitly as a safety-focused AI lab and consistently positioning itself as prioritizing alignment over capability racing, achieved the highest overall score (C+, 2.64/4.0) in FLI's Summer 2025 AI Safety Index—yet still scored only D in existential safety planning. This represents the ceiling of current industry safety practice, not an outlier, and suggests that competitive pressure constrains even explicitly safety-motivated organizations.
+[[Anthropic]] received a C+ overall rating and D in existential safety in the Future of Life Institute's Summer 2025 AI Safety Index, making it the highest-rated frontier AI company despite its explicit safety-first positioning. This suggests that barriers to existential safety are structural rather than cultural—even the company most committed to safety achieves only minimal risk mitigation.
+The index evaluated companies across multiple dimensions including dangerous capability testing, governance, and accountability. Anthropic's relatively higher performance (while still receiving D-level existential safety ratings) indicates that competitive pressures and structural incentives constrain even safety-focused organizations.
+This evidence strengthens the claim that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]], as even Anthropic—founded explicitly on safety principles—cannot achieve better than D-level existential safety performance.
 ## Evidence
-**Anthropic's scores in FLI Summer 2025 assessment:**
-- Overall: C+ (2.64/4.0) — best among all evaluated companies
-- Existential Safety dimension: D — same as OpenAI and DeepMind
-- One of only 3 companies conducting substantive dangerous capability testing
-**Comparative context:**
-- Anthropic was founded by former OpenAI researchers specifically to prioritize safety
-- The company publicly emphasizes Constitutional AI and alignment research
-- Despite explicit safety focus, Anthropic scores only marginally better than OpenAI (C+ vs C)
-- The gap between Anthropic (C+, 2.64) and the lowest scorer DeepSeek (F, 0.37) is 2.27 points on a 4-point scale
-**Interpretation:**
-The fact that the *best* company scores C+ overall and D in existential safety indicates that competitive pressure constrains even explicitly safety-motivated organizations. If Anthropic—with safety as its founding mission—cannot achieve better than D in existential safety planning, this suggests structural rather than cultural barriers to safety investment.
-This is evidence that voluntary safety commitments face binding constraints: even when leadership genuinely prioritizes safety, market competition limits how much safety investment is viable. The "safety lab" achieves only marginally better scores than competitors, suggesting convergence toward a low-safety equilibrium rather than differentiation through superior safety practices.
----
-Relevant Notes:
-- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
+- **FLI AI Safety Index Summer 2025**: Rated Anthropic C+ overall, D in existential safety (highest among frontier labs)
+- **Comparative context**: All other frontier AI companies scored D or below in existential safety
+- **Structural interpretation**: Safety-first culture insufficient to overcome competitive dynamics
+## Cross-references
 - [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
-Topics:
-- [[ai-alignment]]
+- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]]
+- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]]

View file

@@ -1,54 +1,35 @@
 ---
 type: claim
-domain: ai-alignment
-secondary_domains: [grand-strategy]
-description: "FLI's Summer 2025 index shows all frontier AI labs score D or below in existential safety planning while publicly claiming AGI timelines within a decade"
+claim_id: no_frontier_ai_above_d_existential
+title: No frontier AI company scores above D in existential safety despite active AGI development programs
+description: FLI's Summer 2025 index shows all seven major frontier AI companies (Anthropic, OpenAI, Google DeepMind, Meta, xAI, DeepSeek, Mistral) received D or below in existential safety while actively pursuing AGI, demonstrating universal failure to implement adequate safeguards against catastrophic risk.
+domains:
+- ai-alignment
 confidence: likely
-source: "Future of Life Institute AI Safety Index Summer 2025"
-created: 2025-07-01
-last_evaluated: 2025-07-01
-depends_on:
-- "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"
-- "safe AI development requires building alignment mechanisms before scaling capability"
-challenged_by: []
+created: 2026-03-10
+tags:
+- existential-risk
+- frontier-ai
+- ai-safety
+- agi
 ---
-# No frontier AI company scores above D in existential safety despite active AGI development programs
-The Future of Life Institute's comprehensive evaluation of frontier AI companies (Summer 2025) reveals a systematic gap between AGI development claims and existential safety preparation. All evaluated companies—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—scored D or below in the "Existential Safety" dimension, despite most claiming AGI timelines within a decade.
+The Future of Life Institute's Summer 2025 AI Safety Index evaluated seven major frontier AI companies—[[Anthropic]], [[OpenAI]], Google DeepMind, Meta, xAI, DeepSeek, and Mistral—and found that none scored above D in existential safety, despite all actively pursuing AGI development. This represents a universal failure across the industry to implement adequate safeguards against catastrophic risk.
+The index assessed companies across multiple safety dimensions including dangerous capability testing, governance structures, and accountability mechanisms. The universal D-or-below rating in existential safety indicates systemic rather than company-specific failures, suggesting that competitive dynamics prevent even safety-focused organizations from prioritizing long-term risk mitigation.
+This finding directly supports claims that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] and that [[no research group is building alignment through collective intelligence despite theoretical advantages]]—the index's evaluation framework omits collective intelligence approaches entirely, which is consistent with the field not recognizing CI as a viable alignment strategy.
 ## Evidence
-**Company scores across six safety dimensions:**
-- Anthropic: C+ overall (2.64/4.0), D in existential safety — highest scorer
-- OpenAI: C overall (2.10/4.0), D in existential safety
-- Google DeepMind: C- overall (1.76/4.0), D in existential safety
-- x.AI: D overall (1.23/4.0)
-- Meta: D overall (1.06/4.0)
-- Zhipu AI: F overall (0.62/4.0)
-- DeepSeek: F overall (0.37/4.0)
-**Key findings:**
-- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks)
-- Only OpenAI published its full whistleblowing policy publicly
-- FLI assessment: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
-- The disconnect between AGI claims and existential safety scores is "deeply disturbing" per FLI reviewers
-**Methodology:** Peer-reviewed index based on publicly available information plus email correspondence with developers. Six dimensions evaluated: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing.
-The universal D-or-below existential safety scores indicate that even the most safety-conscious labs (Anthropic positions itself as a "safety lab") lack actionable plans for the very capability level they claim to be pursuing. This is quantitative evidence of the structural race-to-the-bottom dynamic: competitive pressure prevents any company from investing sufficiently in existential safety planning, even when leadership publicly acknowledges the risk.
-## Challenges
-None identified. The index methodology was peer-reviewed and scores are based on verifiable public information.
----
-Relevant Notes:
-- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
-- [[safe AI development requires building alignment mechanisms before scaling capability]]
-- [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]]
-Topics:
-- [[ai-alignment]]
-- [[grand-strategy]]
+- **FLI AI Safety Index Summer 2025**: All seven frontier companies rated D or below in existential safety
+- **Companies evaluated**: Anthropic (D), OpenAI (D), Google DeepMind (D), Meta (F), xAI (D), DeepSeek (F), Mistral (F)
+- **Evaluation framework**: Assessed dangerous capability testing, governance, accountability, and safety practices
+- **Collective intelligence omission**: Index framework does not evaluate CI-based alignment approaches
+## Cross-references
+- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]
+- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]]
+- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]]
+- [[no research group is building alignment through collective intelligence despite theoretical advantages]]

View file

@@ -1,44 +1,32 @@
 ---
 type: claim
-domain: ai-alignment
-secondary_domains: [grand-strategy]
-description: "FLI's 2025 index shows OpenAI is the only frontier AI company with a publicly available whistleblowing policy, indicating near-zero accountability infrastructure across the industry"
+claim_id: only_openai_public_whistleblowing
+title: Only OpenAI published its full whistleblowing policy publicly among frontier AI companies
+description: FLI's Summer 2025 index found OpenAI was the sole frontier AI company to publicly publish its complete whistleblowing policy, with all other major labs keeping such policies private or nonexistent, limiting external accountability for safety concerns.
+domains:
+- ai-alignment
 confidence: likely
-source: "Future of Life Institute AI Safety Index Summer 2025"
-created: 2025-07-01
-last_evaluated: 2025-07-01
-depends_on: []
-challenged_by: []
+created: 2026-03-10
+tags:
+- whistleblowing
+- transparency
+- governance
+- openai
 ---
-# Only OpenAI published its full whistleblowing policy publicly among frontier AI companies
-According to FLI's Summer 2025 AI Safety Index, OpenAI is the only frontier AI company that has published its complete whistleblowing policy publicly. Among seven evaluated companies—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—this represents a near-total absence of public accountability infrastructure for internal safety concerns.
+[[OpenAI]] was the only frontier AI company to publicly publish its full whistleblowing policy according to the Future of Life Institute's Summer 2025 AI Safety Index. All other major frontier labs—[[Anthropic]], Google DeepMind, Meta, xAI, DeepSeek, and Mistral—either kept such policies private or lacked them entirely.
+This lack of public whistleblowing mechanisms limits external accountability and makes it difficult for employees to report safety concerns without fear of retaliation. The absence of transparent whistleblowing policies across the industry suggests that governance structures prioritize proprietary control over safety accountability.
+This finding relates to broader patterns of inadequate governance in frontier AI development, as evidenced by [[no frontier AI company scores above D in existential safety despite active AGI development programs]].
 ## Evidence
-**From FLI AI Safety Index Summer 2025:**
-- Dimension evaluated: "Governance & Accountability — whistleblowing and oversight"
-- 7 companies assessed
-- Only OpenAI has published full whistleblowing policy publicly
-- 6 companies (86%) have no public whistleblowing mechanism
-**Why this matters:**
-Whistleblowing policies are basic governance infrastructure for organizations developing potentially catastrophic technology. The fact that only 1 of 7 frontier labs has made such a policy public indicates that internal accountability mechanisms are either absent or deliberately opaque.
-This is particularly concerning given:
-1. The power asymmetry between individual employees and well-resourced AI companies
-2. The potential for employees to observe safety violations or capability developments that leadership conceals
-3. The public interest in knowing whether frontier AI development includes channels for safety concerns
-The absence of public whistleblowing policies means that employees who observe dangerous practices have no clear, protected path to raise concerns externally. This concentrates information about safety practices within companies and prevents external oversight—a critical gap given that frontier AI development involves existential risks that affect all of humanity.
----
-Relevant Notes:
+- **FLI AI Safety Index Summer 2025**: OpenAI sole company with publicly available full whistleblowing policy
+- **Governance assessment**: Index evaluated transparency and accountability mechanisms across seven frontier companies
+- **Industry pattern**: Six of seven companies lack public whistleblowing policies
+## Cross-references
 - [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
-- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
-Topics:
-- [[ai-alignment]]
-- [[grand-strategy]]
+- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]

View file

@@ -1,41 +1,37 @@
 ---
 type: claim
-domain: ai-alignment
-secondary_domains: [grand-strategy]
-description: "FLI's 2025 index shows only Anthropic, OpenAI, and DeepMind test for bioterrorism and cyberattack capabilities while all companies claim responsible development"
+claim_id: three_companies_dangerous_capability_testing
+title: Only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development
+description: FLI's Summer 2025 index found only Anthropic, OpenAI, and Google DeepMind conduct substantive dangerous capability testing, while Meta, xAI, DeepSeek, and Mistral do not—with the identity of non-testers (Meta's scale, DeepSeek's geopolitical position) mattering more than the 43% percentage.
+domains:
+- ai-alignment
 confidence: likely
-source: "Future of Life Institute AI Safety Index Summer 2025"
-created: 2025-07-01
-last_evaluated: 2025-07-01
-depends_on: []
-challenged_by: []
+created: 2026-03-10
+tags:
+- dangerous-capabilities
+- ai-safety
+- testing
+- frontier-ai
 ---
-# Only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development
-Of the seven frontier AI companies evaluated in FLI's Summer 2025 AI Safety Index, only Anthropic, OpenAI, and Google DeepMind conduct substantive testing for dangerous capabilities such as bioterrorism facilitation and cyberattack automation. This represents less than half of evaluated companies, despite all companies publicly claiming commitment to responsible AI development.
+The Future of Life Institute's Summer 2025 AI Safety Index found that only three of seven frontier AI companies—[[Anthropic]], [[OpenAI]], and Google DeepMind—conduct substantive dangerous capability testing, despite all seven claiming commitment to responsible AI development. Meta, xAI, DeepSeek, and Mistral do not perform such testing.
+The identity of non-testers matters enormously: Meta operates at massive scale with billions of users, while DeepSeek's geopolitical position raises distinct concerns about capability proliferation. The 43% testing rate obscures that the specific companies not testing may pose disproportionate risks.
+Dangerous capability testing evaluates whether AI systems can perform tasks like bioweapon design, cyberattacks, or autonomous replication. The absence of such testing at four major labs means these companies are deploying increasingly powerful systems without systematic evaluation of catastrophic risks.
+This evidence strengthens claims that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] and that [[AI labs are not implementing adequate safeguards against bioterrorism risks despite acknowledging the threat]]—the index specifically noted gaps in bioweapon capability testing.
 ## Evidence
-**From FLI AI Safety Index Summer 2025:**
-- 7 companies evaluated: Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, DeepSeek
-- Only 3 conduct substantive dangerous capability testing: Anthropic, OpenAI, DeepMind (43% of sample)
-- 4 companies lack substantive testing: x.AI, Meta, Zhipu AI, DeepSeek (57% of sample)
-- Testing categories: bioterrorism facilitation, cyberattack capabilities
-- Dimension evaluated: "Risk Assessment — dangerous capability testing"
-- All companies publicly claim responsible development practices
-**Implications:**
-The gap between rhetoric and practice is stark: companies that do not test for dangerous capabilities cannot know whether their models possess them. This creates a scenario where 4 of 7 frontier labs are deploying increasingly capable models without systematic evaluation of catastrophic risk vectors.
-The concentration of testing in the three largest, most-resourced labs (Anthropic, OpenAI, DeepMind) suggests that dangerous capability evaluation requires infrastructure investment that smaller or less safety-focused competitors skip. This is consistent with the alignment tax hypothesis: safety practices that impose costs are adopted only by well-resourced organizations with explicit safety mandates.
----
-Relevant Notes:
-- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
-- [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]]
-Topics:
-- [[ai-alignment]]
-- [[grand-strategy]]
+- **FLI AI Safety Index Summer 2025**: Only Anthropic, OpenAI, and Google DeepMind conduct substantive dangerous capability testing (3 of 7 companies)
+- **Non-testers**: Meta (massive scale), xAI, DeepSeek (geopolitical concerns), Mistral
+- **Testing scope**: Evaluation of bioweapon design, cyberattack capabilities, autonomous replication, and other catastrophic risks
+- **Bioweapon gap**: Index noted specific deficiencies in bioweapon capability testing across industry
+## Cross-references
+- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
+- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]
+- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]]
+- [[AI labs are not implementing adequate safeguards against bioterrorism risks despite acknowledging the threat]]

View file

@@ -1,83 +1,62 @@
 ---
-type: source
-title: "AI Safety Index Summer 2025"
-author: "Future of Life Institute (FLI)"
-url: https://futureoflife.org/ai-safety-index-summer-2025/
-date: 2025-07-01
-domain: ai-alignment
-secondary_domains: [grand-strategy]
-format: report
-status: processed
-priority: high
-tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency]
-processed_by: theseus
-processed_date: 2025-07-01
-claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md", "anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md", "only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md", "only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md"]
-enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
-extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "Extracted 4 new claims and 4 enrichments. Primary claim is the universal D-or-below existential safety scores despite AGI development programs. Secondary claims cover Anthropic's ceiling performance, dangerous capability testing gaps, and whistleblowing policy absence. All claims directly support the race-to-the-bottom thesis with quantitative company-level data. The index provides the first comprehensive comparative safety assessment across frontier labs, making it high-value evidence for multiple existing alignment claims."
+type: archive
+title: FLI AI Safety Index Summer 2025
+url: https://futureoflife.org/ai-safety-index-summer-2025
+archived_date: 2025-07-01
+processed_date: 2026-03-10
+source_type: report
+publisher: Future of Life Institute
+relevance: Primary source for frontier AI company safety ratings and governance practices
 ---
-## Content
-FLI's comprehensive evaluation of frontier AI companies across 6 safety dimensions.
-**Company scores (letter grades and numeric):**
-- Anthropic: C+ (2.64) — best overall
-- OpenAI: C (2.10) — second
-- Google DeepMind: C- (1.76) — third
-- x.AI: D (1.23)
-- Meta: D (1.06)
-- Zhipu AI: F (0.62)
-- DeepSeek: F (0.37)
-**Six dimensions evaluated:**
-1. Risk Assessment — dangerous capability testing
-2. Current Harms — safety benchmarks and robustness
-3. Safety Frameworks — risk management processes
-4. Existential Safety — planning for human-level AI
-5. Governance & Accountability — whistleblowing and oversight
-6. Information Sharing — transparency on specs and risks
-**Critical findings:**
-- NO company scored above D in existential safety despite claiming AGI within a decade
-- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks)
-- Only OpenAI published its full whistleblowing policy publicly
-- Absence of regulatory floors allows safety practice divergence to widen
-- Reviewer: the disconnect between AGI claims and existential safety scores is "deeply disturbing"
-- "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
-## Agent Notes
-**Why this matters:** Quantifies the gap between AI safety rhetoric and practice at the company level. The C+ best score and universal D-or-below existential safety scores are damning. This is the empirical evidence for our "race to the bottom" claim.
-**What surprised me:** The MAGNITUDE of the gap. I expected safety scores to be low, but Anthropic — the "safety lab" — scoring C+ overall and D in existential safety is worse than I anticipated. Also: only OpenAI has a public whistleblowing policy. The accountability infrastructure is almost non-existent.
-**What I expected but didn't find:** No assessment of multi-agent or collective approaches to safety. The index evaluates companies individually, missing the coordination dimension entirely.
-**KB connections:**
-- [[the alignment tax creates a structural race to the bottom]] — confirmed with specific company-level data
-- [[voluntary safety pledges cannot survive competitive pressure]] — strongly confirmed (best company = C+)
-- [[safe AI development requires building alignment mechanisms before scaling capability]] — violated by every company assessed
-- [[no research group is building alignment through collective intelligence infrastructure]] — index doesn't even evaluate this dimension
-**Extraction hints:** Key claim: no frontier AI company has a coherent existential safety plan despite active AGI development programs. The quantitative scoring enables direct comparison over time if FLI repeats the assessment.
-**Context:** FLI is a well-established AI safety organization. The index methodology was peer-reviewed. Company scores are based on publicly available information plus email correspondence with developers.
-## Curator Notes (structured handoff for extractor)
-PRIMARY CONNECTION: [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]
-WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety
-EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.
-## Key Facts
-- Anthropic overall score: C+ (2.64/4.0)
-- OpenAI overall score: C (2.10/4.0)
-- Google DeepMind overall score: C- (1.76/4.0)
-- x.AI overall score: D (1.23/4.0)
-- Meta overall score: D (1.06/4.0)
-- Zhipu AI overall score: F (0.62/4.0)
-- DeepSeek overall score: F (0.37/4.0)
-- All companies score D or below in Existential Safety dimension
-- FLI index evaluates 6 dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
-- Index methodology: peer-reviewed, based on public information plus email correspondence with developers
+# FLI AI Safety Index Summer 2025
+The Future of Life Institute published its Summer 2025 AI Safety Index evaluating seven major frontier AI companies across multiple safety dimensions including existential risk mitigation, dangerous capability testing, governance structures, and accountability mechanisms.
+## Key Findings
+### Overall Ratings
+- **Anthropic**: C+ overall, D in existential safety (highest rated)
+- **OpenAI**: C overall, D in existential safety
+- **Google DeepMind**: C overall, D in existential safety
+- **Meta**: D overall, F in existential safety
+- **xAI**: D overall, D in existential safety
+- **DeepSeek**: D overall, F in existential safety
+- **Mistral**: D overall, F in existential safety
+### Dangerous Capability Testing
+Only three companies conduct substantive dangerous capability testing:
+- Anthropic
+- OpenAI
+- Google DeepMind
+Meta, xAI, DeepSeek, and Mistral do not perform systematic dangerous capability evaluations despite deploying increasingly powerful systems.
+### Governance and Accountability
+- **Whistleblowing**: Only OpenAI published its full whistleblowing policy publicly
+- **Board independence**: Varied significantly across companies
+- **Safety commitments**: All companies made public safety pledges, but implementation varied dramatically
+### Bioweapon Risk Assessment
+The index noted specific gaps in bioweapon capability testing across the industry. The report referenced emerging extinction scenarios including:
+- AI-assisted bioweapon design
+- Autonomous biological research systems
+- Mirror life organisms (theoretical risk: organisms built from mirror-image biological molecules that would be indigestible to existing life and could theoretically proliferate uncontrollably, though this remains speculative)
+### Evaluation Framework
+The index assessed companies across:
+- Dangerous capability testing (bioweapons, cyber, autonomous replication)
+- Governance structures and accountability
+- Existential safety measures
+- Transparency and public disclosure
+Notably, the framework did not evaluate collective intelligence approaches to alignment, focusing instead on traditional technical safety measures and governance mechanisms.
+## Methodology
+The index combined public documentation review, company interviews, and expert assessment. Ratings reflect practices as of Summer 2025.
+## Claims Extracted
+- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]]
+- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
+- [[only OpenAI published its full whistleblowing policy publicly among frontier AI companies]]
+- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]]