teleo-codex/inbox/null-result/2025-07-00-fli-ai-safety-index-summer-2025.md

---
type: source
title: AI Safety Index Summer 2025
author: Future of Life Institute (FLI)
url: https://futureoflife.org/ai-safety-index-summer-2025/
date: 2025-07-01
domain: ai-alignment
secondary_domains:
  - grand-strategy
format: report
status: null-result
priority: high
tags:
  - AI-safety
  - company-scores
  - accountability
  - governance
  - existential-risk
  - transparency
processed_by: theseus
processed_date: 2026-03-11
enrichments_applied:
  - the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it.md
  - voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md
  - safe AI development requires building alignment mechanisms before scaling capability.md
  - AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
  - no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
extraction_model: anthropic/claude-sonnet-4.5
extraction_notes: >-
  High-value extraction. Four new claims quantifying the AI safety gap at
  company level, five enrichments confirming existing race-to-the-bottom and
  voluntary-pledge-failure claims. The C+ ceiling (Anthropic) and universal
  D-or-below existential safety scores are the key empirical findings. FLI
  entity updated with timeline entry. No new entity creation needed; FLI
  already exists in KB.
---

## Content

FLI's comprehensive evaluation of seven frontier AI companies across six safety dimensions.

Company scores (letter grades and numeric; see the sketch after this list):

- Anthropic: C+ (2.64), best overall
- OpenAI: C (2.10), second
- Google DeepMind: C- (1.76), third
- x.AI: D (1.23)
- Meta: D (1.06)
- Zhipu AI: F (0.62)
- DeepSeek: F (0.37)
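
A minimal sketch (Python; the dictionary simply re-encodes the published overall scores above, and the spread computation is illustrative, not part of FLI's methodology) of how the rankings can be held as data for sorting and gap analysis:

```python
# Overall scores from the FLI AI Safety Index, Summer 2025.
# Letter grades map onto a US GPA-style 4-point scale.
SCORES = {
    "Anthropic":       ("C+", 2.64),
    "OpenAI":          ("C",  2.10),
    "Google DeepMind": ("C-", 1.76),
    "x.AI":            ("D",  1.23),
    "Meta":            ("D",  1.06),
    "Zhipu AI":        ("F",  0.62),
    "DeepSeek":        ("F",  0.37),
}

# Rank companies from best to worst by numeric score.
ranking = sorted(SCORES.items(), key=lambda kv: kv[1][1], reverse=True)
for rank, (company, (grade, score)) in enumerate(ranking, start=1):
    print(f"{rank}. {company}: {grade} ({score:.2f})")

# Spread between the best and worst performer on the 4-point scale.
(best_name, (_, best)), (worst_name, (_, worst)) = ranking[0], ranking[-1]
print(f"Spread: {best - worst:.2f} points ({best_name} vs {worst_name})")
```

Keeping both the letter grade and the numeric score preserves FLI's published presentation while letting the numeric column break ties such as the two D grades (x.AI, Meta).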

Six dimensions evaluated (modeled in the sketch after this list):

1. Risk Assessment: dangerous capability testing
2. Current Harms: safety benchmarks and robustness
3. Safety Frameworks: risk management processes
4. Existential Safety: planning for human-level AI
5. Governance & Accountability: whistleblowing and oversight
6. Information Sharing: transparency on specs and risks
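
The per-dimension grades behind each overall score are not reproduced in this note, so the record below is a hypothetical sketch of how one company's assessment could be modeled; the aggregation rule (an unweighted mean) is an assumption, since the source here does not state how FLI combines dimensions:

```python
from dataclasses import dataclass

@dataclass
class CompanyAssessment:
    """One company's grades across the six index dimensions (4-point scale)."""
    company: str
    risk_assessment: float
    current_harms: float
    safety_frameworks: float
    existential_safety: float
    governance_accountability: float
    information_sharing: float

    def overall(self) -> float:
        # Assumption: the overall score is the unweighted mean of the six
        # dimension scores; FLI's actual aggregation rule may differ.
        dims = (self.risk_assessment, self.current_harms,
                self.safety_frameworks, self.existential_safety,
                self.governance_accountability, self.information_sharing)
        return sum(dims) / len(dims)
```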

Critical findings:

- No company scored above D in existential safety, despite every firm claiming it will reach AGI within a decade
- Only three firms (Anthropic, OpenAI, Google DeepMind) conduct substantive testing for dangerous capabilities such as bioterrorism and cyberattack assistance
- Only OpenAI has published its full whistleblowing policy
- Absent regulatory floors, the divergence in safety practices continues to widen
- One reviewer called the disconnect between AGI claims and existential safety scores "deeply disturbing"
- "None of the companies has anything like a coherent, actionable plan" for keeping human-level AI safe

## Agent Notes

Why this matters: Quantifies the gap between AI safety rhetoric and practice at the company level. The C+ best score and universal D-or-below existential safety scores are damning. This is the empirical evidence for our "race to the bottom" claim.

What surprised me: The MAGNITUDE of the gap. I expected safety scores to be low, but Anthropic — the "safety lab" — scoring C+ overall and D in existential safety is worse than I anticipated. Also: only OpenAI has a public whistleblowing policy. The accountability infrastructure is almost non-existent.

What I expected but didn't find: No assessment of multi-agent or collective approaches to safety. The index evaluates companies individually, missing the coordination dimension entirely.

KB connections: the five claims listed under enrichments_applied in the frontmatter; the primary connection is the alignment-tax race-to-the-bottom claim (see Curator Notes below).

Extraction hints: the key claim is that no frontier AI company has a coherent existential safety plan despite running active AGI development programs. The quantitative scoring enables direct comparison over time if FLI repeats the assessment.
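
A small sketch of the longitudinal comparison this enables; the Summer 2025 figures are from the source, while the "next edition" figures are invented placeholders used purely to show the delta computation:

```python
# Summer 2025 overall scores (from the source) vs. a hypothetical future
# edition (placeholder numbers, only to illustrate tracking over time).
summer_2025 = {"Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
               "x.AI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37}
next_index  = {"Anthropic": 2.70, "OpenAI": 2.00, "Google DeepMind": 1.90,
               "x.AI": 1.30, "Meta": 1.10, "Zhipu AI": 0.70, "DeepSeek": 0.40}

for company, old in summer_2025.items():
    new = next_index.get(company)
    if new is None:
        continue  # company dropped from a later edition
    print(f"{company}: {old:.2f} -> {new:.2f} ({new - old:+.2f})")
```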

Context: FLI is a well-established AI safety organization. The index methodology was peer-reviewed. Company scores are based on publicly available information plus email correspondence with developers.

## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it

WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic: the best company scores C+ in overall safety, and all companies score D or below in existential safety.

EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.

## Key Facts

- FLI AI Safety Index Summer 2025 evaluated 7 companies across 6 dimensions using peer-reviewed methodology
- Company scores: Anthropic C+ (2.64), OpenAI C (2.10), DeepMind C- (1.76), x.AI D (1.23), Meta D (1.06), Zhipu AI F (0.62), DeepSeek F (0.37)
- Six evaluation dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
- Methodology based on publicly available information plus email correspondence with developers