---
type: source
title: AI Safety Index Summer 2025
author: Future of Life Institute (FLI)
url: https://futureoflife.org/ai-safety-index-summer-2025/
date: 2025-07-01
domain: ai-alignment
secondary_domains:
  - grand-strategy
format: report
status: unprocessed
priority: high
tags:
  - AI-safety
  - company-scores
  - accountability
  - governance
  - existential-risk
  - transparency
---

## Content

FLI's comprehensive evaluation of frontier AI companies across six safety dimensions.

Company scores (letter grades and numeric averages; a grade-mapping sketch follows the list):

- Anthropic: C+ (2.64) — best overall
- OpenAI: C (2.10) — second
- Google DeepMind: C- (1.76) — third
- xAI: D (1.23)
- Meta: D (1.06)
- Zhipu AI: F (0.62)
- DeepSeek: F (0.37)
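
The mapping sketch: a minimal check, assuming FLI assigns the highest letter whose band floor on a standard US GPA scale the score meets. Both the scale and the floor rule are assumptions inferred from the published pairs, not taken from the report's methodology.

```python
# Sketch: reproduce the letter grades from the numeric averages.
# ASSUMPTION: a standard US GPA scale, where a score earns the highest
# letter whose band floor it meets. Inferred from the published pairs,
# not from FLI's stated methodology.
GRADE_FLOORS = [
    ("A+", 4.3), ("A", 4.0), ("A-", 3.7),
    ("B+", 3.3), ("B", 3.0), ("B-", 2.7),
    ("C+", 2.3), ("C", 2.0), ("C-", 1.7),
    ("D+", 1.3), ("D", 1.0), ("D-", 0.7),
    ("F", 0.0),
]

def letter_grade(score: float) -> str:
    """Return the highest letter grade whose floor the score meets."""
    return next(letter for letter, floor in GRADE_FLOORS if score >= floor)

scores = {
    "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
    "xAI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
}
for company, score in scores.items():
    print(f"{company}: {letter_grade(score)} ({score:.2f})")
# Prints C+, C, C-, D, D, F, F -- matching the published grades.
```

Under this floor rule all seven published pairs reproduce exactly, which is why the GPA-scale assumption seems plausible.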

Six dimensions evaluated (a scorecard sketch follows the list):

  1. Risk Assessment — dangerous capability testing
  2. Current Harms — safety benchmarks and robustness
  3. Safety Frameworks — risk management processes
  4. Existential Safety — planning for human-level AI
  5. Governance & Accountability — whistleblowing and oversight
  6. Information Sharing — transparency on specs and risks
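
A minimal scorecard sketch tying the two lists together, assuming the overall numeric score is the unweighted mean of the six dimension scores (the weighting is an assumption; FLI documents its own aggregation). The example numbers are hypothetical, not taken from the report.

```python
from dataclasses import dataclass, fields
from statistics import mean

@dataclass
class Scorecard:
    """One company's scores on the six FLI dimensions (GPA-style scale)."""
    risk_assessment: float
    current_harms: float
    safety_frameworks: float
    existential_safety: float
    governance_accountability: float
    information_sharing: float

    def overall(self) -> float:
        # ASSUMPTION: overall = unweighted mean of the six dimensions.
        return mean(getattr(self, f.name) for f in fields(self))

# HYPOTHETICAL numbers for illustration; the report publishes the
# real per-dimension grades for each company.
example = Scorecard(
    risk_assessment=2.7,
    current_harms=3.0,
    safety_frameworks=2.3,
    existential_safety=1.0,  # a D, in line with the report's key finding
    governance_accountability=2.0,
    information_sharing=2.3,
)
print(f"overall: {example.overall():.2f}")  # overall: 2.22
```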

Critical findings:

- NO company scored above D in existential safety, despite all of them claiming they will achieve AGI within a decade
- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks)
- Only OpenAI has published its whistleblowing policy in full
- Without regulatory floors, the divergence in safety practices between companies continues to widen
- Reviewer: the disconnect between AGI claims and existential safety scores is "deeply disturbing"
- "None of the companies has anything like a coherent, actionable plan" for human-level AI safety

## Agent Notes

Why this matters: Quantifies the gap between AI safety rhetoric and practice at the company level. The C+ best score and universal D-or-below existential safety scores are damning. This is the empirical evidence for our "race to the bottom" claim.

What surprised me: The MAGNITUDE of the gap. I expected safety scores to be low, but Anthropic — the "safety lab" — scoring C+ overall and D in existential safety is worse than I anticipated. Also: only OpenAI has a public whistleblowing policy. The accountability infrastructure is almost non-existent.

What I expected but didn't find: any assessment of multi-agent or collective approaches to safety. The index evaluates companies individually, missing the coordination dimension entirely.

KB connections:

Extraction hints: the key claim is that no frontier AI company has a coherent existential safety plan despite running an active AGI development program. The quantitative scoring enables direct comparison over time if FLI repeats the assessment; a small sketch of that comparison follows.
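
As a sketch, assuming a future wave publishes comparable overall scores; the "future-wave" numbers below are placeholders to illustrate the delta computation, not data.

```python
# Sketch: compare overall scores across assessment waves.
# "summer-2025" values are from the report; "future-wave" values are
# HYPOTHETICAL placeholders, present only to show the computation.
waves = {
    "summer-2025": {
        "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
        "xAI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
    },
    "future-wave": {
        "Anthropic": 2.80, "OpenAI": 2.00, "Google DeepMind": 1.90,
        "xAI": 1.23, "Meta": 1.10, "Zhipu AI": 0.70, "DeepSeek": 0.50,
    },
}

def deltas(earlier: str, later: str) -> dict[str, float]:
    """Per-company change in overall score between two waves."""
    return {
        company: round(waves[later][company] - waves[earlier][company], 2)
        for company in waves[earlier] if company in waves[later]
    }

for company, change in deltas("summer-2025", "future-wave").items():
    print(f"{company}: {change:+.2f}")
```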

Context: FLI is a well-established AI safety organization. The index methodology was peer-reviewed. Company scores are based on publicly available information plus email correspondence with developers.

## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it

WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — the best company scores C+ in overall safety, and all companies score D or below in existential safety

EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.