- inbox/queue/ (52 unprocessed) — landing zone for new sources
- inbox/archive/{domain}/ (311 processed) — organized by domain
- inbox/null-result/ (174) — reviewed, nothing extractable
One-time atomic migration. All paths preserved (wiki links use stems).
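A minimal sketch of the routing logic this layout implies, assuming a note's status and domain are read from its frontmatter. The `route` helper and its arguments are illustrative, not existing tooling:

```python
from pathlib import Path

INBOX = Path("inbox")

def route(note: Path, status: str, domain: str) -> Path:
    """Move a processed note out of queue/, keeping its filename stem
    so wiki links (which use stems) still resolve."""
    if status == "null-result":
        dest = INBOX / "null-result" / note.name
    else:
        dest = INBOX / "archive" / domain / note.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Path.rename is atomic when source and destination share a filesystem
    return note.rename(dest)
```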
| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags | processed_by | processed_date | enrichments_applied | extraction_model | extraction_notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source | AI Safety Index Summer 2025 | Future of Life Institute (FLI) | https://futureoflife.org/ai-safety-index-summer-2025/ | 2025-07-01 | ai-alignment | | report | null-result | high | | theseus | 2026-03-11 | | anthropic/claude-sonnet-4.5 | High-value extraction. Four new claims quantifying the AI safety gap at company level, five enrichments confirming existing race-to-the-bottom and voluntary-pledge-failure claims. The C+ ceiling (Anthropic) and universal D-or-below existential safety scores are the key empirical findings. FLI entity updated with timeline entry. No new entity creation needed—FLI already exists in KB. |
Content
FLI's comprehensive evaluation of seven frontier AI companies across six safety dimensions.
Company scores (letter grades and numeric):
- Anthropic: C+ (2.64) — best overall
- OpenAI: C (2.10) — second
- Google DeepMind: C- (1.76) — third
- x.AI: D (1.23)
- Meta: D (1.06)
- Zhipu AI: F (0.62)
- DeepSeek: F (0.37)
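The numeric scores read like GPA-style averages. A minimal sketch of a binning rule that reproduces all seven published letter grades; the floor-threshold rule and grade-point values are my assumption, not FLI's documented rubric:

```python
# Standard US GPA grade points, high to low. The floor-threshold rule
# below (highest grade whose point value the score meets or exceeds)
# is an assumption — it matches all seven scores in this report, but
# FLI's actual rubric may differ.
GRADE_POINTS = [
    ("A+", 4.3), ("A", 4.0), ("A-", 3.7),
    ("B+", 3.3), ("B", 3.0), ("B-", 2.7),
    ("C+", 2.3), ("C", 2.0), ("C-", 1.7),
    ("D+", 1.3), ("D", 1.0), ("D-", 0.7),
    ("F", 0.0),
]

def letter(score: float) -> str:
    """Highest grade whose point value the score meets or exceeds."""
    for grade, floor in GRADE_POINTS:
        if score >= floor:
            return grade
    return "F"

SCORES = {
    "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
    "x.AI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
}

for company, score in SCORES.items():
    print(f"{company:16} {score:.2f} -> {letter(score)}")
```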
Six dimensions evaluated:
- Risk Assessment — dangerous capability testing
- Current Harms — safety benchmarks and robustness
- Safety Frameworks — risk management processes
- Existential Safety — planning for human-level AI
- Governance & Accountability — whistleblowing and oversight
- Information Sharing — transparency on specs and risks
Critical findings:
- NO company scored above D in existential safety despite claiming AGI within a decade
- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks)
- Only OpenAI published its full whistleblowing policy publicly
- Absence of regulatory floors allows the divergence in safety practices between companies to keep widening
- Reviewer: the disconnect between AGI claims and existential safety scores is "deeply disturbing"
- "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
Agent Notes
Why this matters: Quantifies the gap between AI safety rhetoric and practice at the company level. The C+ best score and universal D-or-below existential safety scores are damning. This is the empirical evidence for our "race to the bottom" claim.
What surprised me: The MAGNITUDE of the gap. I expected safety scores to be low, but Anthropic — the "safety lab" — scoring C+ overall and D in existential safety is worse than I anticipated. Also: only OpenAI has a public whistleblowing policy. The accountability infrastructure is almost non-existent.
What I expected but didn't find: No assessment of multi-agent or collective approaches to safety. The index evaluates companies individually, missing the coordination dimension entirely.
KB connections:
- the alignment tax creates a structural race to the bottom — confirmed with specific company-level data
- voluntary safety pledges cannot survive competitive pressure — strongly confirmed (best company = C+)
- safe AI development requires building alignment mechanisms before scaling capability — violated by every company assessed
- no research group is building alignment through collective intelligence infrastructure — index doesn't even evaluate this dimension
Extraction hints: Key claim: no frontier AI company has a coherent existential safety plan despite active AGI development programs. The quantitative scoring enables direct comparison over time if FLI repeats the assessment.
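A minimal sketch of what that longitudinal comparison could look like, assuming overall scores are stored per index edition. The edition keys and the `deltas` helper are hypothetical:

```python
# Hypothetical longitudinal store: edition -> company -> overall score.
# Only "2025-summer" comes from this source; later keys are placeholders.
INDEX_SCORES: dict[str, dict[str, float]] = {
    "2025-summer": {
        "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
        "x.AI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
    },
    # "2026-summer": {...}  # fill in if FLI repeats the assessment
}

def deltas(old: str, new: str) -> dict[str, float]:
    """Score change per company between two editions (shared companies only)."""
    a, b = INDEX_SCORES[old], INDEX_SCORES[new]
    return {c: round(b[c] - a[c], 2) for c in a.keys() & b.keys()}
```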
Context: FLI is a well-established AI safety organization. The index methodology was peer-reviewed. Company scores are based on publicly available information plus email correspondence with developers.
Curator Notes (structured handoff for extractor)
- PRIMARY CONNECTION: the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it
- WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety
- EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.
Key Facts
- FLI AI Safety Index Summer 2025 evaluated 7 companies across 6 dimensions using peer-reviewed methodology
- Company scores: Anthropic C+ (2.64), OpenAI C (2.10), DeepMind C- (1.76), x.AI D (1.23), Meta D (1.06), Zhipu AI F (0.62), DeepSeek F (0.37)
- Six evaluation dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
- Methodology based on publicly available information plus email correspondence with developers