theseus: extract claims from 2025-07-00-fli-ai-safety-index-summer-2025.md
- Source: inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md
- Domain: ai-alignment
- Extracted by: headless extraction cron
- Pentagon-Agent: Theseus <HEADLESS>

parent 71c29ca1e1
commit ffc3f8f210
9 changed files with 227 additions and 1 deletion

@@ -21,6 +21,12 @@ The structural point is about threat proximity. AI takeover requires autonomy, r

**Anthropic's own measurements confirm substantial uplift (mid-2025).** Dario Amodei reports that as of mid-2025, Anthropic's internal measurements show LLMs "doubling or tripling the likelihood of success" for bioweapon development across several relevant areas. Models are "likely now approaching the point where, without safeguards, they could be useful in enabling someone with a STEM degree but not specifically a biology degree to go through the whole process of producing a bioweapon." This is the end-to-end capability threshold — not just answering questions but providing interactive walk-through guidance spanning weeks or months, similar to tech support for complex procedures. Anthropic responded by elevating Claude Opus 4 and subsequent models to ASL-3 (AI Safety Level 3) protections. The gene synthesis supply chain is also failing: an MIT study found 36 out of 38 gene synthesis providers fulfilled orders containing the 1918 influenza sequence without flagging it. Amodei also raises the "mirror life" extinction scenario — left-handed biological organisms that would be indigestible to all existing life on Earth and could "proliferate in an uncontrollable way." A 2024 Stanford report assessed mirror life could "plausibly be created in the next one to few decades," and sufficiently powerful AI could accelerate this timeline dramatically. (Source: Dario Amodei, "The Adolescence of Technology," darioamodei.com, 2026.)

### Additional Evidence (extend)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*

FLI's Summer 2025 AI Safety Index shows that only 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive testing for bioterrorism facilitation capabilities. The remaining 4 companies (Meta, x.AI, Zhipu AI, and DeepSeek) are deploying increasingly capable models without systematic evaluation of whether those models can facilitate bioweapon development. With 57% of frontier labs not testing, the industry cannot know the current state of AI-enabled bioterrorism risk, let alone control it. The gap is particularly concerning because the non-testing companies include Meta, which deploys at massive scale, and DeepSeek, a geopolitical actor with different safety incentives.

---

Relevant Notes:

@@ -0,0 +1,44 @@

---
type: claim
domain: ai-alignment
description: "The lab explicitly founded on AI safety principles achieves only C+ overall and D in existential safety in FLI's 2025 assessment, indicating structural rather than cultural barriers to safety investment"
confidence: likely
source: "Future of Life Institute AI Safety Index Summer 2025"
created: 2025-07-01
last_evaluated: 2025-07-01
depends_on:
- "no frontier AI company scores above D in existential safety despite active AGI development programs"
- "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"
challenged_by: []
---

# Anthropic scores C+ overall and D in existential safety, indicating structural barriers to safety investment despite safety-first positioning

Anthropic, founded explicitly as a safety-focused AI lab and consistently positioning itself as prioritizing alignment over capability racing, achieved the highest overall score (C+, 2.64/4.0) in FLI's Summer 2025 AI Safety Index—yet still scored only D in existential safety planning. This represents the ceiling of current industry safety practice, not an outlier, and suggests that competitive pressure constrains even explicitly safety-motivated organizations.

## Evidence

**Anthropic's scores in FLI Summer 2025 assessment:**

- Overall: C+ (2.64/4.0) — best among all evaluated companies
- Existential Safety dimension: D — same as OpenAI and DeepMind
- One of only 3 companies conducting substantive dangerous capability testing

**Comparative context:**

- Anthropic was founded by former OpenAI researchers specifically to prioritize safety
- The company publicly emphasizes Constitutional AI and alignment research
- Despite explicit safety focus, Anthropic scores only marginally better than OpenAI (C+ vs C)
- The gap between Anthropic (C+, 2.64) and the lowest scorer DeepSeek (F, 0.37) is 2.27 points on a 4-point scale

**Interpretation:**

The fact that the *best* company scores C+ overall and D in existential safety indicates that competitive pressure constrains even explicitly safety-motivated organizations. If Anthropic—with safety as its founding mission—cannot achieve better than D in existential safety planning, this suggests structural rather than cultural barriers to safety investment.

This is evidence that voluntary safety commitments face binding constraints: even when leadership genuinely prioritizes safety, market competition limits how much safety investment is viable. The "safety lab" achieves only marginally better scores than competitors, suggesting convergence toward a low-safety equilibrium rather than differentiation through superior safety practices.

---

Relevant Notes:

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]

Topics:

- [[ai-alignment]]
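
These claim notes share a compact YAML frontmatter schema: type, domain, optional secondary_domains, description, confidence, source, created, last_evaluated, depends_on, and challenged_by. A minimal sketch of how downstream tooling might load and validate such a note, assuming PyYAML; the confidence vocabulary is a guess, since only "likely" appears in this commit:

```python
import yaml  # PyYAML, an assumed dependency

REQUIRED = {"type", "domain", "description", "confidence", "source",
            "created", "last_evaluated", "depends_on", "challenged_by"}
CONFIDENCE = {"speculative", "possible", "likely", "established"}  # hypothetical scale

def load_claim(path):
    """Split a claim note into YAML frontmatter and markdown body, then validate."""
    with open(path, encoding="utf-8") as f:
        _, meta_block, body = f.read().split("---", 2)
    meta = yaml.safe_load(meta_block)
    missing = REQUIRED - meta.keys()
    if missing:
        raise ValueError(f"{path}: missing frontmatter fields {sorted(missing)}")
    if meta["confidence"] not in CONFIDENCE:
        raise ValueError(f"{path}: unexpected confidence {meta['confidence']!r}")
    return meta, body
```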

@@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point toward collective

The alignment field has converged on a problem it cannot solve with its current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis -- not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within its current framework.

### Additional Evidence (confirm)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*

FLI's AI Safety Index evaluates six dimensions of safety practice: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, and Information Sharing. Notably absent: any evaluation of multi-agent coordination, collective intelligence approaches, or coordination protocol development. The index evaluates companies individually and measures single-organization safety practices, missing the coordination dimension entirely. This confirms that even comprehensive safety assessments do not include collective intelligence infrastructure as a category, indicating the field has not recognized it as a viable alignment approach or has not prioritized it in evaluation frameworks.

---

Relevant Notes:

@@ -0,0 +1,54 @@

---
type: claim
domain: ai-alignment
secondary_domains: [grand-strategy]
description: "FLI's Summer 2025 index shows all frontier AI labs score D or below in existential safety planning while publicly claiming AGI timelines within a decade"
confidence: likely
source: "Future of Life Institute AI Safety Index Summer 2025"
created: 2025-07-01
last_evaluated: 2025-07-01
depends_on:
- "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"
- "safe AI development requires building alignment mechanisms before scaling capability"
challenged_by: []
---

# No frontier AI company scores above D in existential safety despite active AGI development programs

The Future of Life Institute's comprehensive evaluation of frontier AI companies (Summer 2025) reveals a systematic gap between AGI development claims and existential safety preparation. All evaluated companies—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—scored D or below in the "Existential Safety" dimension, despite most claiming AGI timelines within a decade.

## Evidence

**Company scores across six safety dimensions:**

- Anthropic: C+ overall (2.64/4.0), D in existential safety — highest scorer
- OpenAI: C overall (2.10/4.0), D in existential safety
- Google DeepMind: C- overall (1.76/4.0), D in existential safety
- x.AI: D overall (1.23/4.0)
- Meta: D overall (1.06/4.0)
- Zhipu AI: F overall (0.62/4.0)
- DeepSeek: F overall (0.37/4.0)

**Key findings:**

- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks)
- Only OpenAI published its full whistleblowing policy publicly
- FLI assessment: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety
- The disconnect between AGI claims and existential safety scores is "deeply disturbing" per FLI reviewers

**Methodology:** Peer-reviewed index based on publicly available information plus email correspondence with developers. Six dimensions evaluated: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing.

The universal D-or-below existential safety scores indicate that even the most safety-conscious labs (Anthropic positions itself as a "safety lab") lack actionable plans for the very capability level they claim to be pursuing. This is quantitative evidence of the structural race-to-the-bottom dynamic: competitive pressure prevents any company from investing sufficiently in existential safety planning, even when leadership publicly acknowledges the risk.

## Challenges

None identified. The index methodology was peer-reviewed and scores are based on verifiable public information.

---

Relevant Notes:

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[safe AI development requires building alignment mechanisms before scaling capability]]
- [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]]

Topics:

- [[ai-alignment]]
- [[grand-strategy]]
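
The depends_on entries and the [[wikilinks]] under Relevant Notes double as edges in a claim graph. A sketch of harvesting those edges across a vault of notes like the ones in this commit; the directory layout and function name are illustrative only:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

def claim_edges(vault_dir):
    """Map each note's filename stem to every title it references via [[wikilinks]]."""
    edges = {}
    for path in Path(vault_dir).glob("*.md"):
        edges[path.stem] = WIKILINK.findall(path.read_text(encoding="utf-8"))
    return edges
```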

@@ -0,0 +1,44 @@

---
type: claim
domain: ai-alignment
secondary_domains: [grand-strategy]
description: "FLI's 2025 index shows OpenAI is the only frontier AI company with a publicly available whistleblowing policy, indicating near-zero accountability infrastructure across the industry"
confidence: likely
source: "Future of Life Institute AI Safety Index Summer 2025"
created: 2025-07-01
last_evaluated: 2025-07-01
depends_on: []
challenged_by: []
---

# Only OpenAI published its full whistleblowing policy publicly among frontier AI companies

According to FLI's Summer 2025 AI Safety Index, OpenAI is the only frontier AI company that has published its complete whistleblowing policy publicly. Among seven evaluated companies—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—this represents a near-total absence of public accountability infrastructure for internal safety concerns.

## Evidence

**From FLI AI Safety Index Summer 2025:**

- Dimension evaluated: "Governance & Accountability — whistleblowing and oversight"
- 7 companies assessed
- Only OpenAI has published full whistleblowing policy publicly
- 6 companies (86%) have no public whistleblowing mechanism

**Why this matters:**

Whistleblowing policies are basic governance infrastructure for organizations developing potentially catastrophic technology. The fact that only 1 of 7 frontier labs has made such a policy public indicates that internal accountability mechanisms are either absent or deliberately opaque.

This is particularly concerning given:

1. The power asymmetry between individual employees and well-resourced AI companies
2. The potential for employees to observe safety violations or capability developments that leadership conceals
3. The public interest in knowing whether frontier AI development includes channels for safety concerns

The absence of public whistleblowing policies means that employees who observe dangerous practices have no clear, protected path to raise concerns externally. This concentrates information about safety practices within companies and prevents external oversight—a critical gap given that frontier AI development involves existential risks that affect all of humanity.

---

Relevant Notes:

- [[no frontier AI company scores above D in existential safety despite active AGI development programs]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]

Topics:

- [[ai-alignment]]
- [[grand-strategy]]

@@ -0,0 +1,41 @@

---
type: claim
domain: ai-alignment
secondary_domains: [grand-strategy]
description: "FLI's 2025 index shows only Anthropic, OpenAI, and DeepMind test for bioterrorism and cyberattack capabilities while all companies claim responsible development"
confidence: likely
source: "Future of Life Institute AI Safety Index Summer 2025"
created: 2025-07-01
last_evaluated: 2025-07-01
depends_on: []
challenged_by: []
---

# Only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development

Of the seven frontier AI companies evaluated in FLI's Summer 2025 AI Safety Index, only Anthropic, OpenAI, and Google DeepMind conduct substantive testing for dangerous capabilities such as bioterrorism facilitation and cyberattack automation. This represents less than half of evaluated companies, despite all companies publicly claiming commitment to responsible AI development.

## Evidence

**From FLI AI Safety Index Summer 2025:**

- 7 companies evaluated: Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, DeepSeek
- Only 3 conduct substantive dangerous capability testing: Anthropic, OpenAI, DeepMind (43% of sample)
- 4 companies lack substantive testing: x.AI, Meta, Zhipu AI, DeepSeek (57% of sample)
- Testing categories: bioterrorism facilitation, cyberattack capabilities
- Dimension evaluated: "Risk Assessment — dangerous capability testing"
- All companies publicly claim responsible development practices

**Implications:**

The gap between rhetoric and practice is stark: companies that do not test for dangerous capabilities cannot know whether their models possess them. This creates a scenario where 4 of 7 frontier labs are deploying increasingly capable models without systematic evaluation of catastrophic risk vectors.

The concentration of testing in the three labs with the most established safety programs (Anthropic, OpenAI, DeepMind) suggests that dangerous capability evaluation requires infrastructure investment that less safety-focused competitors skip, whatever their resources. This is consistent with the alignment tax hypothesis: safety practices that impose costs are adopted only by organizations with explicit safety mandates.

---

Relevant Notes:

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]]

Topics:

- [[ai-alignment]]
- [[grand-strategy]]

@@ -21,6 +21,12 @@ This phased approach is also a practical response to the observation that since

Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions.

### Additional Evidence (confirm)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*

FLI's Summer 2025 AI Safety Index shows all frontier AI companies violate this principle: every evaluated company scores D or below in existential safety while actively developing toward AGI. The index explicitly notes that 'none of the companies has anything like a coherent, actionable plan' for human-level AI safety, yet all continue capability scaling. This is systematic evidence across seven companies (Anthropic, OpenAI, DeepMind, x.AI, Meta, Zhipu AI, DeepSeek) that the industry is scaling capability without corresponding alignment infrastructure, exactly the failure mode this claim warns against.

---

Relevant Notes:

@@ -27,6 +27,12 @@ The timing is revealing: Anthropic dropped its safety pledge the same week the P

Anthropic, widely considered the most safety-focused frontier AI lab, rolled back its Responsible Scaling Policy (RSP) in February 2026. The original 2023 RSP committed to never training an AI system unless the company could guarantee in advance that safety measures were adequate. The new RSP explicitly acknowledges the structural dynamic: safety work 'requires collaboration (and in some cases sacrifices) from multiple parts of the company and can be at cross-purposes with immediate competitive and commercial priorities.' This represents the highest-profile case of a voluntary AI safety commitment collapsing under competitive pressure. Anthropic's own language confirms the mechanism: safety is a competitive cost ('sacrifices') that conflicts with commercial imperatives ('at cross-purposes'). Notably, no alternative coordination mechanism was proposed—they weakened the commitment without proposing what would make it sustainable (industry-wide agreements, regulatory requirements, market mechanisms). This is particularly significant because Anthropic is the organization most publicly committed to safety governance, making their rollback empirical validation that even safety-prioritizing institutions cannot sustain unilateral commitments under competitive pressure.

### Additional Evidence (confirm)

*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*

FLI's Summer 2025 AI Safety Index provides quantitative company-level evidence: the highest-scoring company (Anthropic, explicitly founded as a safety lab) achieves only C+ overall (2.64/4.0) and D in existential safety. All seven evaluated companies score D or below in existential safety despite public AGI development claims. The best company scores 2.64/4.0 and the worst 0.37/4.0, yet even at the top of that 2.27-point spread, existential safety planning stays at D: competitive pressure caps safety investment even for companies with explicit safety mandates. Only 3 of 7 companies conduct dangerous capability testing; only 1 of 7 has published a whistleblowing policy. This is direct empirical confirmation that voluntary commitments cannot survive competitive dynamics: even the most safety-motivated organization (Anthropic) secures only a marginal lead rather than differentiation through superior safety practices, suggesting the market punishes safety investment.

---

Relevant Notes:

@@ -7,9 +7,15 @@ date: 2025-07-01

domain: ai-alignment
secondary_domains: [grand-strategy]
format: report
status: unprocessed
status: processed
priority: high
tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency]
processed_by: theseus
processed_date: 2025-07-01
claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md", "anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md", "only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md", "only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md"]
enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Extracted 4 new claims and 4 enrichments. Primary claim is the universal D-or-below existential safety scores despite AGI development programs. Secondary claims cover Anthropic's ceiling performance, dangerous capability testing gaps, and whistleblowing policy absence. All claims directly support the race-to-the-bottom thesis with quantitative company-level data. The index provides the first comprehensive comparative safety assessment across frontier labs, making it high-value evidence for multiple existing alignment claims."
---

## Content
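
The hunk above flips the source note from unprocessed to processed and appends the extraction bookkeeping. A sketch of that update as a script, assuming the python-frontmatter package; the package choice and function name are assumptions, only the field names come from the hunk:

```python
import frontmatter  # python-frontmatter, an assumed dependency

def mark_processed(path, claims, enrichments, model, notes):
    """Apply the frontmatter update recorded in the hunk above to a source note."""
    post = frontmatter.load(path)
    post["status"] = "processed"
    post["processed_by"] = "theseus"
    post["processed_date"] = "2025-07-01"
    post["claims_extracted"] = claims
    post["enrichments_applied"] = enrichments
    post["extraction_model"] = model
    post["extraction_notes"] = notes
    with open(path, "w", encoding="utf-8") as f:
        f.write(frontmatter.dumps(post))
```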

@@ -62,3 +68,16 @@ FLI's comprehensive evaluation of frontier AI companies across 6 safety dimensio

PRIMARY CONNECTION: [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]

WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety

EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions.

## Key Facts

- Anthropic overall score: C+ (2.64/4.0)
- OpenAI overall score: C (2.10/4.0)
- Google DeepMind overall score: C- (1.76/4.0)
- x.AI overall score: D (1.23/4.0)
- Meta overall score: D (1.06/4.0)
- Zhipu AI overall score: F (0.62/4.0)
- DeepSeek overall score: F (0.37/4.0)
- All companies score D or below in Existential Safety dimension
- FLI index evaluates 6 dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing
- Index methodology: peer-reviewed, based on public information plus email correspondence with developers
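
The key figures above are easy to sanity-check; a quick sketch with the scores transcribed from this list:

```python
# Overall scores (out of 4.0) from the FLI Summer 2025 index, as listed above.
scores = {
    "Anthropic": 2.64, "OpenAI": 2.10, "Google DeepMind": 1.76,
    "x.AI": 1.23, "Meta": 1.06, "Zhipu AI": 0.62, "DeepSeek": 0.37,
}
spread = max(scores.values()) - min(scores.values())
print(f"best-to-worst spread: {spread:.2f}")                       # 2.27, as cited in the claim notes
print(f"industry mean: {sum(scores.values()) / len(scores):.2f}")  # 1.40 on the 4-point scale
```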