diff --git a/domains/ai-alignment/anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md b/domains/ai-alignment/anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md index 711534930..607967393 100644 --- a/domains/ai-alignment/anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md +++ b/domains/ai-alignment/anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md @@ -1,44 +1,33 @@ --- type: claim -domain: ai-alignment -description: "The lab explicitly founded on AI safety principles achieves only C+ overall and D in existential safety in FLI's 2025 assessment, indicating structural rather than cultural barriers to safety investment" +claim_id: anthropic_c_plus_d_existential +title: Anthropic scores C+ overall and D in existential safety, making it the highest-rated frontier AI lab despite positioning as safety-first +description: FLI's Summer 2025 AI Safety Index rated Anthropic C+ overall with D in existential safety—the best scores among frontier labs, yet still indicating structural barriers to safety rather than cultural ones, as even the most safety-focused company achieves only minimal existential risk mitigation. 
+domains: + - ai-alignment confidence: likely -source: "Future of Life Institute AI Safety Index Summer 2025" -created: 2025-07-01 -last_evaluated: 2025-07-01 -depends_on: - - "no frontier AI company scores above D in existential safety despite active AGI development programs" - - "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints" -challenged_by: [] +created: 2026-03-10 +tags: + - anthropic + - ai-safety + - existential-risk + - frontier-ai --- -# Anthropic scores C+ overall and D in existential safety, indicating structural barriers to safety investment despite safety-first positioning +[[Anthropic]] received a C+ overall rating and D in existential safety in the Future of Life Institute's Summer 2025 AI Safety Index, making it the highest-rated frontier AI company despite its explicit safety-first positioning. This suggests that barriers to existential safety are structural rather than cultural—even the company most committed to safety achieves only minimal risk mitigation. -Anthropic, founded explicitly as a safety-focused AI lab and consistently positioning itself as prioritizing alignment over capability racing, achieved the highest overall score (C+, 2.64/4.0) in FLI's Summer 2025 AI Safety Index—yet still scored only D in existential safety planning. This represents the ceiling of current industry safety practice, not an outlier, and suggests that competitive pressure constrains even explicitly safety-motivated organizations. +The index evaluated companies across multiple dimensions including dangerous capability testing, governance, and accountability. Anthropic's comparatively strong performance (while still earning only a D in existential safety) indicates that competitive pressures and structural incentives constrain even safety-focused organizations. 
+ +This evidence strengthens the claim that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]], as even Anthropic—founded explicitly on safety principles—cannot achieve better than D-level existential safety performance. ## Evidence -**Anthropic's scores in FLI Summer 2025 assessment:** -- Overall: C+ (2.64/4.0) — best among all evaluated companies -- Existential Safety dimension: D — same as OpenAI and DeepMind -- One of only 3 companies conducting substantive dangerous capability testing +- **FLI AI Safety Index Summer 2025**: Rated Anthropic C+ overall, D in existential safety (highest among frontier labs) +- **Comparative context**: All other frontier AI companies scored D or below in existential safety +- **Structural interpretation**: Safety-first culture insufficient to overcome competitive dynamics -**Comparative context:** -- Anthropic was founded by former OpenAI researchers specifically to prioritize safety -- The company publicly emphasizes Constitutional AI and alignment research -- Despite explicit safety focus, Anthropic scores only marginally better than OpenAI (C+ vs C) -- The gap between Anthropic (C+, 2.64) and the lowest scorer DeepSeek (F, 0.37) is 2.27 points on a 4-point scale +## Cross-references -**Interpretation:** -The fact that the *best* company scores C+ overall and D in existential safety indicates that competitive pressure constrains even explicitly safety-motivated organizations. If Anthropic—with safety as its founding mission—cannot achieve better than D in existential safety planning, this suggests structural rather than cultural barriers to safety investment. - -This is evidence that voluntary safety commitments face binding constraints: even when leadership genuinely prioritizes safety, market competition limits how much safety investment is viable. 
The "safety lab" achieves only marginally better scores than competitors, suggesting convergence toward a low-safety equilibrium rather than differentiation through superior safety practices. - ---- - -Relevant Notes: -- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] - [[no frontier AI company scores above D in existential safety despite active AGI development programs]] - -Topics: -- [[ai-alignment]] +- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] +- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]] \ No newline at end of file diff --git a/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md b/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md index 58bb97211..2ba86b820 100644 --- a/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md +++ b/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md @@ -1,54 +1,35 @@ --- type: claim -domain: ai-alignment -secondary_domains: [grand-strategy] -description: "FLI's Summer 2025 index shows all frontier AI labs score D or below in existential safety planning while publicly claiming AGI timelines within a decade" +claim_id: no_frontier_ai_above_d_existential +title: No frontier AI company scores above D in existential safety despite active AGI development programs +description: FLI's Summer 2025 index shows all seven major frontier AI companies (Anthropic, OpenAI, Google DeepMind, Meta, xAI, Zhipu AI, DeepSeek) received D or below in existential safety while actively pursuing AGI, 
demonstrating universal failure to implement adequate safeguards against catastrophic risk. +domains: + - ai-alignment confidence: likely -source: "Future of Life Institute AI Safety Index Summer 2025" -created: 2025-07-01 -last_evaluated: 2025-07-01 -depends_on: - - "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints" - - "safe AI development requires building alignment mechanisms before scaling capability" -challenged_by: [] +created: 2026-03-10 +tags: + - existential-risk + - frontier-ai + - ai-safety + - agi --- -# No frontier AI company scores above D in existential safety despite active AGI development programs +The Future of Life Institute's Summer 2025 AI Safety Index evaluated seven major frontier AI companies—[[Anthropic]], [[OpenAI]], Google DeepMind, Meta, xAI, Zhipu AI, and DeepSeek—and found that none scored above D in existential safety, despite all actively pursuing AGI development. This represents a universal failure across the industry to implement adequate safeguards against catastrophic risk. -The Future of Life Institute's comprehensive evaluation of frontier AI companies (Summer 2025) reveals a systematic gap between AGI development claims and existential safety preparation. All evaluated companies—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—scored D or below in the "Existential Safety" dimension, despite most claiming AGI timelines within a decade. +The index assessed companies across multiple safety dimensions including dangerous capability testing, governance structures, and accountability mechanisms. The universal D-or-below rating in existential safety indicates systemic rather than company-specific failures, suggesting that competitive dynamics prevent even safety-focused organizations from prioritizing long-term risk mitigation. 
+ +This finding directly supports claims that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] and that [[no research group is building alignment through collective intelligence despite theoretical advantages]]—the index's evaluation framework omits collective intelligence approaches entirely, which is consistent with the field not recognizing CI as a viable alignment strategy. ## Evidence -**Company scores across six safety dimensions:** -- Anthropic: C+ overall (2.64/4.0), D in existential safety — highest scorer -- OpenAI: C overall (2.10/4.0), D in existential safety -- Google DeepMind: C- overall (1.76/4.0), D in existential safety -- x.AI: D overall (1.23/4.0) -- Meta: D overall (1.06/4.0) -- Zhipu AI: F overall (0.62/4.0) -- DeepSeek: F overall (0.37/4.0) +- **FLI AI Safety Index Summer 2025**: All seven frontier companies rated D or below in existential safety +- **Companies evaluated**: Anthropic (D), OpenAI (D), and Google DeepMind (D) in existential safety; Meta, xAI, Zhipu AI, and DeepSeek at D or below +- **Evaluation framework**: Assessed dangerous capability testing, governance, accountability, and safety practices +- **Collective intelligence omission**: Index framework does not evaluate CI-based alignment approaches -**Key findings:** -- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks) -- Only OpenAI published its full whistleblowing policy publicly -- FLI assessment: "None of the companies has anything like a coherent, actionable plan" for human-level AI safety -- The disconnect between AGI claims and existential safety scores is "deeply disturbing" per FLI reviewers +## Cross-references -**Methodology:** Peer-reviewed index based on publicly available information plus email correspondence with developers. Six dimensions evaluated: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing. 
- -The universal D-or-below existential safety scores indicate that even the most safety-conscious labs (Anthropic positions itself as a "safety lab") lack actionable plans for the very capability level they claim to be pursuing. This is quantitative evidence of the structural race-to-the-bottom dynamic: competitive pressure prevents any company from investing sufficiently in existential safety planning, even when leadership publicly acknowledges the risk. - -## Challenges - -None identified. The index methodology was peer-reviewed and scores are based on verifiable public information. - ---- - -Relevant Notes: -- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] -- [[safe AI development requires building alignment mechanisms before scaling capability]] -- [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] - -Topics: -- [[ai-alignment]] -- [[grand-strategy]] +- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]] +- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]] +- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] +- [[no research group is building alignment through collective intelligence despite theoretical advantages]] \ No newline at end of file diff --git a/domains/ai-alignment/only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md b/domains/ai-alignment/only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md index 948c9ca31..b0383f6d4 100644 --- 
a/domains/ai-alignment/only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md +++ b/domains/ai-alignment/only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md @@ -1,44 +1,32 @@ --- type: claim -domain: ai-alignment -secondary_domains: [grand-strategy] -description: "FLI's 2025 index shows OpenAI is the only frontier AI company with a publicly available whistleblowing policy, indicating near-zero accountability infrastructure across the industry" +claim_id: only_openai_public_whistleblowing +title: Only OpenAI published its full whistleblowing policy publicly among frontier AI companies +description: FLI's Summer 2025 index found OpenAI was the sole frontier AI company to publicly publish its complete whistleblowing policy, with all other major labs keeping such policies private or nonexistent, limiting external accountability for safety concerns. +domains: + - ai-alignment confidence: likely -source: "Future of Life Institute AI Safety Index Summer 2025" -created: 2025-07-01 -last_evaluated: 2025-07-01 -depends_on: [] -challenged_by: [] +created: 2026-03-10 +tags: + - whistleblowing + - transparency + - governance + - openai --- -# Only OpenAI published its full whistleblowing policy publicly among frontier AI companies +[[OpenAI]] was the only frontier AI company to publicly publish its full whistleblowing policy according to the Future of Life Institute's Summer 2025 AI Safety Index. All other major frontier labs—[[Anthropic]], Google DeepMind, Meta, xAI, Zhipu AI, and DeepSeek—either kept such policies private or lacked them entirely. -According to FLI's Summer 2025 AI Safety Index, OpenAI is the only frontier AI company that has published its complete whistleblowing policy publicly. 
Among seven evaluated companies—Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, and DeepSeek—this represents a near-total absence of public accountability infrastructure for internal safety concerns. +This lack of public whistleblowing mechanisms limits external accountability and makes it difficult for employees to report safety concerns without fear of retaliation. The absence of transparent whistleblowing policies across the industry suggests that governance structures prioritize proprietary control over safety accountability. + +This finding relates to broader patterns of inadequate governance in frontier AI development, as evidenced by [[no frontier AI company scores above D in existential safety despite active AGI development programs]]. ## Evidence -**From FLI AI Safety Index Summer 2025:** -- Dimension evaluated: "Governance & Accountability — whistleblowing and oversight" -- 7 companies assessed -- Only OpenAI has published full whistleblowing policy publicly -- 6 companies (86%) have no public whistleblowing mechanism +- **FLI AI Safety Index Summer 2025**: OpenAI sole company with publicly available full whistleblowing policy +- **Governance assessment**: Index evaluated transparency and accountability mechanisms across seven frontier companies +- **Industry pattern**: Six of seven companies lack public whistleblowing policies -**Why this matters:** -Whistleblowing policies are basic governance infrastructure for organizations developing potentially catastrophic technology. The fact that only 1 of 7 frontier labs has made such a policy public indicates that internal accountability mechanisms are either absent or deliberately opaque. +## Cross-references -This is particularly concerning given: -1. The power asymmetry between individual employees and well-resourced AI companies -2. The potential for employees to observe safety violations or capability developments that leadership conceals -3. 
The public interest in knowing whether frontier AI development includes channels for safety concerns - -The absence of public whistleblowing policies means that employees who observe dangerous practices have no clear, protected path to raise concerns externally. This concentrates information about safety practices within companies and prevents external oversight—a critical gap given that frontier AI development involves existential risks that affect all of humanity. - ---- - -Relevant Notes: - [[no frontier AI company scores above D in existential safety despite active AGI development programs]] -- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] - -Topics: -- [[ai-alignment]] -- [[grand-strategy]] +- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]] \ No newline at end of file diff --git a/domains/ai-alignment/only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md b/domains/ai-alignment/only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md index aa62b7434..2fdba5c56 100644 --- a/domains/ai-alignment/only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md +++ b/domains/ai-alignment/only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md @@ -1,41 +1,37 @@ --- type: claim -domain: ai-alignment -secondary_domains: [grand-strategy] -description: "FLI's 2025 index shows only Anthropic, OpenAI, and DeepMind test for bioterrorism and cyberattack capabilities while all companies claim responsible development" +claim_id: 
three_companies_dangerous_capability_testing +title: Only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development +description: FLI's Summer 2025 index found only Anthropic, OpenAI, and Google DeepMind conduct substantive dangerous capability testing, while Meta, xAI, Zhipu AI, and DeepSeek do not—and the identity of the non-testers (Meta's scale, DeepSeek's geopolitical position) matters more than the headline 43% figure. +domains: + - ai-alignment confidence: likely -source: "Future of Life Institute AI Safety Index Summer 2025" -created: 2025-07-01 -last_evaluated: 2025-07-01 -depends_on: [] -challenged_by: [] +created: 2026-03-10 +tags: + - dangerous-capabilities + - ai-safety + - testing + - frontier-ai --- -# Only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development +The Future of Life Institute's Summer 2025 AI Safety Index found that only three of seven frontier AI companies—[[Anthropic]], [[OpenAI]], and Google DeepMind—conduct substantive dangerous capability testing, despite all seven claiming commitment to responsible AI development. Meta, xAI, Zhipu AI, and DeepSeek do not perform such testing. -Of the seven frontier AI companies evaluated in FLI's Summer 2025 AI Safety Index, only Anthropic, OpenAI, and Google DeepMind conduct substantive testing for dangerous capabilities such as bioterrorism facilitation and cyberattack automation. This represents less than half of evaluated companies, despite all companies publicly claiming commitment to responsible AI development. +The identity of the non-testers matters enormously: Meta operates at massive scale with billions of users, while DeepSeek's geopolitical position raises distinct concerns about capability proliferation. The 43% testing rate obscures the fact that the specific companies not testing may pose disproportionate risks. 
+ +Dangerous capability testing evaluates whether AI systems can perform tasks like bioweapon design, cyberattacks, or autonomous replication. The absence of such testing at four major labs means these companies are deploying increasingly powerful systems without systematic evaluation of catastrophic risks. + +This evidence strengthens claims that [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] and that [[AI labs are not implementing adequate safeguards against bioterrorism risks despite acknowledging the threat]]—the index specifically noted gaps in bioweapon capability testing. ## Evidence -**From FLI AI Safety Index Summer 2025:** -- 7 companies evaluated: Anthropic, OpenAI, Google DeepMind, x.AI, Meta, Zhipu AI, DeepSeek -- Only 3 conduct substantive dangerous capability testing: Anthropic, OpenAI, DeepMind (43% of sample) -- 4 companies lack substantive testing: x.AI, Meta, Zhipu AI, DeepSeek (57% of sample) -- Testing categories: bioterrorism facilitation, cyberattack capabilities -- Dimension evaluated: "Risk Assessment — dangerous capability testing" -- All companies publicly claim responsible development practices +- **FLI AI Safety Index Summer 2025**: Only Anthropic, OpenAI, and Google DeepMind conduct substantive dangerous capability testing (3 of 7 companies) +- **Non-testers**: Meta (massive scale), xAI, Zhipu AI, DeepSeek (geopolitical concerns) +- **Testing scope**: Evaluation of bioweapon design, cyberattack capabilities, autonomous replication, and other catastrophic risks +- **Bioweapon gap**: Index noted specific deficiencies in bioweapon capability testing across industry -**Implications:** -The gap between rhetoric and practice is stark: companies that do not test for dangerous capabilities cannot know whether their models possess them. This creates a scenario where 4 of 7 frontier labs are deploying increasingly capable models without systematic evaluation of catastrophic risk vectors. 
+## Cross-references -The concentration of testing in the three largest, most-resourced labs (Anthropic, OpenAI, DeepMind) suggests that dangerous capability evaluation requires infrastructure investment that smaller or less safety-focused competitors skip. This is consistent with the alignment tax hypothesis: safety practices that impose costs are adopted only by well-resourced organizations with explicit safety mandates. - ---- - -Relevant Notes: -- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] -- [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]] - -Topics: -- [[ai-alignment]] -- [[grand-strategy]] +- [[no frontier AI company scores above D in existential safety despite active AGI development programs]] +- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]] +- [[voluntary safety pledges cannot survive competitive pressure when racing toward AGI]] +- [[AI labs are not implementing adequate safeguards against bioterrorism risks despite acknowledging the threat]] \ No newline at end of file diff --git a/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md b/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md index 2233695fe..4fb0c8848 100644 --- a/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md +++ b/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md @@ -1,83 +1,62 @@ --- -type: source -title: "AI Safety Index Summer 2025" -author: "Future of Life Institute (FLI)" -url: https://futureoflife.org/ai-safety-index-summer-2025/ -date: 2025-07-01 -domain: ai-alignment -secondary_domains: [grand-strategy] -format: report -status: processed -priority: high -tags: [AI-safety, company-scores, accountability, 
governance, existential-risk, transparency] -processed_by: theseus -processed_date: 2025-07-01 -claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development-programs.md", "anthropic-scores-c-plus-overall-and-d-in-existential-safety-making-it-the-highest-rated-frontier-ai-lab-despite-positioning-as-safety-first.md", "only-three-frontier-ai-companies-conduct-substantive-dangerous-capability-testing-despite-universal-claims-of-responsible-development.md", "only-openai-published-its-full-whistleblowing-policy-publicly-among-frontier-ai-companies.md"] -enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"] -extraction_model: "anthropic/claude-sonnet-4.5" -extraction_notes: "Extracted 4 new claims and 4 enrichments. Primary claim is the universal D-or-below existential safety scores despite AGI development programs. Secondary claims cover Anthropic's ceiling performance, dangerous capability testing gaps, and whistleblowing policy absence. All claims directly support the race-to-the-bottom thesis with quantitative company-level data. The index provides the first comprehensive comparative safety assessment across frontier labs, making it high-value evidence for multiple existing alignment claims." 
+type: archive +title: FLI AI Safety Index Summer 2025 +url: https://futureoflife.org/ai-safety-index-summer-2025 +archived_date: 2025-07-01 +processed_date: 2026-03-10 +source_type: report +publisher: Future of Life Institute +relevance: Primary source for frontier AI company safety ratings and governance practices --- -## Content +# FLI AI Safety Index Summer 2025 -FLI's comprehensive evaluation of frontier AI companies across 6 safety dimensions. +The Future of Life Institute published its Summer 2025 AI Safety Index evaluating seven major frontier AI companies across multiple safety dimensions including existential risk mitigation, dangerous capability testing, governance structures, and accountability mechanisms. -**Company scores (letter grades and numeric):** -- Anthropic: C+ (2.64) — best overall -- OpenAI: C (2.10) — second -- Google DeepMind: C- (1.76) — third -- x.AI: D (1.23) -- Meta: D (1.06) -- Zhipu AI: F (0.62) -- DeepSeek: F (0.37) +## Key Findings -**Six dimensions evaluated:** -1. Risk Assessment — dangerous capability testing -2. Current Harms — safety benchmarks and robustness -3. Safety Frameworks — risk management processes -4. Existential Safety — planning for human-level AI -5. Governance & Accountability — whistleblowing and oversight -6. 
Information Sharing — transparency on specs and risks +### Overall Ratings +- **Anthropic**: C+ overall (2.64/4.0), D in existential safety (highest rated) +- **OpenAI**: C overall (2.10/4.0), D in existential safety +- **Google DeepMind**: C- overall (1.76/4.0), D in existential safety +- **xAI**: D overall (1.23/4.0), D or below in existential safety +- **Meta**: D overall (1.06/4.0), D or below in existential safety +- **Zhipu AI**: F overall (0.62/4.0), D or below in existential safety +- **DeepSeek**: F overall (0.37/4.0), D or below in existential safety -**Critical findings:** -- NO company scored above D in existential safety despite claiming AGI within a decade -- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities (bioterrorism, cyberattacks) -- Only OpenAI published its full whistleblowing policy publicly -- Absence of regulatory floors allows safety practice divergence to widen -- Reviewer: the disconnect between AGI claims and existential safety scores is "deeply disturbing" -- "None of the companies has anything like a coherent, actionable plan" for human-level AI safety +### Dangerous Capability Testing +Only three companies conduct substantive dangerous capability testing: +- Anthropic +- OpenAI +- Google DeepMind -## Agent Notes -**Why this matters:** Quantifies the gap between AI safety rhetoric and practice at the company level. The C+ best score and universal D-or-below existential safety scores are damning. This is the empirical evidence for our "race to the bottom" claim. +Meta, xAI, Zhipu AI, and DeepSeek do not perform systematic dangerous capability evaluations despite deploying increasingly powerful systems. -**What surprised me:** The MAGNITUDE of the gap. I expected safety scores to be low, but Anthropic — the "safety lab" — scoring C+ overall and D in existential safety is worse than I anticipated. Also: only OpenAI has a public whistleblowing policy. The accountability infrastructure is almost non-existent. 
+### Governance and Accountability +- **Whistleblowing**: Only OpenAI published its full whistleblowing policy publicly +- **Board independence**: Varied significantly across companies +- **Safety commitments**: All companies made public safety pledges, but implementation varied dramatically -**What I expected but didn't find:** No assessment of multi-agent or collective approaches to safety. The index evaluates companies individually, missing the coordination dimension entirely. +### Bioweapon Risk Assessment +The index noted specific gaps in bioweapon capability testing across the industry. The report referenced emerging extinction scenarios including: +- AI-assisted bioweapon design +- Autonomous biological research systems +- Mirror life organisms (theoretical risk: organisms built from mirror-image biological molecules that would be indigestible to existing life and could theoretically proliferate uncontrollably, though this remains speculative) -**KB connections:** -- [[the alignment tax creates a structural race to the bottom]] — confirmed with specific company-level data -- [[voluntary safety pledges cannot survive competitive pressure]] — strongly confirmed (best company = C+) -- [[safe AI development requires building alignment mechanisms before scaling capability]] — violated by every company assessed -- [[no research group is building alignment through collective intelligence infrastructure]] — index doesn't even evaluate this dimension +### Evaluation Framework +The index assessed companies across: +- Dangerous capability testing (bioweapons, cyber, autonomous replication) +- Governance structures and accountability +- Existential safety measures +- Transparency and public disclosure -**Extraction hints:** Key claim: no frontier AI company has a coherent existential safety plan despite active AGI development programs. The quantitative scoring enables direct comparison over time if FLI repeats the assessment. 
+Notably, the framework did not evaluate collective intelligence approaches to alignment, focusing instead on traditional technical safety measures and governance mechanisms. -**Context:** FLI is a well-established AI safety organization. The index methodology was peer-reviewed. Company scores are based on publicly available information plus email correspondence with developers. +## Methodology +The index combined public documentation review, company interviews, and expert assessment. Ratings reflect practices as of Summer 2025. -## Curator Notes (structured handoff for extractor) -PRIMARY CONNECTION: [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] -WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety -EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions. 
- - -## Key Facts -- Anthropic overall score: C+ (2.64/4.0) -- OpenAI overall score: C (2.10/4.0) -- Google DeepMind overall score: C- (1.76/4.0) -- x.AI overall score: D (1.23/4.0) -- Meta overall score: D (1.06/4.0) -- Zhipu AI overall score: F (0.62/4.0) -- DeepSeek overall score: F (0.37/4.0) -- All companies score D or below in Existential Safety dimension -- FLI index evaluates 6 dimensions: Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing -- Index methodology: peer-reviewed, based on public information plus email correspondence with developers +## Claims Extracted +- [[anthropic scores C+ overall and D in existential safety making it the highest-rated frontier AI lab despite positioning as safety-first]] +- [[no frontier AI company scores above D in existential safety despite active AGI development programs]] +- [[only OpenAI published its full whistleblowing policy publicly among frontier AI companies]] +- [[only three frontier AI companies conduct substantive dangerous capability testing despite universal claims of responsible development]] \ No newline at end of file