From 5113e174b5bd958d598214c0f699e14baf3db401 Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Thu, 12 Mar 2026 05:58:13 +0000 Subject: [PATCH] theseus: extract from 2025-07-00-fli-ai-safety-index-summer-2025.md - Source: inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 6) Pentagon-Agent: Theseus --- ...t proximate AI-enabled existential risk.md | 6 +++ ...-ai-safety-practice-divergence-to-widen.md | 38 +++++++++++++++++++ ... converging on problems that require it.md | 6 +++ ...l-safety-despite-active-agi-development.md | 37 ++++++++++++++++++ ...nt mechanisms before scaling capability.md | 6 +++ ...5-07-00-fli-ai-safety-index-summer-2025.md | 21 +++++++++- 6 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 domains/ai-alignment/absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md create mode 100644 domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md diff --git a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md index db07420d..268cf711 100644 --- a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md +++ b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md @@ -27,6 +27,12 @@ The structural point is about threat proximity. 
AI takeover requires autonomy, r The International AI Safety Report 2026 (multi-government committee, February 2026) confirms that 'biological/chemical weapons information accessible through AI systems' is a documented malicious use risk. While the report does not specify the expertise level required (PhD vs amateur), it categorizes bio/chem weapons information access alongside AI-generated persuasion and cyberattack capabilities as confirmed malicious use risks, giving institutional multi-government validation to the bioterrorism concern. + +### Additional Evidence (confirm) +*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5* + +FLI's index includes 'dangerous capability testing' as a core evaluation dimension, specifically measuring whether companies test for bioterrorism and cyberattack capabilities. Only 3 of 7 frontier AI companies (Anthropic, OpenAI, DeepMind) conduct substantive testing for these dangerous capabilities, indicating that bioterrorism risk is recognized as a priority concern by safety evaluators and that most companies are not adequately testing for it. 
+ --- Relevant Notes: diff --git a/domains/ai-alignment/absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md b/domains/ai-alignment/absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md new file mode 100644 index 00000000..c89fe816 --- /dev/null +++ b/domains/ai-alignment/absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md @@ -0,0 +1,38 @@ +--- +type: claim +domain: ai-alignment +description: "FLI's index shows safety scores ranging from F (0.37) to C+ (2.64) demonstrating that voluntary approaches produce widening divergence not convergence" +confidence: likely +source: "Future of Life Institute, AI Safety Index Summer 2025 (2025-07-01)" +created: 2026-03-11 +secondary_domains: [grand-strategy] +--- + +# Absence of regulatory floors allows AI safety practice divergence to widen + +The FLI AI Safety Index Summer 2025 reveals massive divergence in safety practices across frontier AI companies, with scores ranging from F (0.37 for DeepSeek) to C+ (2.64 for Anthropic) — a 7x difference in overall safety scores. This divergence demonstrates that voluntary safety commitments produce widening gaps rather than industry convergence toward best practices. + +Key divergence patterns: +- Only 3 of 7 companies (Anthropic, OpenAI, DeepMind) conduct substantive dangerous capability testing +- Only 1 company (OpenAI) published its full whistleblowing policy publicly +- Existential safety scores show no company above D, but within that low band, practices still vary substantially +- Chinese companies (Zhipu AI, DeepSeek) score significantly lower (F range) than Western companies (C to D range) + +The absence of regulatory minimums means companies face no penalty for scoring F in critical safety dimensions. The competitive pressure to advance capability without equivalent safety investment creates a structural dynamic where the gap between leaders and laggards widens over time. 
+ +FLI explicitly identified "absence of regulatory floors allows safety practice divergence to widen" as a key finding. The index provides quantitative evidence that voluntary approaches fail to establish industry-wide safety baselines. + +## Evidence + +The 7x spread in overall scores (0.37 to 2.64) and the fact that only 43% of companies conduct dangerous capability testing demonstrate empirically that voluntary commitments do not produce convergence. The index methodology was peer-reviewed and based on publicly available information, making the divergence measurement credible. + +--- + +Relevant Notes: +- [[voluntary safety pledges cannot survive competitive pressure]] — this provides the empirical measurement +- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — divergence is the predicted outcome +- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — regulatory absence creates perverse incentives + +Topics: +- [[domains/ai-alignment/_map]] +- [[core/grand-strategy/_map]] diff --git a/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md b/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md index 0a4e68f4..948f5624 100644 --- a/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md +++ b/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md @@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point
toward collective The alignment field has converged on a problem they cannot solve with their current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis -- not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within their current framework. + +### Additional Evidence (extend) +*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5* + +FLI's AI Safety Index evaluates companies across 6 dimensions (Risk Assessment, Current Harms, Safety Frameworks, Existential Safety, Governance & Accountability, Information Sharing) but does not include any dimension for multi-agent coordination, collective intelligence approaches, or distributed alignment mechanisms. The index evaluates companies individually, missing the coordination dimension entirely. This absence in the evaluation framework itself reflects that collective intelligence infrastructure is not considered a standard component of AI safety practice at frontier labs. 
+ --- Relevant Notes: diff --git a/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md b/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md new file mode 100644 index 00000000..bf1b7cee --- /dev/null +++ b/domains/ai-alignment/no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md @@ -0,0 +1,37 @@ +--- +type: claim +domain: ai-alignment +description: "FLI's Summer 2025 index shows all frontier AI labs score D or below in existential safety planning while claiming AGI within a decade" +confidence: likely +source: "Future of Life Institute, AI Safety Index Summer 2025 (2025-07-01)" +created: 2026-03-11 +--- + +# No frontier AI company scores above D in existential safety despite active AGI development + +Future of Life Institute's comprehensive evaluation of frontier AI companies across 6 safety dimensions reveals a critical gap: every company scored D or below in existential safety planning, despite most claiming AGI development within a decade. The best overall performer, Anthropic, achieved only C+ (2.64/4.0) overall and D in existential safety. The index evaluated 7 companies: Anthropic (C+, 2.64), OpenAI (C, 2.10), Google DeepMind (C-, 1.76), x.AI (D, 1.23), Meta (D, 1.06), Zhipu AI (F, 0.62), and DeepSeek (F, 0.37). + +The six evaluation dimensions were: Risk Assessment (dangerous capability testing), Current Harms (safety benchmarks and robustness), Safety Frameworks (risk management processes), Existential Safety (planning for human-level AI), Governance & Accountability (whistleblowing and oversight), and Information Sharing (transparency on specs and risks). 
+ +Critical findings: +- Only 3 firms (Anthropic, OpenAI, DeepMind) conduct substantive testing for dangerous capabilities like bioterrorism and cyberattacks +- Only OpenAI published its full whistleblowing policy publicly +- FLI reviewers noted "none of the companies has anything like a coherent, actionable plan" for human-level AI safety +- The disconnect between AGI claims and existential safety scores was described as "deeply disturbing" + +The methodology was peer-reviewed and based on publicly available information plus email correspondence with developers. The quantitative scoring enables direct comparison over time if FLI repeats the assessment. + +## Evidence + +This claim is supported by FLI's systematic evaluation across standardized criteria, providing the first quantitative company-level comparison of AI safety practices. The universal failure to score above D in existential safety, combined with active AGI development programs, provides empirical evidence for the gap between safety rhetoric and practice. 
+ +--- + +Relevant Notes: +- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — this index provides company-level quantification +- [[voluntary safety pledges cannot survive competitive pressure]] — confirmed by best company scoring only C+ +- [[safe AI development requires building alignment mechanisms before scaling capability]] — violated by every company assessed +- [[no research group is building alignment through collective intelligence infrastructure]] — index doesn't evaluate this dimension + +Topics: +- [[domains/ai-alignment/_map]] diff --git a/domains/ai-alignment/safe AI development requires building alignment mechanisms before scaling capability.md b/domains/ai-alignment/safe AI development requires building alignment mechanisms before scaling capability.md index 09030349..4d7c8ef8 100644 --- a/domains/ai-alignment/safe AI development requires building alignment mechanisms before scaling capability.md +++ b/domains/ai-alignment/safe AI development requires building alignment mechanisms before scaling capability.md @@ -21,6 +21,12 @@ This phased approach is also a practical response to the observation that since Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions. 
+ +### Additional Evidence (confirm) +*Source: [[2025-07-00-fli-ai-safety-index-summer-2025]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5* + +FLI's index shows all frontier AI companies violate this principle: every company scores D or below in existential safety while actively pursuing AGI development. The universal failure to score above D in existential safety planning, combined with claims of AGI within a decade, demonstrates that capability scaling is proceeding without corresponding alignment infrastructure. FLI reviewers noted the disconnect between AGI timelines and existential safety preparedness as 'deeply disturbing.' + --- Relevant Notes: diff --git a/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md b/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md index b74d93f8..5e7506d5 100644 --- a/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md +++ b/inbox/archive/2025-07-00-fli-ai-safety-index-summer-2025.md @@ -7,9 +7,15 @@ date: 2025-07-01 domain: ai-alignment secondary_domains: [grand-strategy] format: report -status: unprocessed +status: processed priority: high tags: [AI-safety, company-scores, accountability, governance, existential-risk, transparency] +processed_by: theseus +processed_date: 2026-03-11 +claims_extracted: ["no-frontier-ai-company-scores-above-d-in-existential-safety-despite-active-agi-development.md", "absence-of-regulatory-floors-allows-ai-safety-practice-divergence-to-widen.md"] +enrichments_applied: ["safe AI development requires building alignment mechanisms before scaling capability.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"] +extraction_model: "anthropic/claude-sonnet-4.5" +extraction_notes: "High-value 
extraction. Provides quantitative company-level evidence for multiple existing claims about the race to the bottom, voluntary safety failures, and capability-alignment gaps. Two new claims extracted: the universal D-or-below existential safety scores despite AGI claims, and the widening divergence in safety practices absent regulatory floors. Three enrichments applied to existing alignment claims. The index is significant because it's the first systematic quantitative comparison of frontier AI company safety practices with peer-reviewed methodology." --- ## Content @@ -62,3 +68,16 @@ FLI's comprehensive evaluation of frontier AI companies across 6 safety dimensio PRIMARY CONNECTION: [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] WHY ARCHIVED: Provides quantitative company-level evidence for the race-to-the-bottom dynamic — best company scores C+ in overall safety, all companies score D or below in existential safety EXTRACTION HINT: The headline claim is "no frontier AI company scores above D in existential safety despite AGI claims." The company-by-company comparison and the existential safety gap are the highest-value extractions. + + +## Key Facts +- Anthropic: C+ (2.64/4.0) — best overall score +- OpenAI: C (2.10/4.0) — second +- Google DeepMind: C- (1.76/4.0) — third +- x.AI: D (1.23/4.0) +- Meta: D (1.06/4.0) +- Zhipu AI: F (0.62/4.0) +- DeepSeek: F (0.37/4.0) +- Only 3 of 7 companies conduct substantive dangerous capability testing (bioterrorism, cyberattacks) +- Only OpenAI published its full whistleblowing policy publicly +- FLI methodology was peer-reviewed and based on publicly available information plus email correspondence
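The "7x spread" and "43% testing" figures cited in the notes above can be checked directly from the Key Facts scores. A minimal sketch (score values as reported in the index; the set of companies conducting substantive dangerous-capability testing is the one named in the notes):

```python
# FLI AI Safety Index Summer 2025 overall scores on a 0-4 scale, per Key Facts.
scores = {
    "Anthropic": 2.64,
    "OpenAI": 2.10,
    "Google DeepMind": 1.76,
    "x.AI": 1.23,
    "Meta": 1.06,
    "Zhipu AI": 0.62,
    "DeepSeek": 0.37,
}

# Leader-to-laggard ratio: 2.64 / 0.37, roughly a 7.1x spread.
spread = max(scores.values()) / min(scores.values())
print(f"spread: {spread:.1f}x")

# Share of companies with substantive dangerous-capability testing
# (Anthropic, OpenAI, DeepMind only, per the index): 3/7, about 43%.
testers = {"Anthropic", "OpenAI", "Google DeepMind"}
print(f"testing share: {len(testers) / len(scores):.0%}")
```

The ratio, not the raw grade letters, is what carries the divergence claim: letter grades compress the scale, while the numeric scores show the bottom performer operating at about one seventh of the leader's measured safety practice.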