theseus: AI coordination governance evidence — 3 claims + 1 entity + 1 source

- What: 3 claims on coordination governance empirics (binding regulation as
  the only mechanism that works, transparency declining, compute export
  controls as misaligned governance) + UK AISI entity + comprehensive source
  archive
- Why: targeted research on the weakest grounding of B2 ("alignment is a
  coordination problem"). Found that voluntary coordination has empirically
  failed across every mechanism tested (2023-2026); only binding regulation
  with enforcement changes behavior. This challenges the optimistic version
  of B2 and strengthens the case for enforcement-backed coordination.
- Connections: confirms the voluntary-safety-pledge claim with extensive new
  evidence, strengthens the nation-state-control claim, and challenges the
  alignment-tax claim by showing the tax is being cut, not paid

Pentagon-Agent: Theseus <B4A5B354-03D6-4291-A6A8-1E04A879D9AC>
m3taversal 2026-03-16 19:33:24 +00:00
parent b64fe64b89
commit 519be36f90
5 changed files with 222 additions and 0 deletions


@@ -0,0 +1,40 @@
---
type: claim
domain: ai-alignment
description: "Quantitative evidence from Stanford's Foundation Model Transparency Index shows frontier AI transparency actively worsening from 2024-2025, contradicting the narrative that governance pressure increases disclosure"
confidence: likely
source: "Stanford CRFM Foundation Model Transparency Index (Dec 2025), FLI AI Safety Index (Summer 2025), OpenAI mission statement change (Fortune, Nov 2025), OpenAI team dissolutions (May 2024, Feb 2026)"
created: 2026-03-16
---
# AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements
Stanford's Foundation Model Transparency Index (FMTI), the most rigorous quantitative measure of AI lab disclosure practices, documented a decline in transparency from 2024 to 2025:
- **Mean score dropped 17 points** across all tracked labs
- **Meta**: -29 points (coinciding with pivot from open-source to closed)
- **Mistral**: -37 points (largest decline among the labs listed)
- **OpenAI**: -14 points
- No company scored above C+ on FLI's AI Safety Index
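For readers consuming these notes programmatically, the reported per-lab deltas can be kept as structured data. A minimal Python sketch; only the three labs named above are included, so the computed mean is illustrative and differs from Stanford's headline -17 figure, which averages over all tracked labs:
```python
# Per-lab FMTI score changes (2024 -> 2025) as reported above, in points.
# Only three labs are named in this note; the full index tracks more
# developers, so this mean is illustrative, not Stanford's -17 headline.
fmti_deltas = {
    "Meta": -29,
    "Mistral": -37,  # largest decline among the labs listed
    "OpenAI": -14,
}

mean_delta = sum(fmti_deltas.values()) / len(fmti_deltas)
print(f"mean delta across listed labs: {mean_delta:+.1f} points")  # -26.7
```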
This decline occurred despite the Seoul AI Safety Commitments (May 2024), in which 16 companies promised to publish safety frameworks; the White House voluntary commitments (Jul 2023), which included transparency pledges; and multiple international declarations calling for AI transparency.
The organizational signals are consistent with the quantitative decline:
- OpenAI dissolved its Superalignment team (May 2024) and Mission Alignment team (Feb 2026)
- OpenAI removed the word "safely" from its mission statement in its November 2025 IRS filing
- OpenAI's Preparedness Framework v2 dropped manipulation and mass disinformation as risk categories worth testing before model release
- Google DeepMind released Gemini 2.5 Pro without the external evaluation and detailed safety report promised under Seoul commitments
This evidence directly challenges the theory that governance pressure (declarations, voluntary commitments, safety institute creation) increases transparency over time. The opposite is occurring: as models become more capable and commercially valuable, labs are becoming less transparent about their safety practices, not more.
The alignment implication: transparency is a prerequisite for external oversight. If [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]], declining transparency makes even the unreliable evaluations harder to conduct. The governance mechanisms that could provide oversight (safety institutes, third-party auditors) depend on lab cooperation that is actively eroding.
---
Relevant Notes:
- [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]] — declining transparency compounds the evaluation problem
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — transparency commitments follow the same erosion lifecycle
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — transparency has a cost; labs are cutting it
Topics:
- [[_map]]


@@ -0,0 +1,35 @@
---
type: claim
domain: ai-alignment
description: "US AI chip export controls have verifiably changed corporate behavior (Nvidia designing compliance chips, data center relocations, sovereign compute strategies) but target geopolitical competition not AI safety, leaving a governance vacuum for how safely frontier capability is developed"
confidence: likely
source: "US export control regulations (Oct 2022, Oct 2023, Dec 2024, Jan 2025), Nvidia compliance chip design reports, sovereign compute strategy announcements; theseus AI coordination research (Mar 2026)"
created: 2026-03-16
---
# compute export controls are the most impactful AI governance mechanism but target geopolitical competition not safety leaving capability development unconstrained
US export controls on AI chips represent the most consequential AI governance mechanism by a wide margin. Iteratively tightened across four rounds (October 2022, October 2023, December 2024, January 2025) and partially loosened under the Trump administration, these controls have produced verified behavioral changes across the industry:
- Nvidia designed compliance-specific chips to meet tiered restrictions
- Companies altered data center location decisions based on export tiers
- Nations launched sovereign compute strategies (EU, Gulf states, Japan) partly in response to supply uncertainty
- Tiered country classification systems created deployment caps (100k-320k H100-equivalents) that constrain compute access by geography
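To make the tiered-cap mechanism concrete, here is a minimal sketch of how such a scheme could be modeled in code. The tier names and groupings are illustrative assumptions; only the 100k-320k H100-equivalent cap range is taken from the list above, not from the regulation's actual country lists:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ComputeTier:
    name: str
    h100_equiv_cap: Optional[int]  # None = uncapped access

# Illustrative three-tier layout; real country assignments and exact
# thresholds differ. Only the H100-equivalent cap range mirrors the
# deployment caps described above.
TIERS = {
    "ally": ComputeTier("Tier 1 (uncapped allies)", None),
    "capped": ComputeTier("Tier 2 (capped access)", 320_000),
    "restricted": ComputeTier("Tier 3 (restricted)", 0),
}

def deployment_allowed(tier: str, h100_equiv: int) -> bool:
    """True if a requested deployment fits within its tier's cap."""
    cap = TIERS[tier].h100_equiv_cap
    return cap is None or h100_equiv <= cap

print(deployment_allowed("capped", 150_000))    # True
print(deployment_allowed("restricted", 1_000))  # False
```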
No voluntary commitment, international declaration, or industry self-regulation effort has produced behavioral change at this scale. Export controls work because they are backed by state enforcement authority and carry criminal penalties for violation.
**The governance gap:** Export controls constrain who can build frontier AI (capability distribution) but say nothing about how safely it is built (capability development). The US government restricts chip sales to adversary nations while simultaneously eliminating domestic safety requirements — Trump revoked Biden's EO 14110 on Day 1, removing the reporting requirements that were the closest US equivalent to binding safety governance.
This creates a structural asymmetry: the most effective governance mechanism addresses geopolitical competition while leaving safety governance to voluntary mechanisms that have empirically failed. The labs that CAN access frontier compute (US companies, allies) face no binding safety requirements, while the labs that CANNOT (China, restricted nations) face capability limitations but develop workarounds (DeepSeek reportedly trained R1 for ~$6M, with efficiency innovations partly driven by compute constraints).
For alignment, this means the governance infrastructure that exists (export controls) is misaligned with the governance infrastructure that's needed (safety requirements). The state has demonstrated it CAN govern AI development through binding mechanisms — it chooses to govern distribution, not safety.
---
Relevant Notes:
- [[nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments]] — export controls confirm state capability; the question is what states choose to govern
- [[only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient]] — export controls are the paradigm case of binding governance working
- [[AI alignment is a coordination problem not a technical problem]] — export controls show coordination with enforcement works; the problem is that enforcement is aimed at competition, not safety
Topics:
- [[_map]]


@@ -0,0 +1,48 @@
---
type: claim
domain: ai-alignment
description: "Comprehensive review of AI governance mechanisms (2023-2026) shows only the EU AI Act, China's AI regulations, and US export controls produced verified behavioral change at frontier labs — all voluntary mechanisms failed"
confidence: likely
source: "Stanford FMTI (Dec 2025), EU enforcement actions (2025), TIME/CNN on Anthropic RSP (Feb 2026), TechCrunch on OpenAI Preparedness Framework (Apr 2025), Fortune on Seoul violations (Aug 2025), Brookings analysis, OECD reports; theseus AI coordination research (Mar 2026)"
created: 2026-03-16
---
# only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient
A comprehensive review of every major AI governance mechanism from 2023-2026 reveals a clear empirical pattern: only binding regulation with enforcement authority has produced verified behavioral change at frontier AI labs.
**What changed behavior (Tier 1):**
The EU AI Act caused Apple to pause Apple Intelligence rollout in the EU, Meta to change advertising settings for EU users, and multiple companies to preemptively modify products for compliance. EUR 500M+ in fines have been levied under related digital regulation. This is the only Western governance mechanism with verified behavioral change at frontier labs.
China's AI regulations — mandatory algorithm filing, content labeling, criminal enforcement for AI-generated misinformation — produced compliance from every company operating in the Chinese market. China was the first country with binding generative AI regulation (August 2023).
US export controls on AI chips are the most consequential AI governance mechanism operating today, constraining which actors can access frontier compute. Nvidia designed compliance-specific chips in response. But these controls are geopolitically motivated, not safety-motivated.
**What did NOT change behavior (Tier 4):**
Every international declaration — Bletchley (29 countries, Nov 2023), Seoul (16 companies, May 2024), Hiroshima (G7), Paris (Feb 2025), OECD principles (46 countries) — produced zero documented cases of a lab changing behavior. The Bletchley Declaration catalyzed safety institute creation (real institutional infrastructure), but no lab delayed, modified, or cancelled a model release because of any declaration.
The White House voluntary commitments (15 companies, July 2023) were partially implemented (watermarking at 38% of generators) but transparency actively declined: Stanford's Foundation Model Transparency Index mean score dropped 17 points from 2024 to 2025. Meta fell 29 points, Mistral fell 37 points, OpenAI fell 14 points.
**The erosion lifecycle:**
Voluntary safety commitments follow a predictable trajectory: announced with fanfare → partially implemented → eroded under competitive pressure → made conditional on competitors → abandoned. The documented cases:
1. Anthropic's RSP (2023→2026): binding commitment → abandoned, replaced with nonbinding framework. Anthropic's own explanation: "very hard to meet without industry-wide coordination."
2. OpenAI's Preparedness Framework v2 (Apr 2025): explicitly states OpenAI "may adjust its safety requirements if a rival lab releases a high-risk system without similar protections." Safety is now explicitly conditional on competitor behavior.
3. OpenAI's safety infrastructure: Superalignment team dissolved (May 2024), Mission Alignment team dissolved (Feb 2026), "safely" removed from mission statement (Nov 2025).
4. Google's Seoul commitment: 60 UK lawmakers accused Google DeepMind of violating its Seoul safety reporting commitment when Gemini 2.5 Pro was released without promised external evaluation (Apr 2025).
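Read as a state machine, the lifecycle makes the forward-only pattern explicit. A minimal Python sketch: stage names are taken from the trajectory above, and the one-way transition encodes the empirical observation that no documented case moved back toward greater stringency, not a law:
```python
from enum import IntEnum

class CommitmentStage(IntEnum):
    ANNOUNCED = 1
    PARTIALLY_IMPLEMENTED = 2
    ERODED = 3
    CONDITIONAL_ON_COMPETITORS = 4
    ABANDONED = 5

def advance(stage: CommitmentStage) -> CommitmentStage:
    """One step along the lifecycle; documented cases only move forward."""
    return CommitmentStage(min(stage + 1, CommitmentStage.ABANDONED))

# e.g. the Anthropic RSP trajectory (2023 -> 2026) walked every stage:
stage = CommitmentStage.ANNOUNCED
while stage is not CommitmentStage.ABANDONED:
    stage = advance(stage)
print(stage.name)  # ABANDONED
```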
This pattern confirms [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] with far more evidence than previously available. It also implies that [[AI alignment is a coordination problem not a technical problem]] is correct in diagnosis but insufficient as a solution — coordination through voluntary mechanisms has empirically failed. The question becomes: what coordination mechanisms have enforcement authority without requiring state coercion?
---
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — confirmed with extensive evidence across multiple labs and governance mechanisms
- [[AI alignment is a coordination problem not a technical problem]] — correct diagnosis, but voluntary coordination has failed; enforcement-backed coordination is the only kind that works
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — the erosion lifecycle is the alignment tax in action
- [[nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments]] — export controls and the EU AI Act confirm state power is the binding governance mechanism
Topics:
- [[_map]]


@@ -0,0 +1,46 @@
---
type: entity
entity_type: governance_body
name: "UK AI Safety Institute"
domain: ai-alignment
handles: ["@AISafetyInst"]
website: https://www.aisi.gov.uk
status: active
category: "Government AI safety evaluation body"
key_metrics:
  pre_deployment_evals: "Conducted joint US-UK evaluation of OpenAI o1 (Dec 2024)"
  frontier_report: "Published Frontier AI Trends Report showing apprentice-level cyber task completion at 50%"
  blocking_authority: "None — labs grant voluntary access and retain full release authority"
tracked_by: theseus
created: 2026-03-16
last_updated: 2026-03-16
---
# UK AI Safety Institute
## Overview
The first government-established AI safety evaluation body, created after the Bletchley Summit (November 2023). Conducted the most concrete bilateral safety cooperation to date (joint US-UK evaluation of OpenAI's o1, December 2024). Rebranded to "AI Security Institute" in February 2025, signaling an emphasis shift from safety to security.
## Current State
- Conducted pre-deployment evaluations of multiple frontier models
- Published Frontier AI Trends Report: AI models now complete apprentice-level cyber tasks 50% of the time (up from 10% in early 2024), and surpass PhD-level experts in chemistry/biology by up to 60%
- Key finding: Model B (released 6 months after Model A) required ~40x more expert effort to find universal attacks enabling biological misuse
- No blocking authority — labs participate voluntarily and retain full control over release decisions
## Timeline
- **2023-11** — Created after Bletchley Summit
- **2024-04** — US-UK MOU signed for joint model testing, research sharing, personnel exchanges
- **2024-12** — Joint pre-deployment evaluation of OpenAI o1 with US AISI
- **2025-02** — Rebranded to "AI Security Institute"
## Alignment Significance
The UK AISI is the strongest evidence that institutional infrastructure CAN be created from international coordination — but also the strongest evidence that institutional infrastructure without enforcement authority has limited impact. Labs grant access voluntarily. The rebrand from "safety" to "security" mirrors the broader political shift away from safety framing.
The US counterpart (AISI → CAISI) has been defunded and rebranded under the Trump administration, demonstrating the fragility of institutions that depend on executive branch support rather than legislative mandate.
## Relationship to KB
- [[only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient]] — AISI is Tier 2 infrastructure: real but without enforcement
- [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]] — AISI's own data shows models distinguish test from deployment settings
Topics:
- [[_map]]


@@ -0,0 +1,53 @@
---
type: source
title: "Empirical Evidence: AI Coordination and Governance Mechanisms That Changed Behavior"
author: "Theseus research agent (multi-source web synthesis)"
url: null
date_published: 2026-03-16
date_archived: 2026-03-16
domain: ai-alignment
status: processing
processed_by: theseus
tags: [ai-governance, coordination, safety-commitments, regulation, enforcement, voluntary-pledges]
sourced_via: "Theseus research agent — 45 web searches synthesized from Brookings, Stanford FMTI, EU legislation, OECD, government publications, TechCrunch, TIME, CNN, Fortune, academic papers"
---
# Empirical Evidence: AI Coordination and Governance Mechanisms That Changed Behavior
Core finding: of every AI governance mechanism reviewed, only three produced verified behavioral change at frontier AI labs: (1) binding regulation with enforcement teeth (EU AI Act, China), (2) export controls backed by state power, and (3) competitive/reputational pressure through markets.
## Behavioral Change Tier List
**Tier 1 — Verified behavioral change:**
- EU AI Act: Apple paused Apple Intelligence in EU, Meta changed ads, EUR 500M+ fines (DMA). Companies preemptively modifying products.
- China's AI regulations: mandatory algorithm filing, content labeling, criminal enforcement. First binding generative AI regulation (Aug 2023).
- US export controls: most impactful mechanism. Tiered country system, deployment caps, Nvidia designing compliance chips. Geopolitically motivated, not safety-motivated.
**Tier 2 — Institutional infrastructure, uncertain behavioral change:**
- AI Safety Institutes (UK, US, Japan, Korea, Canada). US-UK joint o1 evaluation. But no blocking authority, US AISI defunded/rebranded.
- Third-party evaluation (METR, Apollo Research). Fragile, no regulatory mandate.
**Tier 3 — Partial voluntary compliance:**
- Watermarking: 38% implementation. Google SynthID, Meta AudioSeal. Anthropic the only major lab without one.
- Red-teaming: self-reported, limited external verification.
**Tier 4 — No verified behavioral change:**
- ALL international declarations (Bletchley, Seoul, Paris, Hiroshima, OECD, UN)
- Frontier Model Forum
- White House voluntary commitments
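For downstream queries against this archive, the tier list condenses to a small lookup table. A sketch, with mechanism names abbreviated from the entries above:
```python
# Mechanism -> behavioral-change tier, per the review above
# (1 = verified change, 2 = infrastructure only, 3 = partial, 4 = none).
MECHANISM_TIERS = {
    "EU AI Act": 1,
    "China AI regulations": 1,
    "US export controls": 1,
    "AI Safety Institutes": 2,
    "Third-party evaluation (METR, Apollo)": 2,
    "Watermarking": 3,
    "Red-teaming": 3,
    "International declarations": 4,
    "Frontier Model Forum": 4,
    "White House voluntary commitments": 4,
}

def mechanisms_at(tier: int) -> list[str]:
    """All mechanisms the review places at a given tier."""
    return [m for m, t in MECHANISM_TIERS.items() if t == tier]

# Every Tier 1 entry is state-enforced; no voluntary mechanism
# appears above Tier 3.
print(mechanisms_at(1))  # ['EU AI Act', 'China AI regulations', 'US export controls']
```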
## Key Evidence Points
- Stanford FMTI transparency scores DECLINING: -17 points mean (2024→2025). Meta -29, Mistral -37, OpenAI -14.
- OpenAI explicitly made safety conditional on competitor behavior (Preparedness Framework v2, Apr 2025).
- OpenAI removed "safely" from mission statement (Nov 2025).
- OpenAI dissolved Superalignment team (May 2024) and Mission Alignment team (Feb 2026).
- Google accused by 60 UK lawmakers of violating Seoul commitments (Gemini 2.5 Pro, Apr 2025).
- 450+ organizations lobbied on AI in 2025 (up from 6 in 2016). $92M in lobbying fees Q1-Q3 2025.
- SB 1047 (CA AI safety bill) vetoed after heavy industry lobbying.
- Anthropic's own language: RSP "very hard to meet without industry-wide coordination."
## Novel Mechanisms
- Compute governance: export controls work but geopolitically motivated. KYC for compute proposed, not implemented.
- Insurance/liability: market projected $29.7B by 2033. Creates market incentives aligned with safety.
- Third-party auditing: METR, Apollo Research. Apollo warns ecosystem unsustainable without regulatory mandate.
- Futarchy: implemented for DAO governance (MetaDAO, Optimism experiment) but not yet for AI governance.