2026-03-06 20:37:02 +00:00
1 changed files with 55 additions and 0 deletions
--- a/core/grand-strategy/voluntary
+++ b/core/grand-strategy/voluntary
@ -0,0 +1,55 @@
 ---
 type: claim
 domain: grand-strategy
 secondary_domains:
  - ai-alignment
  - mechanisms
 description: "The RSP collapse, alignment tax dynamics, and futarchy's manipulation resistance form a triangle: voluntary commitments fail predictably, competitive dynamics explain why, and coordination mechanisms offer the structural alternative that unilateral pledges cannot provide."
 confidence: experimental
 source: "Leo synthesis — connecting Anthropic RSP collapse (Feb 2026), alignment tax race-to-bottom dynamics, and futarchy mechanism design"
 created: 2026-03-06
 ---
 # Voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot
 The pattern is now empirically confirmed: Anthropic's Responsible Scaling Policy — the most concrete voluntary safety commitment in AI — was dropped in February 2026 after the Pentagon designated safety-conscious labs as supply chain risks. This was not a failure of intentions but a structural result.
 ## The triangle
 Three claims in the knowledge base independently converge on the same mechanism:
 1. **Voluntary commitments fail.** [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] documents the structural inevitability. Unilateral safety costs capability. Competitors who skip safety gain relative advantage. The commitment holder faces a choice between maintaining the pledge and maintaining competitive position. Anthropic chose competitive position.
 2. **Competitive dynamics explain why.** [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] provides the mechanism. Safety is a tax on capability. In a competitive market, taxes that competitors don't pay are unsustainable. This isn't a moral failure — it's the same logic that makes unilateral tariff reduction unstable in trade theory. The alignment tax is a coordination problem wearing a technical mask.
 3. **Government action accelerates collapse.** [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] shows the feedback loop closing. When the entity that should enforce safety instead punishes it, the coordination problem becomes strictly harder. The Pentagon's designation didn't just remove the floor — it actively penalized being on the floor.
 ## Why coordination mechanisms are the structural alternative
 The voluntary commitment fails because defection is individually rational and enforcement is absent. This is precisely the structure that [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] addresses. In a futarchy-governed safety regime:
 - Safety commitments would be priced into conditional markets, not declared unilaterally
 - Defection would be costly because markets would immediately reprice the defector's token
 - The coordination problem dissolves because the mechanism aligns individual incentives with collective outcomes
 The key insight is not that futarchy solves alignment — it's that **the RSP collapse demonstrates the class of problem** (voluntary commitment under competitive pressure) **for which coordination mechanisms exist**. The alignment field has been treating safety as a technical problem of model behavior while the actual failure mode is a coordination problem of institutional behavior.
 ## Cross-domain pattern
 This is an instance of [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — but with a crucial difference. COVID coordination failed because no binding mechanism existed. AI safety coordination fails despite the mechanism design literature providing candidates. The gap is implementation, not theory.
 The [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] claim explains why the field hasn't closed this gap: improving single-model safety is locally productive, so resources flow there rather than to coordination infrastructure that would make safety commitments bindable.
 ---
 Relevant Notes:
 - [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — empirical confirmation (RSP collapse)
 - [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — mechanism
 - [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] — feedback loop
 - [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] — coordination alternative
 - [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] — resource misallocation
 - [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — pattern match
 - [[AI alignment is a coordination problem not a technical problem]] — parent claim
 Topics:
 - [[_map]]