Synthesis batch 4: voluntary commitment collapse + purpose-built full-stack pattern #42
---
type: claim
domain: grand-strategy
secondary_domains:
- ai-alignment
- mechanisms
description: "The RSP collapse, alignment tax dynamics, and futarchy's manipulation resistance form a triangle: voluntary commitments fail predictably, competitive dynamics explain why, and coordination mechanisms offer the structural alternative that unilateral pledges cannot provide."
confidence: experimental
source: "Leo synthesis — connecting Anthropic RSP collapse (Feb 2026), alignment tax race-to-bottom dynamics, and futarchy mechanism design"
created: 2026-03-06
---

# Voluntary safety commitments collapse under competitive pressure, and coordination mechanisms like futarchy can bind where unilateral pledges cannot

The pattern is now empirically confirmed: Anthropic's Responsible Scaling Policy — the most concrete voluntary safety commitment in AI — was dropped in February 2026 after the Pentagon designated safety-conscious labs as supply chain risks. This was not a failure of intentions but a structural result.

## The triangle

Three claims in the knowledge base independently converge on the same mechanism:

1. **Voluntary commitments fail.** [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] documents the structural inevitability. Unilateral safety costs capability. Competitors who skip safety gain relative advantage. The commitment holder faces a choice between maintaining the pledge and maintaining competitive position. Anthropic chose competitive position.

2. **Competitive dynamics explain why.** [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] provides the mechanism. Safety is a tax on capability. In a competitive market, taxes that competitors don't pay are unsustainable. This isn't a moral failure — it's the same logic that makes unilateral tariff reduction unstable in trade theory. The alignment tax is a coordination problem wearing a technical mask.

3. **Government action accelerates collapse.** [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] shows the feedback loop closing. When the entity that should enforce safety instead punishes it, the coordination problem becomes strictly harder. The Pentagon's designation didn't just remove the floor — it actively penalized being on the floor.
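
The race-to-the-bottom logic in points 1 and 2 is an ordinary dominance argument, and a toy payoff model makes it explicit. The tax rate and the market-share payoff function below are illustrative assumptions, not estimates of anything:

```python
# Toy model: two labs each choose whether to pay the alignment tax.
# TAX and the market-share payoff are illustrative assumptions only.
TAX = 0.3  # assumed fraction of capability spent on safety

def payoff(me_safe: bool, rival_safe: bool) -> float:
    """One lab's share of total capability, given both labs' choices."""
    mine = 1.0 - (TAX if me_safe else 0.0)
    theirs = 1.0 - (TAX if rival_safe else 0.0)
    return mine / (mine + theirs)

# Skipping safety is a dominant strategy: it pays more whatever the rival does.
assert payoff(False, True) > payoff(True, True)    # rival is safe: defect, gain share
assert payoff(False, False) > payoff(True, False)  # rival defects: matching is forced
# The equilibrium (skip, skip) leaves both labs at 0.5 with zero safety spending.
```

The point of the sketch is that no payoff here rewards honoring the pledge; the outcome is fixed by the structure, not by anyone's intentions.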

## Why coordination mechanisms are the structural alternative

The voluntary commitment fails because defection is individually rational and enforcement is absent. This is precisely the structure that [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] addresses. In a futarchy-governed safety regime:

- Safety commitments would be priced into conditional markets, not declared unilaterally
- Defection would be costly because markets would immediately reprice the defector's token
- The coordination problem dissolves because the mechanism aligns individual incentives with collective outcomes
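
The manipulation-resistance claim can be sketched in a few lines. The fundamental value, the stake, and the linear expected-profit model below are all simplifying assumptions, standing in for the settlement dynamics of a real conditional market:

```python
# Toy sketch of why manipulating a conditional market subsidizes its own
# correction. FUNDAMENTAL and the linear profit model are assumptions.
FUNDAMENTAL = 0.70  # assumed true probability that the policy improves welfare

def defender_profit(manipulated_price: float, stake: float = 100.0) -> float:
    """Expected profit for an informed trader who trades back toward fundamentals.

    Buying below FUNDAMENTAL (or selling above it) earns the mispricing
    per unit staked once the market settles at the true value.
    """
    return stake * abs(FUNDAMENTAL - manipulated_price)

# The larger the attack, the larger the defenders' expected reward for undoing it:
assert defender_profit(0.40) > defender_profit(0.60)
```

Under these assumptions an attacker who pushes the price away from fundamentals is paying informed traders to push it back, which is the sense in which attacks create profitable opportunities for defenders.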

The key insight is not that futarchy solves alignment — it's that **the RSP collapse demonstrates the class of problem** (voluntary commitment under competitive pressure) **for which coordination mechanisms exist**. The alignment field has been treating safety as a technical problem of model behavior while the actual failure mode is a coordination problem of institutional behavior.

## Cross-domain pattern
This is an instance of [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — but with a crucial difference. COVID coordination failed because no binding mechanism existed. AI safety coordination fails despite the mechanism design literature providing candidates. The gap is implementation, not theory.

The [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] claim explains why the field hasn't closed this gap: improving single-model safety is locally productive, so resources flow there rather than to coordination infrastructure that would make safety commitments bindable.

---
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — empirical confirmation (RSP collapse)
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — mechanism
- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] — feedback loop
- [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] — coordination alternative
- [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] — resource misallocation
- [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — pattern match
- [[AI alignment is a coordination problem not a technical problem]] — parent claim

Topics:
- [[_map]]