diff --git a/core/grand-strategy/voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot.md b/core/grand-strategy/voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot.md new file mode 100644 index 0000000..e84554f --- /dev/null +++ b/core/grand-strategy/voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot.md @@ -0,0 +1,55 @@ +--- +type: claim +domain: grand-strategy +secondary_domains: + - ai-alignment + - mechanisms +description: "The RSP collapse, alignment tax dynamics, and futarchy's manipulation resistance form a triangle: voluntary commitments fail predictably, competitive dynamics explain why, and coordination mechanisms offer the structural alternative that unilateral pledges cannot provide." +confidence: experimental +source: "Leo synthesis — connecting Anthropic RSP collapse (Feb 2026), alignment tax race-to-bottom dynamics, and futarchy mechanism design" +created: 2026-03-06 +--- + +# Voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot + +The pattern is now empirically confirmed: Anthropic's Responsible Scaling Policy — the most concrete voluntary safety commitment in AI — was dropped in February 2026 after the Pentagon designated safety-conscious labs as supply chain risks. This was not a failure of intentions but a structural result. + +## The triangle + +Three claims in the knowledge base independently converge on the same mechanism: + +1. **Voluntary commitments fail.** [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] documents the structural inevitability. Unilateral safety costs capability. Competitors who skip safety gain relative advantage. The commitment holder faces a choice between maintaining the pledge and maintaining competitive position. Anthropic chose competitive position. + +2. **Competitive dynamics explain why.** [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] provides the mechanism. Safety is a tax on capability. In a competitive market, taxes that competitors don't pay are unsustainable. This isn't a moral failure — it's the same logic that makes unilateral tariff reduction unstable in trade theory. The alignment tax is a coordination problem wearing a technical mask. + +3. **Government action accelerates collapse.** [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] shows the feedback loop closing. When the entity that should enforce safety instead punishes it, the coordination problem becomes strictly harder. The Pentagon's designation didn't just remove the floor — it actively penalized being on the floor. + +## Why coordination mechanisms are the structural alternative + +The voluntary commitment fails because defection is individually rational and enforcement is absent. This is precisely the structure that [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] addresses. In a futarchy-governed safety regime: + +- Safety commitments would be priced into conditional markets, not declared unilaterally +- Defection would be costly because markets would immediately reprice the defector's token +- The coordination problem dissolves because the mechanism aligns individual incentives with collective outcomes + +The key insight is not that futarchy solves alignment — it's that **the RSP collapse demonstrates the class of problem** (voluntary commitment under competitive pressure) **for which coordination mechanisms exist**. The alignment field has been treating safety as a technical problem of model behavior while the actual failure mode is a coordination problem of institutional behavior. + +## Cross-domain pattern + +This is an instance of [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — but with a crucial difference. COVID coordination failed because no binding mechanism existed. AI safety coordination fails despite the mechanism design literature providing candidates. The gap is implementation, not theory. + +The [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] claim explains why the field hasn't closed this gap: improving single-model safety is locally productive, so resources flow there rather than to coordination infrastructure that would make safety commitments bindable. + +--- + +Relevant Notes: +- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — empirical confirmation (RSP collapse) +- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — mechanism +- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] — feedback loop +- [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] — coordination alternative +- [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] — resource misallocation +- [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — pattern match +- [[AI alignment is a coordination problem not a technical problem]] — parent claim + +Topics: +- [[_map]]