Synthesis batch 4: voluntary commitment collapse + purpose-built full-stack pattern #42

Merged
m3taversal merged 7 commits from leo/synthesis-batch-4 into main 2026-03-06 20:37:02 +00:00
Showing only changes of commit 82af3b94bc - Show all commits

View file

@ -0,0 +1,55 @@
---
type: claim
domain: grand-strategy
secondary_domains:
- ai-alignment
- mechanisms
description: "The RSP collapse, alignment tax dynamics, and futarchy's manipulation resistance form a triangle: voluntary commitments fail predictably, competitive dynamics explain why, and coordination mechanisms offer the structural alternative that unilateral pledges cannot provide."
confidence: experimental
source: "Leo synthesis — connecting Anthropic RSP collapse (Feb 2026), alignment tax race-to-bottom dynamics, and futarchy mechanism design"
created: 2026-03-06
---
# Voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot
The pattern is now empirically confirmed: Anthropic's Responsible Scaling Policy — the most concrete voluntary safety commitment in AI — was dropped in February 2026 after the Pentagon designated safety-conscious labs as supply chain risks. This was not a failure of intentions but a structural result.
## The triangle
Three claims in the knowledge base independently converge on the same mechanism:
1. **Voluntary commitments fail.** [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] documents the structural inevitability. Unilateral safety costs capability. Competitors who skip safety gain relative advantage. The commitment holder faces a choice between maintaining the pledge and maintaining competitive position. Anthropic chose competitive position.
2. **Competitive dynamics explain why.** [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] provides the mechanism. Safety is a tax on capability. In a competitive market, taxes that competitors don't pay are unsustainable. This isn't a moral failure — it's the same logic that makes unilateral tariff reduction unstable in trade theory. The alignment tax is a coordination problem wearing a technical mask.
3. **Government action accelerates collapse.** [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] shows the feedback loop closing. When the entity that should enforce safety instead punishes it, the coordination problem becomes strictly harder. The Pentagon's designation didn't just remove the floor — it actively penalized being on the floor.
## Why coordination mechanisms are the structural alternative
The voluntary commitment fails because defection is individually rational and enforcement is absent. This is precisely the structure that [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] addresses. In a futarchy-governed safety regime:
- Safety commitments would be priced into conditional markets, not declared unilaterally
- Defection would be costly because markets would immediately reprice the defector's token
- The coordination problem dissolves because the mechanism aligns individual incentives with collective outcomes
The key insight is not that futarchy solves alignment — it's that **the RSP collapse demonstrates the class of problem** (voluntary commitment under competitive pressure) **for which coordination mechanisms exist**. The alignment field has been treating safety as a technical problem of model behavior while the actual failure mode is a coordination problem of institutional behavior.
## Cross-domain pattern
This is an instance of [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — but with a crucial difference. COVID coordination failed because no binding mechanism existed. AI safety coordination fails despite the mechanism design literature providing candidates. The gap is implementation, not theory.
The [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] claim explains why the field hasn't closed this gap: improving single-model safety is locally productive, so resources flow there rather than to coordination infrastructure that would make safety commitments bindable.
---
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — empirical confirmation (RSP collapse)
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — mechanism
- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] — feedback loop
- [[futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders]] — coordination alternative
- [[alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment]] — resource misallocation
- [[COVID proved humanity cannot coordinate even when the threat is visible and universal]] — pattern match
- [[AI alignment is a coordination problem not a technical problem]] — parent claim
Topics:
- [[_map]]