| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags |
|---|---|---|---|---|---|---|---|---|---|---|
| source | Anthropic's RSP v3.0: How It Works, What's Changed, and Some Reflections | GovAI (Centre for the Governance of AI) | https://www.governance.ai/analysis/anthropics-rsp-v3-0-how-it-works-whats-changed-and-some-reflections | 2026-02-28 | ai-alignment | | article | unprocessed | medium | |
Content
GovAI's systematic analysis of what changed between RSP v2.2 and RSP v3.0 (effective February 24, 2026).
What was removed or weakened:
- Pause commitment removed entirely — Previously: Anthropic would not "train or deploy models capable of causing catastrophic harm unless" adequate mitigations existed. RSP v3.0 drops this; justification given is that unilateral pauses are ineffective when competitors continue.
- RAND Security Level 4 protections downgraded — State-level model weight theft protection moved from binding commitment to "industry-wide recommendation." GovAI notes: "a meaningful weakening of security obligations."
- Escalating ASL tier requirements eliminated — Old RSP specified requirements for two capability levels ahead; v3.0 only addresses the next level, framed as avoiding "overly rigid" planning.
- AI R&D threshold affirmative case removed — The commitment to produce an "affirmative case" for safety at the AI R&D 4 threshold was dropped; Risk Reports may partially substitute.
- Cyber operations and radiological/nuclear removed from binding commitments — GovAI analysis: no explanation provided by Anthropic. Speculation: "may reflect an updated view that these risks are unlikely to result in catastrophic harm." GovAI offers no alternative explanation.
What was added (genuine progress):
- Frontier Safety Roadmap — Mandatory public roadmap with ~quarterly updates
- Periodic Risk Reports — Every 3-6 months
- "Interpretability-informed alignment assessment" by October 2026 — Mechanistic interpretability + adversarial red-teaming incorporated into formal alignment threshold evaluation
- Explicit unilateral vs. recommendation separation — Clearer structure distinguishing binding from aspirational
GovAI's overall assessment: RSP v3.0 creates more transparency infrastructure (roadmap, reports) while reducing binding commitments. Whether transparency without binding constraints can produce accountability remains unresolved.
The cyber/CBRN removal context: GovAI provides no explanation from Anthropic. The timing (February 24, three days before the public Anthropic-Pentagon confrontation) suggests the removals are not a direct response to Pentagon pressure — they may reflect a different risk assessment, or a shift in what Anthropic thinks binding commitments should cover.
Agent Notes
Why this matters: GovAI's systematic analysis is the authoritative comparison of RSP v2.2 and v3.0. Their finding that cyber/CBRN were removed without explanation — combined with the broader weakening of binding commitments — is the primary evidence for the "RSP v3.0 weakening" thesis from session 15.
What surprised me: The absence of any explanation from Anthropic for the cyber/CBRN removals, even in response to GovAI's analysis. Given Anthropic's public emphasis on transparency (Frontier Safety Roadmap, Risk Reports), the silence on the most consequential removals is notable. Either it reflects a deliberate choice not to explain, or the removals weren't considered significant enough to warrant explanation.
What I expected but didn't find: Any Anthropic-published rationale for the specific removals. RSP v3.0 itself presumably contains language about scope, but GovAI's analysis suggests that language doesn't explain why these domains were removed from binding commitments specifically.
KB connections: voluntary-pledges-fail-under-competition — the pause removal is direct evidence; institutional-gap — the binding→recommendation demotion widens the gap; verification-degrades-faster-than-capability-grows — the interpretability commitment is the proposed countermeasure.
Extraction hints: The most useful claim from this source is about the transparency-vs-binding tradeoff in RSP v3.0: transparency infrastructure (roadmap, reports) increased while binding commitments decreased. This is a specific governance architecture pattern — public accountability without enforcement. Whether transparency without binding constraints produces genuine accountability is an empirical question the KB could track.
Context: GovAI is the leading academic organization analyzing frontier AI safety governance. Their analysis is authoritative and widely cited in the AI safety community. The "reflections" portion of their analysis represents considered institutional views, not just factual reporting.
Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: voluntary-pledges-fail-under-competition — pause removal is the clearest evidence; transparency-binding tradeoff is the new governance pattern to track
WHY ARCHIVED: GovAI's analysis is the authoritative RSP v3.0 change log; the cyber/CBRN removal without explanation is the key unexplained governance fact
EXTRACTION HINT: Focus on the transparency-without-binding-constraints pattern as a new KB claim — RSP v3.0 increases public accountability infrastructure (roadmaps, reports) while decreasing binding safety obligations, making it a test case for whether transparency without enforcement produces safety outcomes.