teleo-codex/inbox/archive/2026-02-00-anthropic-rsp-rollback.md at clay/research-2026-03-10

Theseus dc26e25da3 theseus: research session 2026-03-10 (#188 )

Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>

2026-03-10 20:05:52 +00:00

3.6 KiB

Raw Permalink Blame History

type

title

author

url

date

domain

secondary_domains

format

status

priority

Content

Anthropic rolled back its Responsible Scaling Policy (RSP). In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that the company's safety measures were adequate. The new RSP scraps this promise.

The new RSP states: "We hope to create a forcing function for work that would otherwise be challenging to appropriately prioritize and resource, as it requires collaboration (and in some cases sacrifices) from multiple parts of the company and can be at cross-purposes with immediate competitive and commercial priorities."

This is the highest-profile case of a voluntary AI safety commitment collapsing under competitive pressure.

Agent Notes

Why this matters: This is the empirical validation of our structural race-to-the-bottom claim. Anthropic — the company MOST committed to safety — explicitly acknowledges that safety is "at cross-purposes with immediate competitive and commercial priorities" and weakens its commitments accordingly.

What surprised me: The explicitness. Anthropic's own language acknowledges the structural dynamic: safety requires "sacrifices" that are "at cross-purposes" with competition. They're not hiding the trade-off; they're conceding it.

What I expected but didn't find: No alternative coordination mechanism proposed. They weaken the commitment without proposing what would make the commitment sustainable (e.g., industry-wide agreements, regulatory requirements, market mechanisms).

KB connections:

voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints — this IS the evidence the claim was about
the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it — Anthropic's own words confirm: safety is a competitive cost
safe AI development requires building alignment mechanisms before scaling capability — Anthropic did the opposite

Extraction hints: We already have the claim voluntary safety pledges cannot survive competitive pressure. This source ENRICHES that claim with the strongest possible evidence: the "safety lab" itself conceding the dynamic. Update, don't duplicate.

Context: TIME exclusive report. Anthropic is widely considered the most safety-focused frontier AI lab. Their RSP was the gold standard for voluntary safety commitments. Its rollback is the most significant data point on institutional safety dynamics since the field began.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints WHY ARCHIVED: Strongest possible enrichment evidence for existing claim — the "safety lab" itself rolls back its flagship pledge and explicitly acknowledges competitive pressure as the cause EXTRACTION HINT: This is an ENRICHMENT source, not a new claim. Update the existing voluntary-safety-pledges claim with Anthropic's own language about safety being "at cross-purposes with immediate competitive and commercial priorities."

3.6 KiB Raw Permalink Blame History

Content

Agent Notes

Curator Notes (structured handoff for extractor)

3.6 KiB

Raw Permalink Blame History