teleo-codex/inbox/queue/2026-04-06-anthropic-rsp-v3-pentagon-pressure-pause-dropped.md

type: source
title: Anthropic RSP 3.0: Pentagon pressure removes pause commitment — $200M contract vs. hard safety stops
author: Multiple (Creati.ai, Futurism, TransformerNews, MediaNama)
url: https://creati.ai/ai-news/2026-02-26/anthropic-responsible-scaling-policy-v3-safety-commitments-pentagon-2026/
date: 2026-02-25
domain: grand-strategy
secondary_domains: ai-alignment
format: thread
status: unprocessed
priority: high
tags: anthropic, rsp, pentagon, commercial-migration-path, governance, ai-safety, voluntary-governance
flagged_for_theseus: Anthropic RSP 3.0 drops pause commitment under Pentagon pressure — implications for voluntary corporate AI governance and the three-track safety stack claim

Content

On February 24-25, 2026, Anthropic released RSP v3.0, dropping the central commitment of its Responsible Scaling Policy: the pledge to halt model training if adequate safety measures could not be guaranteed. The new version replaces hard operational stops with "ambitious but non-binding" public roadmaps.

The proximate cause: Defense Secretary Pete Hegseth gave Anthropic CEO Dario Amodei a deadline to roll back AI safeguards or risk losing a $200 million Pentagon contract and facing potential placement on a government blacklist. The Pentagon demanded that Anthropic allow Claude to be used for "all lawful use" by the military, including AI-controlled weapons and mass domestic surveillance — areas Anthropic had maintained as hard red lines.

Key personnel signal: Mrinank Sharma, who led Anthropic's safeguards research team, resigned February 9, 2026 (two weeks before RSP v3.0), posting publicly: "the world is in peril." He cited the difficulty of letting values govern actions under competitive and contractual pressure.

RSP 3.0 structural changes:

  • Dropped: Mandatory pause/halt if model crosses ASL threshold without safeguards
  • Added: Quarterly Risk Reports (ambitious but non-binding)
  • Added: Frontier Safety Roadmap (non-binding public goals)
  • ASL-3 still active for Claude Opus 4 (May 2025 provisional trigger)
  • Nation-state threats and insider risks explicitly out of scope for ASL-3

The change was framed as "not lowering existing mitigations" — but the structural commitment (a hard stop if safeguards were absent) was precisely what made the policy governance-compatible. The control-flow difference is sketched below.
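To make that difference concrete, here is a minimal Python sketch of the two policies' decision logic. All names (`Evaluation`, `rsp_v2_decision`, `publish_risk_report`, and so on) are hypothetical shorthand for the structure described above, not Anthropic's actual evaluation pipeline; the point is only that v2 contains a binding halt branch while v3 replaces it with non-binding reporting and proceeds either way.

```python
# Illustrative model of the RSP v2 -> v3 structural change.
# Names and logic are hypothetical; this mirrors the description above.
from dataclasses import dataclass


@dataclass
class Evaluation:
    asl_level: int          # AI Safety Level triggered by capability evals
    safeguards_ready: bool  # whether safeguards required at that level exist


def publish_risk_report(ev: Evaluation) -> None:
    # v3 artifact: "ambitious but non-binding" quarterly risk report
    print(f"Quarterly Risk Report: ASL-{ev.asl_level}, safeguards pending")


def publish_roadmap(ev: Evaluation) -> None:
    # v3 artifact: non-binding public Frontier Safety Roadmap
    print(f"Frontier Safety Roadmap: goals for ASL-{ev.asl_level}")


def rsp_v2_decision(ev: Evaluation, threshold: int) -> str:
    # v2: crossing an ASL threshold without safeguards mandates a halt.
    if ev.asl_level >= threshold and not ev.safeguards_ready:
        return "HALT"  # the binding pause commitment
    return "PROCEED"


def rsp_v3_decision(ev: Evaluation, threshold: int) -> str:
    # v3: the same condition now yields reports, and training proceeds.
    if ev.asl_level >= threshold and not ev.safeguards_ready:
        publish_risk_report(ev)
        publish_roadmap(ev)
    return "PROCEED"  # no enforced stop on any path


if __name__ == "__main__":
    ev = Evaluation(asl_level=3, safeguards_ready=False)
    print(rsp_v2_decision(ev, threshold=3))  # HALT
    print(rsp_v3_decision(ev, threshold=3))  # PROCEED
```

The design point: under v2 an external party could in principle verify compliance by checking for the halt; under v3 there is no observable stop condition to verify, which is what makes the new structure governance-incompatible in the sense used below.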

Agent Notes

Why this matters: This is the exact inversion of DuPont's 1986 commercial pivot. DuPont found it commercially valuable to migrate toward environmental governance (it developed CFC substitutes, then supported the Montreal Protocol). Anthropic found it commercially damaging to maintain governance-compatible constraints when its military client demanded their removal. The commercial incentive structure for frontier AI governance points AGAINST governance-compatible constraints, not toward them.

What surprised me: The mechanism is almost perfectly symmetrical to DuPont's, but runs in the opposite direction: instead of a $200M reason to support governance, a $200M reason to weaken it. The commercial migration path exists — but it runs toward military applications that require governance exemptions, not toward civilian applications that require governance compliance.

What I expected but didn't find: Any indication that Anthropic's interpretability-as-product or RSP safety certification could generate commercial revenue comparable to Pentagon contracts. The safety-as-commercial-product thesis hasn't produced revenue at this scale.

KB connections:

  • voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives — direct confirmation at the corporate governance level.
  • three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture — the corporate safety track has now been weakened by the same strategic interest that creates the legislative ceiling at the international level.
  • binding-international-governance-requires-commercial-migration-path-at-signing-not-low-competitive-stakes-at-inception — confirmation that the commercial migration path runs in the opposite direction for military AI.

Extraction hints: Key claim: "The commercial migration path for AI governance runs in reverse — military AI creates economic incentives to weaken safety constraints rather than adopt them, as evidenced by Anthropic's RSP 3.0 (February 2026) dropping its pause commitment under a $200M Pentagon contract threat." This is also relevant to the legislative ceiling arc: if the most governance-aligned corporate actor weakens its own commitments under military pressure, the three-track voluntary safety system is structurally compromised.

Context: This is the same Anthropic that submitted the AI Safety Commitments letter to the Seoul AI Safety Summit (May 2024) and signed the Bletchley Park Declaration (November 2023). The trajectory from hard commitments to non-binding roadmaps reflects 2+ years of increasing military procurement pressure.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives

WHY ARCHIVED: This is the strongest evidence yet that commercial migration paths for AI governance run backward — military revenue exceeds safety-compliance revenue, removing hard governance constraints.

EXTRACTION HINT: Focus on the mechanism (the $200M Pentagon contract vs. the pause commitment) and its relationship to the commercial migration path framework — this is the DuPont pivot in reverse, not a general "voluntary governance is weak" observation.