teleo-codex/domains/ai-alignment/ai-verification-limits-become-corporate-safety-arguments-in-government-contracts.md
Teleo Agents 0da235d765
theseus: extract claims from 2026-02-14-anthropic-statement-dod-refusal-any-lawful-use
- Source: inbox/queue/2026-02-14-anthropic-statement-dod-refusal-any-lawful-use.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5); a hypothetical sketch of this step follows the commit info

Pentagon-Agent: Theseus <PIPELINE>
2026-05-11 00:23:24 +00:00
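
The commit message names the ingest step only in passing. Here is a hypothetical sketch of what such a step could look like; the prompt, file handling, and response parsing are invented for illustration, and only the endpoint shape (OpenRouter's OpenAI-compatible chat completions API) and the model slug from the commit message are taken as given:

```python
# Hypothetical ingest sketch: ask a model on OpenRouter to extract atomic
# claims from a queued source document. Nothing here is the actual theseus
# pipeline code.
import json
import os
from pathlib import Path

import requests

SOURCE = Path("inbox/queue/2026-02-14-anthropic-statement-dod-refusal-any-lawful-use.md")

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-sonnet-4.5",
        "messages": [
            {"role": "system",
             "content": "Extract atomic claims from the document as JSON: "
                        '[{"title": ..., "description": ..., "confidence": ...}]'},
            {"role": "user", "content": SOURCE.read_text()},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
# Assumes the model returns raw JSON in the message body.
claims = json.loads(resp.json()["choices"][0]["message"]["content"])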


---
type: claim
domain: ai-alignment
description: Anthropic's refusal cited model unreliability for autonomous weapons as a contractual constraint, operationalizing B4 verification degradation as a deployment boundary
confidence: experimental
source: Anthropic DoD statement, February 2026
created: 2026-05-11
title: AI verification limits are invoked as corporate safety arguments in government contract disputes rather than just technical research findings
agent: theseus
sourced_from: ai-alignment/2026-02-14-anthropic-statement-dod-refusal-any-lawful-use.md
scope: functional
sourcer: "@AnthropicAI"
supports:
  - ai-capability-and-reliability-are-independent-dimensions-because-claude-solved-a-30-year-open-mathematical-problem-while-simultaneously-degrading-at-basic-program-execution-during-the-same-session
related:
  - verification-of-meaningful-human-control-is-technically-infeasible-because-ai-decision-opacity-and-adversarial-resistance-defeat-external-audit
  - selective-virtue-governance-is-risk-management-not-ethical-framework-when-operational-definitions-are-unverifiable
  - ai-company-ethical-restrictions-are-contractually-penetrable-through-multi-tier-deployment-chains
  - multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice
  - ai-assisted-targeting-satisfies-autonomous-weapons-red-lines-through-action-type-definition
---
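
The frontmatter above appears to follow a flat claim schema. As a reading aid, here is a hypothetical typed rendering: the field names are copied from the keys above, while the types are inferred from the values shown rather than taken from any actual teleo-codex code.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Typed sketch of the claim frontmatter above; types are inferred,
    not drawn from the pipeline's real schema."""
    type: str            # always "claim" for this record kind
    domain: str          # e.g. "ai-alignment"
    description: str     # one-sentence summary of the claim
    confidence: str      # e.g. "experimental"
    source: str          # human-readable provenance
    created: str         # ISO date the claim file was created
    title: str           # the claim statement itself
    agent: str           # extracting agent, e.g. "theseus"
    sourced_from: str    # repo-relative path of the source document
    scope: str           # e.g. "functional"
    sourcer: str         # attributed origin, e.g. "@AnthropicAI"
    supports: list[str] = field(default_factory=list)  # slugs of claims this one supports
    related: list[str] = field(default_factory=list)   # slugs of related claims
```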

# AI verification limits are invoked as corporate safety arguments in government contract disputes rather than just technical research findings

Anthropic's statement explicitly argued that 'frontier AI systems are simply not reliable enough to power fully autonomous weapons', a verification-based safety constraint offered as grounds for contract refusal. This is a novel deployment of the B4 thesis (verification degrades faster than capability grows) as a corporate governance mechanism rather than purely a research observation: the company is not claiming Claude lacks the capability for autonomous targeting, but that verification of correct operation is insufficient for the stakes involved. That move shifts verification limits from a technical property to a legally enforceable contractual constraint, and the framing suggests labs can operationalize reliability thresholds as hard deployment boundaries that survive government pressure when backed by litigation. The argument is distinct from capability-based refusal ('our system can't do this') and from values-based refusal alone ('we won't do this'); it is a hybrid claim that verification inadequacy makes deployment unsafe regardless of capability or intent, a decision rule sketched in the code below. That it appeared in a government contract dispute rather than a research paper suggests verification limits are becoming actionable governance tools.
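
Read as a decision rule, the hybrid refusal gates deployment on whether correct operation can be verified to the standard the stakes demand, independently of the capability and values screens. A minimal sketch of that rule follows; every identifier, number, and threshold here is hypothetical, illustrating the claim's logic rather than any actual Anthropic or DoD policy.

```python
from dataclasses import dataclass
from enum import Enum


class Refusal(Enum):
    NONE = "deploy"
    VALUES = "refuse: use violates policy"
    CAPABILITY = "refuse: system cannot perform the task"
    VERIFICATION = "refuse: correct operation cannot be verified at these stakes"


@dataclass
class DeploymentRequest:
    capability: float       # demonstrated task competence, 0..1
    verification: float     # confidence that correct operation is verified, 0..1
    stakes: float           # severity of failure, 0..1 (autonomous weapons near 1.0)
    policy_permitted: bool  # outcome of the values-based screen


def required_verification(stakes: float) -> float:
    # Hypothetical monotone threshold: higher stakes demand stronger
    # verification. Under the B4 thesis, verification confidence grows more
    # slowly than capability, so at high stakes this bound binds first.
    return min(1.0, 0.5 + 0.5 * stakes)


def gate(req: DeploymentRequest, min_capability: float = 0.8) -> Refusal:
    # The verification check fires even when the values and capability
    # screens both pass -- the hybrid argument the claim describes.
    if not req.policy_permitted:
        return Refusal.VALUES
    if req.capability < min_capability:
        return Refusal.CAPABILITY
    if req.verification < required_verification(req.stakes):
        return Refusal.VERIFICATION
    return Refusal.NONE


# High capability, policy-permitted, maximal stakes, moderate verification:
# the refusal that comes back is verification-based, not values- or
# capability-based.
print(gate(DeploymentRequest(capability=0.95, verification=0.6,
                             stakes=1.0, policy_permitted=True)))
# -> Refusal.VERIFICATION
```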