theseus: extract claims from 2026-03-29-intercept-openai-surveillance-autonomous-killings-trust-us

- Source: inbox/queue/2026-03-29-intercept-openai-surveillance-autonomous-killings-trust-us.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 1
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)
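The extraction step presumably resembles the following minimal sketch against OpenRouter's OpenAI-compatible chat completions endpoint, using the model string named above; the prompt, environment variable, and `extract_claims` wrapper are illustrative assumptions, not the pipeline's actual code.

```python
# Minimal sketch of the ingest step: one chat-completions call to
# OpenRouter with the model named in the commit metadata. The system
# prompt and function name are illustrative assumptions.
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def extract_claims(source_markdown: str) -> str:
    """Ask the model to extract atomic claims from an inbox document."""
    response = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "anthropic/claude-sonnet-4.5",
            "messages": [
                {"role": "system",
                 "content": "Extract each discrete claim as a markdown file with YAML frontmatter."},
                {"role": "user", "content": source_markdown},
            ],
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```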

Agent: Theseus <PIPELINE>
Teleo Agents 2026-04-04 14:37:05 +00:00
parent 7b6a5ce927
commit 0c21b331ac


@@ -0,0 +1,17 @@
---
type: claim
domain: ai-alignment
description: The trust-versus-verification gap in voluntary AI safety commitments creates a structural failure mode where companies can claim safety constraints while maintaining contractual freedom to violate them
confidence: experimental
source: The Intercept analysis of OpenAI Pentagon contract, March 2026
created: 2026-04-04
title: Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility
agent: theseus
scope: structural
sourcer: The Intercept
related_claims: ["voluntary-safety-pledges-cannot-survive-competitive-pressure", "the-alignment-tax-creates-a-structural-race-to-the-bottom-because-safety-training-costs-capability-and-rational-competitors-skip-it"]
---
# Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility
OpenAI's amended Pentagon contract demonstrates the enforcement gap in voluntary safety commitments through five specific mechanisms:

1. The 'intentionally' qualifier excludes accidental or incidental violations.
2. Geographic scope limited to 'U.S. persons and nationals' permits surveillance of non-US persons.
3. No external auditor or verification mechanism exists.
4. The contract itself is not publicly available for independent review.
5. The 'autonomous weapons targeting' language is aspirational rather than prohibitive, while the military retains rights to 'any lawful purpose.'

This contrasts with Anthropic's approach of hard contractual prohibitions, which cost it the contract bid. The market outcome reveals the competitive selection pressure against enforceable constraints: OpenAI's aspirational-with-loopholes approach won the contract while Anthropic's hard-prohibition approach was excluded. The structural pattern is that voluntary commitments without external enforcement, consequences for violation, or transparency mechanisms function as credibility signaling rather than operational constraints. The 'you're going to have to trust us' framing captures the failure mode: when safety depends entirely on self-enforcement by the entity with incentives to violate the constraints, the constraint has no binding force.
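The frontmatter above doubles as the claim's schema. A minimal validation sketch, assuming PyYAML and a required-field set inferred from this one file (the real pipeline schema may differ):

```python
# Minimal sketch: parse a claim file and check its frontmatter for the
# fields used in the file above. REQUIRED_FIELDS is inferred from this
# single example, not from the pipeline's actual schema.
import yaml  # PyYAML

REQUIRED_FIELDS = {
    "type", "domain", "description", "confidence",
    "source", "created", "title", "agent", "scope",
}

def load_claim(path: str) -> dict:
    """Split a claim file into frontmatter and body, checking required fields."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Frontmatter sits between the first two '---' delimiters.
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        raise ValueError(f"claim is missing fields: {sorted(missing)}")
    return {"meta": meta, "body": body.strip()}
```

Run against the file added in this diff, `load_claim` would pass, since every field in `REQUIRED_FIELDS` appears in the frontmatter.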