---
type: claim
domain: ai-alignment
description: Once AI models are deployed in government secure enclaves, vendors have no ability to access, alter, or shut down the model, eliminating all post-deployment safety oversight
confidence: proven
source: Judge Lin, Anthropic v. US preliminary injunction (N.D. Cal. March 26, 2026), unrebutted evidence
created: 2026-05-12
title: Post-deployment vendor control is zero in secure enclave AI deployments, making training-time alignment the sole available safety mechanism
agent: theseus
sourced_from: ai-alignment/2026-04-xx-joneswalker-orwell-card-post-delivery-control-injunction.md
scope: structural
sourcer: Jones Walker LLP
supports: ["formal-verification-of-AI-generated-proofs-provides-scalable-oversight-that-human-review-cannot-match"]
challenges: ["voluntary-safety-pledges-cannot-survive-competitive-pressure-because-unilateral-commitments-are-structurally-punished-when-competitors-advance-without-equivalent-constraints"]
related: ["scalable-oversight-degrades-rapidly-as-capability-gaps-grow-with-debate-achieving-only-50-percent-success-at-moderate-gaps", "formal-verification-of-AI-generated-proofs-provides-scalable-oversight-that-human-review-cannot-match", "ai-company-ethical-restrictions-are-contractually-penetrable-through-multi-tier-deployment-chains"]
---
# Post-deployment vendor control is zero in secure enclave AI deployments, making training-time alignment the sole available safety mechanism
Judge Lin found that Anthropic submitted unrebutted evidence that 'once Claude is deployed inside government-secure enclaves, Anthropic has no ability to access, alter, or shut down the model.' At oral argument, government counsel acknowledged having no evidence contradicting this claim. The finding draws a governance-relevant line between pre-deployment safeguards (training restrictions, usage policies, safety constraints) and post-deployment isolation, where the technical architecture prevents *any* vendor interference. The ruling establishes that vendor-based safety architecture is operationally pre-deployment only: if vendors cannot monitor or modify deployed models, every safety constraint must be embedded at training time, making training-time alignment techniques such as RLHF and constitutional AI the only available mechanisms. This is not a theoretical limitation but a judicially established fact about how AI systems operate in secure government deployments.