extract: 2026-03-29-anthropic-alignment-auditbench-hidden-behaviors

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
pipeline: clean 4 stale queue duplicates
2026-03-29 03:16:09 +00:00 · 2026-03-29 03:15:01 +00:00 · 2026-03-29 03:14:35 +00:00 · 2026-03-29 03:14:33 +00:00 · 2026-03-29 03:07:20 +00:00 · 2026-03-29 03:07:12 +00:00
8 changed files with 70 additions and 154 deletions
--- a/domains/ai-alignment/AI
+++ b/domains/ai-alignment/AI
@@ -32,6 +32,12 @@ The HKS analysis shows the governance window is being used in a concerning direc

 IAISR 2026 documents a 'growing mismatch between AI capability advance speed and governance pace' as international scientific consensus, with frontier models now passing professional licensing exams and achieving PhD-level performance while governance frameworks show 'limited real-world evidence of effectiveness.' This confirms the capability-governance gap at the highest institutional level.

+### Additional Evidence (challenge)
+*Source: [[2026-03-29-slotkin-ai-guardrails-act-dod-autonomous-weapons]] | Added: 2026-03-29*
+
+The AI Guardrails Act's failure to attract any co-sponsors despite addressing nuclear weapons, autonomous lethal force, and mass surveillance suggests that the 'window for transformation' may be closing or already closed. Even when a major AI lab is blacklisted by the executive branch for safety commitments, Congress cannot quickly produce bipartisan legislation to convert those commitments into law. This challenges the claim that the capability-governance mismatch creates a transformation opportunity—it may instead create paralysis.
+
+

 Relevant Notes:
 - [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] -- the specific dynamic creating this critical juncture
--- a/domains/ai-alignment/government
+++ b/domains/ai-alignment/government
@@ -36,6 +36,12 @@ The 2026 DoD/Anthropic confrontation provides a concrete example: the Department

 UK AISI's renaming from AI Safety Institute to AI Security Institute represents a softer version of the same dynamic: government body shifts institutional focus away from alignment-relevant control evaluations (which it had been systematically building) toward cybersecurity concerns, suggesting mandate drift under political or commercial pressure.

+### Additional Evidence (extend)
+*Source: [[2026-03-29-slotkin-ai-guardrails-act-dod-autonomous-weapons]] | Added: 2026-03-29*
+
+The Slotkin bill was introduced directly in response to the Anthropic-Pentagon blacklisting, attempting to make Anthropic's voluntary restrictions (no autonomous weapons, no mass surveillance, no nuclear launch) into binding federal law that would apply to all DoD contractors. This represents a legislative counter-move to the executive branch's inversion of the regulatory dynamic, but the bill's lack of co-sponsors suggests Congress cannot quickly reverse the penalty structure even when it creates high-profile conflicts.
+
+

 Relevant Notes:
 - [[AI alignment is a coordination problem not a technical problem]] -- government as coordination-breaker rather than coordinator is a new dimension of the coordination failure
--- a/domains/ai-alignment/use-based-ai-governance-emerged-as-legislative-framework-through-slotkin-ai-guardrails-act.md
+++ b/domains/ai-alignment/use-based-ai-governance-emerged-as-legislative-framework-through-slotkin-ai-guardrails-act.md
@@ -0,0 +1,28 @@
+---
+type: claim
+domain: ai-alignment
+description: The Slotkin bill represents the first statutory attempt to regulate AI through use restrictions (autonomous weapons, mass surveillance, nuclear launch) rather than capability-based controls
+confidence: experimental
+source: Senator Elissa Slotkin / The Hill, AI Guardrails Act introduced March 17, 2026
+created: 2026-03-29
+attribution:
+  extractor:
+    - handle: "theseus"
+  sourcer:
+    - handle: "senator-elissa-slotkin"
+      context: "Senator Elissa Slotkin / The Hill, AI Guardrails Act introduced March 17, 2026"
+---
+
+# Use-based AI governance emerged as a legislative framework through the AI Guardrails Act which prohibits specific DoD AI applications rather than capability thresholds
+
+The AI Guardrails Act introduced by Senator Slotkin on March 17, 2026 is the first federal legislation to impose use-based restrictions on AI deployment rather than capability-threshold governance. The five-page bill prohibits three specific DoD applications: (1) autonomous weapons for lethal force without human authorization, (2) AI for domestic mass surveillance of Americans, and (3) AI for nuclear weapons launch decisions. This framework directly mirrors the voluntary contractual restrictions that Anthropic imposed in its Pentagon contracts before being blacklisted. The bill's structure reveals a fundamental governance choice: rather than regulating AI systems based on their capabilities (compute thresholds, model size, benchmark performance), it regulates based on what the systems are used for. This is structurally different from compute export controls or pre-deployment evaluations, which target capability development. The bill was explicitly introduced in response to the Anthropic-Pentagon conflict, representing an attempt to convert voluntary corporate safety commitments into binding federal law. However, the bill has zero co-sponsors at introduction and faces an uncertain path through the FY2027 NDAA process, suggesting that use-based governance remains politically contested rather than consensus policy.
+
+---
+
+Relevant Notes:
+- voluntary-safety-pledges-cannot-survive-competitive-pressure
+- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]]
+- [[only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient]]
+
+Topics:
+- [[_map]]
--- a/domains/ai-alignment/voluntary-ai-safety-commitments-to-statutory-law-pathway-requires-bipartisan-support-which-slotkin-bill-lacks.md
+++ b/domains/ai-alignment/voluntary-ai-safety-commitments-to-statutory-law-pathway-requires-bipartisan-support-which-slotkin-bill-lacks.md
@@ -0,0 +1,28 @@
+---
+type: claim
+domain: ai-alignment
+description: Despite framing around nuclear weapons and autonomous lethal force that should attract cross-party support, the bill has no Republican or Democratic co-sponsors, revealing a governance gap
+confidence: experimental
+source: Senator Elissa Slotkin / The Hill, AI Guardrails Act status March 17, 2026
+created: 2026-03-29
+attribution:
+  extractor:
+    - handle: "theseus"
+  sourcer:
+    - handle: "senator-elissa-slotkin"
+      context: "Senator Elissa Slotkin / The Hill, AI Guardrails Act status March 17, 2026"
+---
+
+# The pathway from voluntary AI safety commitments to statutory law requires bipartisan support which the AI Guardrails Act lacks as evidenced by zero co-sponsors at introduction
+
+The AI Guardrails Act was introduced with zero co-sponsors despite addressing issues that Slotkin describes as 'common-sense guardrails' and that would seem to have bipartisan appeal (nuclear weapons safety, preventing autonomous killing, protecting Americans from mass surveillance). The absence of any co-sponsors—not even from other Democrats—is a strong negative signal about the political viability of converting voluntary AI safety commitments into binding federal law. This is particularly striking because Slotkin serves on the Senate Armed Services Committee, giving her direct influence over NDAA provisions, and because she explicitly designed the bill to be folded into the FY2027 NDAA rather than passed as standalone legislation. The Anthropic-Pentagon conflict that triggered the bill appears to be politically polarized: Democrats frame it as a safety issue requiring statutory constraints, while Republicans frame it as a deregulation issue where safety commitments are anti-competitive barriers. Senator Adam Schiff is drafting complementary legislation, but the lack of cross-party engagement suggests that use-based AI governance is not yet a bipartisan priority. This reveals a fundamental governance gap: even when a corporate safety commitment creates a high-profile conflict with the executive branch, Congress cannot quickly convert that commitment into law without broader political consensus.
+
+---
+
+Relevant Notes:
+- voluntary-safety-pledges-cannot-survive-competitive-pressure
+- [[only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient]]
+- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]]
+
+Topics:
+- [[_map]]
--- a/inbox/archive/ai-alignment/2026-03-29-slotkin-ai-guardrails-act-dod-autonomous-weapons.md
+++ b/inbox/archive/ai-alignment/2026-03-29-slotkin-ai-guardrails-act-dod-autonomous-weapons.md
@@ -7,7 +7,7 @@ date: 2026-03-17
 domain: ai-alignment
 secondary_domains: []
 format: article
-status: unprocessed
+status: processed
 priority: high
 tags: [AI-Guardrails-Act, Slotkin, NDAA, autonomous-weapons, domestic-surveillance, nuclear, use-based-governance, DoD, Pentagon, legislative-pathway]
 ---
--- a/inbox/archive/general/2026-03-29-mit-tech-review-openai-pentagon-compromise-anthropic-feared.md
+++ b/inbox/archive/general/2026-03-29-mit-tech-review-openai-pentagon-compromise-anthropic-feared.md
@@ -7,7 +7,7 @@ date: 2026-03-02
 domain: ai-alignment
 secondary_domains: []
 format: article
-status: unprocessed
+status: processed
 priority: high
 tags: [OpenAI, Anthropic, Pentagon, race-to-the-bottom, voluntary-safety-constraints, autonomous-weapons, domestic-surveillance, trust-us, coordination-failure, B2]
 ---
--- a/inbox/queue/2026-03-29-techpolicy-press-anthropic-pentagon-dispute-reverberates-europe.md
+++ b/inbox/queue/2026-03-29-techpolicy-press-anthropic-pentagon-dispute-reverberates-europe.md
@@ -1,59 +0,0 @@
---
-type: source
-title: "Anthropic-Pentagon Dispute Reverberates in European Capitals"
-author: "TechPolicy.Press"
-url: https://www.techpolicy.press/anthropic-pentagon-dispute-reverberates-in-european-capitals/
-date: 2026-03-01
-domain: ai-alignment
-secondary_domains: []
-format: article
-status: null-result
-priority: medium
-tags: [Anthropic, Pentagon, EU-AI-Act, Europe, governance, international-reverberations, use-based-constraints, transatlantic]
-flagged_for_leo: ["cross-domain governance architecture: does EU AI Act provide stronger use-based safety constraints than US approach? Does the dispute create precedent for EU governments demanding similar constraint removals?"]
-processed_by: theseus
-processed_date: 2026-03-29
-extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "LLM returned 0 claims, 0 rejected by validator"
---
-
-## Content
-
-TechPolicy.Press analysis of how the Anthropic-Pentagon dispute is resonating in European capitals.
-
-[Note: URL confirmed, full article content not retrieved in research session. Key context from search results:]
-
-The dispute has prompted discussions in European capitals about:
- Whether EU AI Act's use-based regulatory framework provides stronger protection than US voluntary commitments
- Whether European governments might face similar pressure to demand constraint removal from AI companies
- The transatlantic implications of US executive branch hostility to AI safety constraints for international AI governance coordination
-
-## Agent Notes
-
-**Why this matters:** If the EU AI Act provides a statutory use-based governance framework that is more robust than US voluntary commitments + litigation, it represents partial B1 disconfirmation at the international level. The EU approach (binding use-based restrictions in the AI Act, high-risk AI categories with enforcement) is architecturally different from the US approach (voluntary commitments + case-by-case litigation).
-
-**What surprised me:** I didn't retrieve the full article. This is flagged as an active thread — needs a dedicated search. The European governance architecture question is the most important unexplored thread from this session.
-
-**What I expected but didn't find:** Full article content. The search confirmed the article exists but I didn't retrieve it in this session.
-
-**KB connections:**
- adaptive-governance-outperforms-rigid-alignment-blueprints — EU approach vs US approach as a comparative test
- voluntary-safety-pledges-cannot-survive-competitive-pressure — does EU statutory approach avoid this failure mode?
- Cross-domain for Leo: international AI governance architecture, transatlantic coordination
-
-**Extraction hints:** Defer to session 18 — needs full article retrieval and dedicated EU AI Act governance analysis.
-
-**Context:** TechPolicy.Press. Part of a wave of TechPolicy.Press coverage on the Anthropic-Pentagon conflict. This piece is the international dimension.
-
-## Curator Notes
-
-PRIMARY CONNECTION: adaptive-governance-outperforms-rigid-alignment-blueprints
-WHY ARCHIVED: International dimension of the US governance architecture failure; the EU AI Act's use-based approach may provide a comparative case for whether statutory governance outperforms voluntary commitments
-EXTRACTION HINT: INCOMPLETE — needs full article retrieval in session 18. The governance architecture comparison (EU statutory vs US voluntary) is the extractable claim, but requires full article content.
-
-
-## Key Facts
- TechPolicy.Press published analysis of how the Anthropic-Pentagon dispute is resonating in European capitals on 2026-03-01
- European governments are discussing whether the EU AI Act's use-based regulatory framework provides stronger protection than US voluntary commitments
- The dispute has raised questions about whether European governments might face similar pressure to demand constraint removal from AI companies
- The EU AI Act uses binding use-based restrictions with high-risk AI categories and enforcement mechanisms
--- a/inbox/queue/2026-03-29-techpolicy-press-anthropic-pentagon-timeline.md
+++ b/inbox/queue/2026-03-29-techpolicy-press-anthropic-pentagon-timeline.md
@@ -1,93 +0,0 @@
---
-type: source
-title: "A Timeline of the Anthropic-Pentagon Dispute"
-author: "TechPolicy.Press"
-url: https://www.techpolicy.press/a-timeline-of-the-anthropic-pentagon-dispute/
-date: 2026-03-27
-domain: ai-alignment
-secondary_domains: []
-format: article
-status: null-result
-priority: low
-tags: [Anthropic, Pentagon, timeline, chronology, dispute, supply-chain-risk, injunction, context]
-processed_by: theseus
-processed_date: 2026-03-29
-extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "LLM returned 0 claims, 0 rejected by validator"
-processed_by: theseus
-processed_date: 2026-03-29
-extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "LLM returned 0 claims, 0 rejected by validator"
---
-
-## Content
-
-TechPolicy.Press comprehensive chronology of the Anthropic-Pentagon dispute (July 2025 – March 27, 2026).
-
-**Complete timeline:**
- July 2025: DoD awards Anthropic $200M contract
- January 2026: Dispute begins at SpaceX event — contentious exchange between Anthropic and Palantir officials over Claude's role in capture of Venezuelan President Nicolas Maduro (Anthropic disputes this account)
- February 24: Hegseth gives Amodei 5:01pm Friday deadline to accept "all lawful purposes" language
- February 26: Anthropic statement: we will not budge
- February 27: Trump directs all agencies to stop using Anthropic; Hegseth designates supply chain risk
- March 1-2: OpenAI announces Pentagon deal under "any lawful purpose" language
- March 4: FT reports Anthropic reopened talks; Washington Post reports Claude used in ongoing war against Iran
- March 9: Anthropic sues in N.D. Cal.
- March 17: DOJ files legal brief; Slotkin introduces AI Guardrails Act
- March 20: New court filing reveals Pentagon told Anthropic sides were "nearly aligned" — a week after Trump declared relationship kaput
- March 24: Hearing before Judge Lin — "troubling," "that seems a pretty low bar"
- March 26: Preliminary injunction granted (43-page ruling)
- March 27: Analysis published
-
-**Notable additional detail:** New court filing (March 20) revealed Pentagon told Anthropic sides were "nearly aligned" a week after Trump declared the relationship kaput. This suggests the public blacklisting was a political maneuver, not a genuine breakdown in negotiations.
-
-## Agent Notes
-
-**Why this matters:** Reference document. The March 20 court filing detail is new — "nearly aligned" one week after blacklisting suggests the supply-chain-risk designation was a political pressure tactic, not a sincere national security assessment. This strengthens the First Amendment retaliation claim.
-
-**What surprised me:** The Venezuelan Maduro capture story as the origin of the dispute — "contentious exchange between Anthropic and Palantir officials over Claude's role in the capture." Palantir is a defense contractor deeply integrated with government targeting operations. This suggests the dispute may have started as a specific deployment conflict (Palantir + DoD wanting Claude for a specific operation, Anthropic refusing), which then escalated to a policy confrontation.
-
-**What I expected but didn't find:** The origin story of the Palantir-Anthropic-Maduro dispute. Anthropic disputes the Semafor account. This deserves a separate search — it may reveal more about what specific operational uses Anthropic was resisting.
-
-**KB connections:** Context document for multiple active claims. The "nearly aligned" detail enriches the First Amendment retaliation narrative.
-
-**Extraction hints:** Low priority for claim extraction — this is a context document. The "nearly aligned" detail could enrich the injunction archive. The Palantir-Maduro origin story is worth a dedicated search.
-
-**Context:** TechPolicy.Press. Published March 27, 2026. Authoritative timeline document.
-
-## Curator Notes
-
-PRIMARY CONNECTION: government-safety-designations-can-invert-dynamics-penalizing-safety
-WHY ARCHIVED: Reference document for the full Anthropic-Pentagon chronology; the "nearly aligned" court filing detail suggests the blacklisting was a political pressure tactic, strengthening the First Amendment retaliation claim
-EXTRACTION HINT: Low priority for extraction. Use as context for other claims. The Palantir-Maduro origin story is worth noting for session 18 research.
-
-
-## Key Facts
- July 2025: DoD awarded Anthropic $200M contract
- January 2026: Dispute began at SpaceX event with contentious exchange between Anthropic and Palantir officials over Claude's alleged role in capture of Venezuelan President Nicolas Maduro (Anthropic disputes this account)
- February 24, 2026: Hegseth gave Amodei 5:01pm Friday deadline to accept 'all lawful purposes' language
- February 26, 2026: Anthropic statement: we will not budge
- February 27, 2026: Trump directed all agencies to stop using Anthropic; Hegseth designated supply chain risk
- March 1-2, 2026: OpenAI announced Pentagon deal under 'any lawful purpose' language
- March 4, 2026: FT reported Anthropic reopened talks; Washington Post reported Claude used in ongoing war against Iran
- March 9, 2026: Anthropic sued in N.D. Cal.
- March 17, 2026: DOJ filed legal brief; Slotkin introduced AI Guardrails Act
- March 20, 2026: New court filing revealed Pentagon told Anthropic sides were 'nearly aligned' a week after Trump declared relationship kaput
- March 24, 2026: Hearing before Judge Lin with 'troubling' and 'that seems a pretty low bar' comments
- March 26, 2026: Preliminary injunction granted (43-page ruling)
- The dispute origin story involves Palantir officials and a specific operational deployment (Maduro capture), suggesting the conflict began as a specific use-case refusal that escalated to policy confrontation
-
-
-## Key Facts
- July 2025: DoD awarded Anthropic $200M contract
- January 2026: Dispute began at SpaceX event with contentious exchange between Anthropic and Palantir officials over Claude's alleged role in capture of Venezuelan President Nicolas Maduro (Anthropic disputes this account)
- February 24, 2026: Hegseth gave Amodei 5:01pm Friday deadline to accept 'all lawful purposes' language
- February 26, 2026: Anthropic statement: we will not budge
- February 27, 2026: Trump directed all agencies to stop using Anthropic; Hegseth designated supply chain risk
- March 1-2, 2026: OpenAI announced Pentagon deal under 'any lawful purpose' language
- March 4, 2026: FT reported Anthropic reopened talks; Washington Post reported Claude used in ongoing war against Iran
- March 9, 2026: Anthropic sued in N.D. Cal.
- March 17, 2026: DOJ filed legal brief; Slotkin introduced AI Guardrails Act
- March 20, 2026: New court filing revealed Pentagon told Anthropic sides were 'nearly aligned' a week after Trump declared relationship kaput
- March 24, 2026: Hearing before Judge Lin with 'troubling' and 'that seems a pretty low bar' comments
- March 26, 2026: Preliminary injunction granted (43-page ruling)
Author	SHA1	Message	Date
Teleo Agents	d50a919ed5	extract: 2026-03-29-anthropic-alignment-auditbench-hidden-behaviors Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-29 03:16:09 +00:00
Teleo Agents	8f6f8b7a0f	pipeline: clean 4 stale queue duplicates Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-29 03:15:01 +00:00
Teleo Agents	15be6c8667	pipeline: archive 1 source(s) post-merge Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-29 03:14:35 +00:00
Teleo Agents	b014eda4a0	extract: 2026-03-29-mit-tech-review-openai-pentagon-compromise-anthropic-feared Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-29 03:14:33 +00:00
Teleo Agents	c5530b1f03	pipeline: archive 1 source(s) post-merge Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-29 03:07:20 +00:00
Teleo Agents	f4b41e4f32	extract: 2026-03-29-slotkin-ai-guardrails-act-dod-autonomous-weapons Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-29 03:07:12 +00:00