From d7d38bd34d3de3c3ecf60272ed98b3a1ecc3d596 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Wed, 22 Apr 2026 10:04:25 +0000
Subject: [PATCH] theseus: extract claims from 2026-04-22-aisi-uk-mythos-cyber-evaluation

- Source: inbox/queue/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
- Domain: ai-alignment
- Claims: 0, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus
---
 ...eal-world-evidence-exceeding-benchmark-predictions.md | 7 +++++++
 ...ng-commercial-negotiation-is-governance-instrument.md | 9 ++++++++-
 ...customer-demands-safety-unconstrained-alternatives.md | 7 +++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
index 0370719c9..871fcbb6a 100644
--- a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
+++ b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
@@ -34,3 +34,10 @@ Claude Mythos Preview achieved 73% success rate on expert-level CTF challenges a
 **Source:** UK AISI Mythos evaluation, April 2026
 
 Claude Mythos Preview's 3/10 success rate on completing a 32-step enterprise network intrusion from start to finish provides the first documented case of an AI model achieving end-to-end autonomous attack capability in a realistic environment. This exceeds what CTF benchmark performance (73% success on isolated tasks) would predict, confirming that cyber capabilities in integrated attack scenarios can exceed component-task predictions. AISI specifically noted Mythos's effectiveness at 'mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.'
+
+
+## Supporting Evidence
+
+**Source:** UK AISI evaluation, April 2026
+
+Claude Mythos Preview achieved 73% success on expert-level CTF challenges and completed full 32-step enterprise attack chains, with AISI noting it is 'specifically effective at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.' This provides additional empirical evidence that cyber capabilities in frontier models exceed isolated benchmark predictions.
diff --git a/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md b/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md
index 359aa6cbe..bdbbc5499 100644
--- a/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md
+++ b/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md
@@ -10,9 +10,16 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
 scope: functional
 sourcer: UK AI Security Institute
-related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation"]
+related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions"]
 ---
 
 # Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction
 
 UK AISI published detailed evaluation of Claude Mythos Preview's cyber capabilities in April 2026 while Anthropic was actively negotiating a Pentagon deal. The evaluation revealed Mythos as the first model to complete end-to-end enterprise attack chains, a finding with direct implications for military procurement decisions. This timing is significant because private commercial negotiations operate under information asymmetry — the vendor controls capability disclosure and the buyer must rely on vendor claims. Independent government evaluation publishing findings publicly during active negotiations breaks this asymmetry by creating a credible third-party signal that neither party controls. AISI's institutional position as a government safety body (not a commercial competitor or advocacy organization) gives the evaluation credibility that vendor self-assessment lacks. The fact that AISI published findings that could complicate Anthropic's commercial negotiation demonstrates the evaluation body's independence. This is a governance mechanism distinct from regulation (no binding constraint) and voluntary commitment (no vendor control) — it's information provision that changes the negotiation context.
+
+
+## Supporting Evidence
+
+**Source:** UK AISI, April 2026
+
+UK AISI published a detailed evaluation of Claude Mythos Preview's dangerous cyber capabilities while Anthropic was actively negotiating a Pentagon deployment deal, showing that independent evaluation can function as a governance mechanism by reducing information asymmetry during private commercial negotiations.
diff --git a/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md b/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md
index 43f45736f..14c83eacb 100644
--- a/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md
+++ b/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md
@@ -66,3 +66,10 @@ The CISA exclusion from Mythos access while NSA received access demonstrates tha
 **Source:** UK AISI Mythos evaluation, April 2026
 
 The absence of public ASL-4 classification announcement for Claude Mythos Preview while Anthropic negotiates a Pentagon deal provides empirical evidence of the mechanism. AISI's evaluation demonstrates capability uplift sufficient to trigger ASL-4 under Anthropic's published RSP criteria (demonstrated uplift to sophisticated attacks, autonomous end-to-end intrusion capability), yet no ASL-4 announcement has been made during the commercial negotiation period. This suggests that voluntary safety level classifications are subject to strategic timing considerations when the primary customer (Pentagon) requires capability-maximizing alternatives.
+
+
+## Supporting Evidence
+
+**Source:** UK AISI evaluation during Pentagon negotiations, April 2026
+
+AISI published an evaluation showing that Mythos crosses dangerous capability thresholds while Anthropic is negotiating a Pentagon deal, yet no ASL-4 classification has been publicly announced even though the evaluation appears to meet RSP trigger criteria. This illustrates how voluntary safety commitments, which lack a legal enforcement mechanism, are exposed to commercial pressure from government customers.