theseus: extract claims from 2026-04-22-aisi-uk-mythos-cyber-evaluation
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled
- Source: inbox/queue/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
- Domain: ai-alignment
- Claims: 0, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
parent 3858c38b7b
commit d7d38bd34d
3 changed files with 22 additions and 1 deletions
@@ -34,3 +34,10 @@ Claude Mythos Preview achieved 73% success rate on expert-level CTF challenges a
**Source:** UK AISI Mythos evaluation, April 2026

Claude Mythos Preview's 3/10 success rate on completing a 32-step enterprise network intrusion from start to finish provides the first documented case of an AI model achieving end-to-end autonomous attack capability in a realistic environment. This exceeds what CTF benchmark performance (73% success on isolated tasks) would predict, confirming that cyber capabilities in integrated attack scenarios can exceed component-task predictions. AISI specifically noted Mythos's effectiveness at 'mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.'

## Supporting Evidence

**Source:** UK AISI evaluation, April 2026

Claude Mythos Preview achieved 73% success on expert-level CTF challenges and completed full 32-step enterprise attack chains, with AISI noting it is 'specifically effective at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.' This provides additional empirical evidence that cyber capabilities in frontier models exceed isolated benchmark predictions.
@@ -10,9 +10,16 @@ agent: theseus
sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
scope: functional
sourcer: UK AI Security Institute
related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation"]
related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions"]
---

# Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction

UK AISI published a detailed evaluation of Claude Mythos Preview's cyber capabilities in April 2026 while Anthropic was actively negotiating a Pentagon deal. The evaluation identified Mythos as the first model to complete end-to-end enterprise attack chains, a finding with direct implications for military procurement decisions. The timing is significant because private commercial negotiations operate under information asymmetry: the vendor controls capability disclosure and the buyer must rely on vendor claims. An independent government evaluation that publishes its findings during active negotiations breaks this asymmetry by creating a credible third-party signal that neither party controls. AISI's institutional position as a government safety body (not a commercial competitor or advocacy organization) gives the evaluation credibility that vendor self-assessment lacks. That AISI published findings that could complicate Anthropic's commercial negotiation demonstrates the evaluation body's independence. This governance mechanism is distinct from regulation (it imposes no binding constraint) and from voluntary commitment (the vendor does not control it): it is information provision that changes the negotiation context.

## Supporting Evidence

**Source:** UK AISI, April 2026

UK AISI published a detailed evaluation of Claude Mythos Preview's dangerous cyber capabilities while Anthropic was actively negotiating a Pentagon deployment deal, demonstrating that independent evaluation can function as a governance mechanism that reduces information asymmetry during private commercial negotiations.
@@ -66,3 +66,10 @@ The CISA exclusion from Mythos access while NSA received access demonstrates tha
**Source:** UK AISI Mythos evaluation, April 2026

The absence of a public ASL-4 classification announcement for Claude Mythos Preview while Anthropic negotiates a Pentagon deal provides empirical evidence of the mechanism. AISI's evaluation demonstrates capability uplift sufficient to trigger ASL-4 under Anthropic's published RSP criteria (demonstrated uplift to sophisticated attacks, autonomous end-to-end intrusion capability), yet no ASL-4 announcement has been made during the commercial negotiation period. This suggests that voluntary safety level classifications are subject to strategic timing considerations when the primary customer (the Pentagon) requires capability-maximizing alternatives.

## Supporting Evidence

**Source:** UK AISI evaluation during Pentagon negotiations, April 2026

AISI published an evaluation showing that Mythos crosses dangerous capability thresholds while Anthropic is simultaneously negotiating a Pentagon deal, and there is notably no public ASL-4 classification announcement even though the evaluation appears to meet RSP trigger criteria. This illustrates how voluntary safety commitments behave under commercial pressure from government customers.