From 41b4ea2fd155a243a087a4ca48c27b931c21abd6 Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Wed, 22 Apr 2026 09:10:15 +0000 Subject: [PATCH] theseus: extract claims from 2026-04-22-aisi-uk-mythos-cyber-evaluation - Source: inbox/queue/2026-04-22-aisi-uk-mythos-cyber-evaluation.md - Domain: ai-alignment - Claims: 2, Entities: 0 - Enrichments: 4 - Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5) Pentagon-Agent: Theseus --- ...is-for-mandatory-third-party-evaluation.md | 14 ++++++++----- ...vidence-exceeding-benchmark-predictions.md | 8 +++++++- ...pability-uplift-to-operational-autonomy.md | 20 +++++++++++++++++++ ...al-negotiation-is-governance-instrument.md | 18 +++++++++++++++++ ...mands-safety-unconstrained-alternatives.md | 9 ++++++++- 5 files changed, 62 insertions(+), 7 deletions(-) create mode 100644 domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md create mode 100644 domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md diff --git a/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md b/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md index f33782e5c..eba8d54fa 100644 --- a/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md +++ b/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md @@ -11,10 +11,8 @@ attribution: sourcer: - handle: "openai-and-anthropic-(joint)" context: "OpenAI and Anthropic joint evaluation, August 2025" -related: -- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response -reweave_edges: -- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17 +related: ["Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations", "multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments"] +reweave_edges: ["Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17"] --- # Cross-lab alignment evaluation surfaces safety gaps that internal evaluation misses, providing an empirical basis for mandatory third-party AI safety evaluation as a governance mechanism @@ -28,4 +26,10 @@ Relevant Notes: - voluntary-safety-pledges-cannot-survive-competitive-pressure-because-unilateral-commitments-are-structurally-punished-when-competitors-advance-without-equivalent-constraints.md Topics: -- [[_map]] \ No newline at end of file +- [[_map]] + +## Supporting Evidence + +**Source:** UK AISI independent evaluation of Anthropic Mythos, April 2026 + +UK AISI as independent government evaluator published findings about Mythos cyber capabilities that have direct implications for Anthropic's commercial negotiations and safety classification decisions. The evaluation revealed Mythos as first model to complete 32-step enterprise attack chain, a finding with governance significance that independent evaluation surfaced publicly. diff --git a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md index 9129fca8a..f69fd0c3f 100644 --- a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md +++ b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md @@ -22,4 +22,10 @@ The paper documents that cyber capabilities have crossed a threshold that other This distinguishes cyber from biological weapons and self-replication risks, where the benchmark-reality gap predominantly runs in one direction (benchmarks overstate capability) and real-world demonstrations remain theoretical or unpublished. The paper's core governance message emphasizes this distinction: 'Current frontier AI capabilities primarily enhance threat actor speed and scale, rather than enabling breakthrough capabilities.' -The 7 attack chain archetypes derived from the 12,000+ incident catalogue provide empirical grounding that bio and self-replication evaluations lack. While CTF benchmarks may overstate exploitation capability (6.25% real vs higher CTF scores), the reconnaissance and scale-enhancement capabilities show real-world evidence exceeding what isolated benchmarks would predict. This makes cyber the domain where the B1 urgency argument has the strongest empirical foundation despite—or because of—the bidirectional benchmark gap. \ No newline at end of file +The 7 attack chain archetypes derived from the 12,000+ incident catalogue provide empirical grounding that bio and self-replication evaluations lack. While CTF benchmarks may overstate exploitation capability (6.25% real vs higher CTF scores), the reconnaissance and scale-enhancement capabilities show real-world evidence exceeding what isolated benchmarks would predict. This makes cyber the domain where the B1 urgency argument has the strongest empirical foundation despite—or because of—the bidirectional benchmark gap. + +## Supporting Evidence + +**Source:** UK AISI Mythos evaluation, April 2026 + +Claude Mythos Preview achieved 73% success rate on expert-level CTF challenges and completed 3/10 attempts at a 32-step enterprise attack chain that no previous model had completed. AISI specifically noted Mythos is 'highly effective at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.' This provides additional empirical evidence that cyber capabilities in deployed models exceed what component-task benchmarks predict. diff --git a/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md b/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md new file mode 100644 index 000000000..5f700c374 --- /dev/null +++ b/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md @@ -0,0 +1,20 @@ +--- +type: claim +domain: ai-alignment +description: Claude Mythos Preview's completion of a 32-step enterprise network intrusion from start to finish represents a threshold crossing from tool-assisted attacks to autonomous attack capability +confidence: experimental +source: UK AI Security Institute, Claude Mythos Preview evaluation April 2026 +created: 2026-04-22 +title: The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change +agent: theseus +sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md +scope: causal +sourcer: UK AI Security Institute +supports: ["three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture", "voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives"] +challenges: ["cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics"] +related: ["cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements"] +--- + +# The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change + +UK AISI evaluation found Claude Mythos Preview completed the 32-step 'The Last Ones' enterprise-network attack range from start to finish in 3 of 10 attempts, making it the first AI model across all AISI tests to achieve this. This is qualitatively different from previous models that showed capability uplift on isolated cyber tasks. The 73% success rate on expert-level CTF challenges demonstrates component capability, but the end-to-end attack chain completion demonstrates operational autonomy — the ability to string reconnaissance, exploitation, lateral movement, and persistence into a coherent intrusion without human intervention at each step. AISI specifically noted Mythos is 'comparable to GPT-5.4 on individual cyber tasks but stronger at attack chaining.' This threshold crossing matters for governance because it converts incremental risk (better tools for human attackers) into categorical risk (systems that ARE attackers). The evaluation was conducted by an independent government body with access to classified attack ranges, making this higher-confidence evidence than vendor self-evaluation. diff --git a/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md b/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md new file mode 100644 index 000000000..359aa6cbe --- /dev/null +++ b/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md @@ -0,0 +1,18 @@ +--- +type: claim +domain: ai-alignment +description: UK AISI publishing Mythos cyber capability evaluation while Anthropic negotiates Pentagon deal demonstrates how third-party evaluation creates transparency that private negotiations structurally cannot +confidence: experimental +source: UK AISI Mythos evaluation timing, April 2026 +created: 2026-04-22 +title: Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction +agent: theseus +sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md +scope: functional +sourcer: UK AI Security Institute +related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation"] +--- + +# Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction + +UK AISI published detailed evaluation of Claude Mythos Preview's cyber capabilities in April 2026 while Anthropic was actively negotiating a Pentagon deal. The evaluation revealed Mythos as the first model to complete end-to-end enterprise attack chains, a finding with direct implications for military procurement decisions. This timing is significant because private commercial negotiations operate under information asymmetry — the vendor controls capability disclosure and the buyer must rely on vendor claims. Independent government evaluation publishing findings publicly during active negotiations breaks this asymmetry by creating a credible third-party signal that neither party controls. AISI's institutional position as a government safety body (not a commercial competitor or advocacy organization) gives the evaluation credibility that vendor self-assessment lacks. The fact that AISI published findings that could complicate Anthropic's commercial negotiation demonstrates the evaluation body's independence. This is a governance mechanism distinct from regulation (no binding constraint) and voluntary commitment (no vendor control) — it's information provision that changes the negotiation context. diff --git a/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md b/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md index 45ae00f93..6fe8f15f7 100644 --- a/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md +++ b/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md @@ -12,7 +12,7 @@ sourcer: Leo related_claims: ["[[technology-governance-coordination-gaps-close-when-four-enabling-conditions-are-present-visible-triggering-events-commercial-network-effects-low-competitive-stakes-at-inception-or-physical-manifestation]]"] supports: ["Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility"] reweave_edges: ["Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility|supports|2026-04-07"] -related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "judicial-oversight-of-ai-governance-through-constitutional-grounds-not-statutory-safety-law", "voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance", "voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "judicial-oversight-checks-executive-ai-retaliation-but-cannot-create-positive-safety-obligations"] +related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "judicial-oversight-of-ai-governance-through-constitutional-grounds-not-statutory-safety-law", "voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance", "voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "judicial-oversight-checks-executive-ai-retaliation-but-cannot-create-positive-safety-obligations", "judicial-framing-of-voluntary-ai-safety-constraints-as-financial-harm-removes-constitutional-floor-enabling-administrative-dismantling", "split-jurisdiction-injunction-pattern-maps-boundary-of-judicial-protection-for-voluntary-ai-safety-policies-civil-protected-military-not"] --- # Voluntary AI safety constraints are protected as corporate speech but unenforceable as safety requirements, creating legal mechanism gap when primary demand-side actor seeks safety-unconstrained providers @@ -38,3 +38,10 @@ The DURC/PEPP case extends beyond voluntary constraints lacking enforcement—it **Source:** Stanford CodeX analysis, March 7, 2026 Nippon Life v. OpenAI (filed March 4, 2026) tests whether product liability doctrine can create mandatory enforcement through design defect theory. OpenAI's October 2024 ToS disclaimer warning against litigation use is characterized as a 'behavioral patch' that failed to prevent foreseeable harm. If the court accepts that architectural safeguards (surfacing epistemic limitations at point of output) are legally distinct from contractual disclaimers, it creates tort-based enforcement without requiring new legislation or voluntary compliance. + + +## Supporting Evidence + +**Source:** UK AISI Mythos evaluation during Pentagon negotiations, April 2026 + +Mythos evaluation occurred while Anthropic negotiates Pentagon deal, creating direct tension between safety evaluation findings (first model to complete end-to-end attack chains) and customer capability demands (military procurement). The timing demonstrates how voluntary safety frameworks face pressure when primary customer specifically wants the capability that safety evaluation flags as concerning.