diff --git a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
index be4a75109..429ced1df 100644
--- a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
+++ b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
@@ -35,6 +35,12 @@ STREAM framework proposes standardized ChemBio evaluation reporting with 23-expe
 
 ---
 
+### Additional Evidence (challenge)
+*Source: [[2026-03-25-cyber-capability-ctf-vs-real-attack-framework]] | Added: 2026-03-25*
+
+Cyber may present a more proximate AI-enabled catastrophic risk than bio because real-world evidence already exists at scale: 12,000+ AI cyber incidents catalogued by Google Threat Intelligence Group, documented state-sponsored campaigns with autonomous AI execution, and a zero-day discovery system (AISLE) that found all 12 zero-day vulnerabilities in the January 2026 OpenSSL security release. Bio risk, by contrast, remains grounded primarily in benchmark performance (text-based capability demonstrations) without comparable real-world operational evidence. This suggests that cyber has already crossed the threshold from theoretical to operational dangerous capability.
+
+
 Relevant Notes:
 - [[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]] — Amodei's admission of Claude exhibiting deception and subversion during testing is a concrete instance of this pattern, with bioweapon implications
 - [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] — bioweapon guardrails are a specific instance of containment that AI capability may outpace
diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 7d9864db5..c67fc79b7 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -119,6 +119,12 @@ Anthropic's explicit admission that 'the science of model evaluation isn't well-
 
 METR's scaffold sensitivity finding (GPT-4o and o3 performing better under Vivaria than Inspect) adds a new dimension to evaluation unreliability: the same model produces different capability estimates depending on evaluation infrastructure, introducing cross-model comparison uncertainty that governance frameworks do not account for.
 
+### Additional Evidence (extend)
+*Source: [[2026-03-25-cyber-capability-ctf-vs-real-attack-framework]] | Added: 2026-03-25*
+
+Cyber capability evaluations reveal a bidirectional benchmark-reality gap. CTF benchmarks overstate exploitation capability: despite higher CTF scores, AI models achieved only a 6.25% success rate on real-world vulnerability exploitation. Yet the same benchmarks miss AI's documented operational advantage in reconnaissance, where real-world use already exceeds benchmark predictions. This extends the evaluation-reality gap framework by showing that the gap can run in opposite directions within a single domain, depending on the task phase.
+
+
 
 
 
diff --git a/inbox/queue/.extraction-debug/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.json b/inbox/queue/.extraction-debug/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.json
new file mode 100644
index 000000000..73876bd9f
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.json
@@ -0,0 +1,37 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-due-to-phase-isolation.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 2,
+    "kept": 0,
+    "fixed": 7,
+    "rejected": 2,
+    "fixes_applied": [
+      "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-due-to-phase-isolation.md:set_created:2026-03-25",
+      "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-due-to-phase-isolation.md:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
+      "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-due-to-phase-isolation.md:stripped_wiki_link:AI lowers the expertise barrier for engineering biological w",
+      "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md:set_created:2026-03-25",
+      "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md:stripped_wiki_link:AI lowers the expertise barrier for engineering biological w",
+      "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
+      "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md:stripped_wiki_link:current language models escalate to nuclear war in simulated"
+    ],
+    "rejections": [
+      "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-due-to-phase-isolation.md:missing_attribution_extractor",
+      "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-25"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md b/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
index 9cebd5d40..f7095861f 100644
--- a/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
+++ b/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
@@ -7,9 +7,13 @@ date: 2025-03-01
 domain: ai-alignment
 secondary_domains: []
 format: research-paper
-status: unprocessed
+status: enrichment
 priority: medium
 tags: [cyber-capability, CTF-benchmarks, real-world-attacks, bottleneck-analysis, governance-framework, benchmark-reality-gap]
+processed_by: theseus
+processed_date: 2026-03-25
+enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -61,3 +65,13 @@ Low-translation bottlenecks (benchmark scores don't predict real impact):
 PRIMARY CONNECTION: [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur]] — compare/contrast: bio risk grounded in text benchmarks (gap large); cyber risk grounded in real-world incidents (gap smaller, different direction)
 WHY ARCHIVED: Provides the most systematic treatment of the cyber benchmark-reality gap; documents that real-world cyber capability evidence already exists at scale, making the B1 urgency argument strongest for this domain
 EXTRACTION HINT: Two potential claims: (1) cyber benchmark gap is direction-asymmetric (overstates exploitation, understates reconnaissance); (2) cyber is the exceptional domain with documented real-world dangerous capability. Check first whether existing KB cyber claims already cover state-sponsored campaigns or zero-days before extracting — the existing claim [[current language models escalate to nuclear war in simulated conflicts]] is in the institutional context section; this cyber capability claim is different.
+
+
+## Key Facts
+- Gemini 2.0 Flash achieved a 40% success rate on operational security tasks in cyber evaluations
+- AI models achieved only a 6.25% success rate on real-world vulnerability exploitation despite higher CTF benchmark scores
+- The AISLE system found all 12 zero-day vulnerabilities in the January 2026 OpenSSL security release
+- Google Threat Intelligence Group catalogued 12,000+ AI cyber incidents
+- The Hack The Box AI Range evaluation was conducted in December 2025
+- A model solved 11 of 50 CTF challenges (22% overall success rate)
+- Research identified 7 representative attack chain archetypes from real-world incident data
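The extraction-debug JSON in this diff pairs summary counters (`total`, `kept`, `fixed`, `rejected`) with the lists they appear to summarize. The Python sketch below is a hypothetical reviewer-side consistency check, not part of the extraction pipeline. It assumes that `fixed` counts entries in `fixes_applied`, that `rejected` counts the claims in `rejected_claims`, that every issue on a rejected claim reappears in `rejections` as `filename:issue`, that every claim is either kept or rejected, and that fix entries take the form `filename:action:argument`. All of these are inferred from this single record rather than from a documented schema, and the function names are illustrative.

```python
import json
from collections import Counter


def load_debug_record(path: str) -> dict:
    """Read an extraction-debug JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_debug_record(record: dict) -> list[str]:
    """Return human-readable inconsistencies in an extraction-debug record (empty if none).

    Every invariant below is an assumption inferred from one example record.
    """
    problems: list[str] = []
    stats = record["validation_stats"]
    fixes = stats["fixes_applied"]
    rejections = stats["rejections"]
    rejected_claims = record["rejected_claims"]

    # Assumption: 'fixed' counts individual fix actions, not claims.
    if stats["fixed"] != len(fixes):
        problems.append(f"fixed={stats['fixed']} but {len(fixes)} fixes_applied entries")
    # Assumption: 'rejected' counts claims, each listed once in rejected_claims.
    if stats["rejected"] != len(rejected_claims):
        problems.append(f"rejected={stats['rejected']} but {len(rejected_claims)} rejected_claims")
    # Assumption: every claim ends up either kept or rejected.
    if stats["kept"] + stats["rejected"] != stats["total"]:
        problems.append("kept + rejected does not equal total")
    # Each issue on a rejected claim should appear as 'filename:issue' in rejections.
    expected = {
        f"{claim['filename']}:{issue}"
        for claim in rejected_claims
        for issue in claim["issues"]
    }
    if expected != set(rejections):
        problems.append("rejected_claims and rejections disagree")
    return problems


def fixes_by_action(record: dict) -> Counter:
    """Count fix actions such as set_created or stripped_wiki_link across all claims."""
    # Entries look like 'filename:action:argument'; maxsplit=2 keeps colons inside the argument.
    return Counter(
        entry.split(":", 2)[1] for entry in record["validation_stats"]["fixes_applied"]
    )


# The record from this diff, with the two claim filenames factored out for brevity.
A = "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-due-to-phase-isolation.md"
B = "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md"
record = {
    "rejected_claims": [
        {"filename": A, "issues": ["missing_attribution_extractor"]},
        {"filename": B, "issues": ["missing_attribution_extractor"]},
    ],
    "validation_stats": {
        "total": 2,
        "kept": 0,
        "fixed": 7,
        "rejected": 2,
        "fixes_applied": [
            f"{A}:set_created:2026-03-25",
            f"{A}:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
            f"{A}:stripped_wiki_link:AI lowers the expertise barrier for engineering biological w",
            f"{B}:set_created:2026-03-25",
            f"{B}:stripped_wiki_link:AI lowers the expertise barrier for engineering biological w",
            f"{B}:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
            f"{B}:stripped_wiki_link:current language models escalate to nuclear war in simulated",
        ],
        "rejections": [f"{A}:missing_attribution_extractor", f"{B}:missing_attribution_extractor"],
    },
    "model": "anthropic/claude-sonnet-4.5",
    "date": "2026-03-25",
}

print(check_debug_record(record))
print(fixes_by_action(record))
```

On the record reproduced from this diff, `check_debug_record` returns an empty list and `fixes_by_action` counts five `stripped_wiki_link` fixes and two `set_created` fixes, which is how `"fixed": 7` can coexist with `"total": 2`: the counter tallies fix actions, not claims.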