diff --git a/inbox/archive/ai-alignment/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md b/inbox/archive/ai-alignment/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
index 9cebd5d4..5361509d 100644
--- a/inbox/archive/ai-alignment/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
+++ b/inbox/archive/ai-alignment/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
@@ -7,9 +7,12 @@ date: 2025-03-01
 domain: ai-alignment
 secondary_domains: []
 format: research-paper
-status: unprocessed
+status: processed
+processed_by: theseus
+processed_date: 2026-04-04
 priority: medium
 tags: [cyber-capability, CTF-benchmarks, real-world-attacks, bottleneck-analysis, governance-framework, benchmark-reality-gap]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
diff --git a/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md b/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
deleted file mode 100644
index 9cebd5d4..00000000
--- a/inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
+++ /dev/null
@@ -1,63 +0,0 @@
----
-type: source
-title: "A Framework for Evaluating Emerging Cyberattack Capabilities of AI — CTF Benchmarks vs. Real Attack Phases"
-author: "Cyberattack Evaluation Research Team"
-url: https://arxiv.org/html/2503.11917v3
-date: 2025-03-01
-domain: ai-alignment
-secondary_domains: []
-format: research-paper
-status: unprocessed
-priority: medium
-tags: [cyber-capability, CTF-benchmarks, real-world-attacks, bottleneck-analysis, governance-framework, benchmark-reality-gap]
----
-
-## Content
-
-A systematic framework for evaluating AI's emerging cyberattack capabilities by analyzing 12,000+ real-world AI cyber incidents (catalogued by Google's Threat Intelligence Group), decomposed into 7 representative attack chain archetypes, with bottleneck analysis to identify which attack phases AI most/least improves.
-
-**Core finding on CTF vs. real attacks**: "most existing evaluations of AI cyber capability rely on isolated CTF challenges or question-answer benchmarks, but these approaches do not capture the autonomous, multi-step reasoning, state tracking, and error recovery required to navigate large-scale network environments."
-
-**Phase-specific AI capability translation** (from bottleneck analysis):
-
-High-translation bottlenecks (AI genuinely helps):
-- Reconnaissance/OSINT: AI can "quickly gather and analyze vast amounts of OSINT data" — high real-world impact
-- Evasion/Persistence: Gemini 2.0 Flash achieved 40% success on operational security tasks — highest rate
-
-Low-translation bottlenecks (benchmark scores don't predict real impact):
-- Vulnerability exploitation: only 6.25% success rate in real contexts; "reliance on generic strategies" fails in actual systems
-- Exploitation under mitigations: requires "long sequences of perfect syntax" that current models can't maintain
-
-**The crucial asymmetry**: CTF evaluations inflate exploitation capability (isolated, pre-scoped environments) while understating reconnaissance capability (where real-world use is already widespread).
-
-**Real-world evidence** (beyond benchmarks):
-- Anthropic documented state-sponsored campaign where AI "autonomously executed the majority of intrusion steps"
-- AISLE system found all 12 zero-day vulnerabilities in January 2026 OpenSSL security release
-- Google catalogued 12,000+ AI cyber incidents; 7 attack chain archetypes derived from this data
-- Hack The Box AI Range (December 2025): "significant gap between AI models' security knowledge and their practical multi-step adversarial capabilities"
-
-**The key governance message**: "Current frontier AI capabilities primarily enhance threat actor speed and scale, rather than enabling breakthrough capabilities." Governance should focus on phase-specific risk prioritization, not overall capability scores.
-
-**CTF benchmark performance**: Model solved 11/50 CTF challenges (22% overall), but this is a poor predictor of actual attack capability because it misses phase-specific dynamics.
-
-## Agent Notes
-
-**Why this matters:** Cyber is the exceptional case where the benchmark-reality gap runs in both directions: CTF success likely overstates exploitation capability (6.25% real vs. higher CTF) while understating reconnaissance/scale-enhancement capability (real-world evidence exceeds benchmark predictions). This distinguishes cyber from bio/self-replication where the gap predominantly runs in one direction (benchmarks overstate).
-
-**What surprised me:** The real-world cyber evidence already exists at scale (12,000+ incidents, zero-days, state-sponsored campaigns) — unlike bio and self-replication where "real-world demonstrations" remain theoretical or unpublished. Cyber has crossed from "benchmark implies future risk" to "documented real-world operational capability." This makes the B1 urgency argument STRONGEST for cyber despite the CTF benchmark gap.
-
-**What I expected but didn't find:** A clean benchmark-to-real-world correlation coefficient. The analysis is bottleneck-based (which phases translate, which don't) rather than an overall correlation. This is actually more useful for governance than an overall number would be.
-
-**KB connections:**
-- [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur]] — analogous threshold-crossing argument; cyber has more real-world evidence than bio
-- [[the gap between theoretical AI capability and observed deployment is massive across all occupations]] — cyber is the counterexample where real-world gap is smaller and in a different direction
-- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable]] — reconnaissance/OSINT is independently verifiable (you either found the information or didn't); this is why AI displacement is strongest there
-
-**Extraction hints:**
-1. "AI cyber capability benchmarks (CTF challenges) systematically overstate exploitation capability while understating reconnaissance and scale-enhancement capability because CTF environments isolate single techniques from real attack phase dynamics" — new claim distinguishing benchmark direction by attack phase
-2. "Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns, zero-day discovery, and mass incident cataloguing confirm operational capability beyond isolated evaluation scores" — distinguishes cyber from bio/self-replication in the benchmark-reality gap framework
-
-## Curator Notes (structured handoff for extractor)
-PRIMARY CONNECTION: [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur]] — compare/contrast: bio risk grounded in text benchmarks (gap large); cyber risk grounded in real-world incidents (gap smaller, different direction)
-WHY ARCHIVED: Provides the most systematic treatment of the cyber benchmark-reality gap; documents that real-world cyber capability evidence already exists at scale, making the B1 urgency argument strongest for this domain
-EXTRACTION HINT: Two potential claims: (1) cyber benchmark gap is direction-asymmetric (overstates exploitation, understates reconnaissance); (2) cyber is the exceptional domain with documented real-world dangerous capability. Check first whether existing KB cyber claims already cover state-sponsored campaigns or zero-days before extracting — the existing claim [[current language models escalate to nuclear war in simulated conflicts]] is in the institutional context section; this cyber capability claim is different.