extract: 2026-03-26-anthropic-activating-asl3-protections

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
entity-batch: update 2 entities
9 changed files with 69 additions and 12 deletions
--- a/domains/ai-alignment/AI
+++ b/domains/ai-alignment/AI
@@ -40,6 +40,12 @@ STREAM framework proposes standardized ChemBio evaluation reporting with 23-expe
 AISLE's autonomous discovery of 12 OpenSSL CVEs including a 30-year-old bug demonstrates that AI also lowers the expertise barrier for offensive cyber from specialized security researcher to automated system. Unlike bioweapons, zero-day discovery is also a defensive capability, but the dual-use nature means the same autonomous system that defends can be redirected offensively. The fact that this capability is already deployed commercially while governance frameworks haven't incorporated it suggests the expertise-barrier-lowering dynamic extends beyond bio to cyber domains.
 ### Additional Evidence (confirm)
 *Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
 Anthropic's decision to activate ASL-3 protections was driven by evidence that Claude Sonnet 3.7 showed 'measurably better' performance on CBRN weapon acquisition tasks compared to standard internet resources, and that Virology Capabilities Test performance had been 'steadily increasing over time' across Claude model generations. This provides empirical confirmation that the expertise barrier is lowering in practice, not just theory, and that the trend is consistent enough to justify precautionary governance action.
 Relevant Notes:
 - [[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]] — Amodei's admission of Claude exhibiting deception and subversion during testing is a concrete instance of this pattern, with bioweapon implications
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -82,6 +82,16 @@ Prandi et al. provide the specific mechanism for why pre-deployment evaluations
 Anthropic's stated rationale for extending evaluation intervals from 3 to 6 months explicitly acknowledges that 'the science of model evaluation isn't well-developed enough' and that rushed evaluations produce lower-quality results. This is a direct admission from a frontier lab that current evaluation methodologies are insufficiently mature to support the governance structures built on them. The 'zone of ambiguity' where capabilities approached but didn't definitively pass thresholds in v2.0 demonstrates that evaluation uncertainty creates governance paralysis.
 ### Auto-enrichment (near-duplicate conversion, similarity=1.00)
 *Source: PR #1936 — "pre deployment ai evaluations do not predict real world risk creating institutional governance built on unreliable foundations"*
 *Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.*
 ### Additional Evidence (extend)
 *Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
 Anthropic's ASL-3 activation demonstrates that evaluation uncertainty compounds near capability thresholds: 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' The Virology Capabilities Test showed 'steadily increasing' performance across model generations, but Anthropic could not definitively confirm whether Opus 4 crossed the threshold—they activated protections based on trend trajectory and inability to rule out crossing rather than confirmed measurement.
 ---
 ### Additional Evidence (confirm)
@@ -144,6 +154,12 @@ METR's August 2025 research update provides specific quantification of the evalu
 Anthropic explicitly acknowledged that 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' This is a frontier lab publicly stating that evaluation reliability degrades precisely when it matters most—near capability thresholds. The ASL-3 activation was triggered by this evaluation uncertainty rather than confirmed capability, suggesting governance frameworks are adapting to evaluation unreliability rather than solving it.
 ### Additional Evidence (extend)
 *Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
 Anthropic's ASL-3 activation explicitly acknowledges that 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' This is the first public admission from a frontier lab that evaluation reliability degrades near capability thresholds, creating a zone where governance must operate under irreducible uncertainty. The activation proceeded despite being unable to 'clearly rule out ASL-3 risks' in the way previous models could be confirmed safe, demonstrating that the evaluation limitation is not theoretical but operationally binding.
--- a/domains/ai-alignment/voluntary
+++ b/domains/ai-alignment/voluntary
@@ -68,6 +68,12 @@ The coordination gap provides the mechanism explaining why voluntary commitments
 RepliBench exists as a comprehensive self-replication evaluation tool but is not integrated into compliance frameworks despite EU AI Act Article 55 taking effect after its publication. Labs can voluntarily use it but face no enforcement mechanism requiring them to do so, creating competitive pressure to avoid evaluations that might reveal concerning capabilities.
 ### Additional Evidence (challenge)
 *Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
 Anthropic maintained its ASL-3 commitment through precautionary activation despite commercial pressure to deploy Claude Opus 4 without additional constraints. This is a counter-example to the claim that voluntary commitments inevitably collapse under competition. However, the commitment was maintained through a narrow scoping of protections (only 'extended, end-to-end CBRN workflows') and the activation occurred in May 2025, before the RSP v3.0 rollback documented in February 2026. The temporal sequence suggests the commitment held temporarily but may have contributed to competitive pressure that later forced the RSP weakening.
--- a/domains/internet-finance/metadao-autocrat-v01-reduces-proposal-duration-to-three-days-enabling-faster-governance-iteration.md
+++ b/domains/internet-finance/metadao-autocrat-v01-reduces-proposal-duration-to-three-days-enabling-faster-governance-iteration.md
@@ -26,6 +26,14 @@ Coal's v0.6 parameters set proposal length at 3 days with 1-day TWAP delay, conf
 {"action": "flag_duplicate", "candidates": ["decisions/internet-finance/metadao-governance-migration-2026-03.md", "domains/internet-finance/metadao-autocrat-migration-accepted-counterparty-risk-from-unverifiable-builds-prioritizing-iteration-speed-over-security-guarantees.md", "domains/internet-finance/futarchy-governed-daos-converge-on-traditional-corporate-governance-scaffolding-for-treasury-operations-because-market-mechanisms-alone-cannot-provide-operational-security-and-legal-compliance.md"], "reasoning": "The reviewer explicitly states that the new decision record duplicates `decisions/internet-finance/metadao-governance-migration-2026-03.md`. The reviewer also suggests that the claim addition is a stretch for the v0.1 claim and would be more defensible for `metadao-autocrat-migration-accepted-counterparty-risk-from-unverifiable-builds-prioritizing-iteration-speed-over-security-guarantees.md`. Finally, the reviewer notes that the Squads multisig integration connects directly to `futarchy-governed-daos-converge-on-traditional-corporate-governance-scaffolding-for-treasury-operations-because-market-mechanisms-alone-cannot-provide-operational-security-and-legal-compliance.md`."}
 ```
 ### Auto-enrichment (near-duplicate conversion, similarity=1.00)
 *Source: PR #1939 — "metadao autocrat v01 reduces proposal duration to three days enabling faster governance iteration"*
 *Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.*
 {"action": "flag_duplicate", "candidates": ["decisions/internet-finance/metadao-governance-migration-2026-03.md", "domains/internet-finance/metadao-autocrat-migration-accepted-counterparty-risk-from-unverifiable-builds-prioritizing-iteration-speed-over-security-guarantees.md", "domains/internet-finance/futarchy-governed-daos-converge-on-traditional-corporate-governance-scaffolding-for-treasury-operations-because-market-mechanisms-alone-cannot-provide-operational-security-and-legal-compliance.md"], "reasoning": "The new decision file `metadao-omnibus-migration-proposal-march-2026.md` is a substantive duplicate of `decisions/internet-finance/metadao-governance-migration-2026-03.md`. The reviewer explicitly states that the new file should be merged into the existing one. The enrichment added to `metadao-autocrat-v01-reduces-proposal-duration-to-three-days-enabling-faster-governance-iteration.md` is misplaced. The reviewer suggests it would be more appropriate for `metadao-autocrat-migration-accepted-counterparty-risk-from-unverifiable-builds-prioritizing-iteration-speed-over-security-guarantees.md` due to the iterative migration pattern and community consensus superseding uncertainty. Additionally, the Squads v4.0 integration identified in the source directly extends `futarchy-governed-daos-converge-on-traditional-corporate-governance-scaffolding-for-treasury-operations-because-market-mechanisms-alone-cannot-provide-operational-security-and-legal-compliance.md` by providing a structural fix for the execution velocity problem."}
 ```
 ---
 Relevant Notes:
--- a/entities/ai-alignment/anthropic.md
+++ b/entities/ai-alignment/anthropic.md
@@ -65,6 +65,7 @@ Frontier AI safety laboratory founded by former OpenAI VP of Research Dario Amod
 - **2025-08-01** — Documented first large-scale AI-orchestrated cyberattack using Claude Code for 80-90% autonomous offensive operations against 17+ organizations; developed reactive detection methods and published threat intelligence report
 - **2026-02-24** — RSP v3.0 released: added Frontier Safety Roadmap and Periodic Risk Reports, but removed pause commitment entirely, demoted RAND Security Level 4 to recommendations, and removed cyber operations from binding commitments (GovAI analysis)
 - **2025-05-01** — Activated ASL-3 protections for Claude Opus 4 as precautionary measure without confirmed threshold crossing, citing evaluation uncertainty and upward capability trends
 - **2025-05-01** — Activated ASL-3 protections for Claude Opus 4 as precautionary measure without confirmed threshold crossing, first model that could not be positively ruled below ASL-3 thresholds
 ## Competitive Position
 Strongest position in enterprise AI and coding. Revenue growth (10x YoY) outpaces all competitors. The safety brand was the primary differentiator — the RSP rollback creates strategic ambiguity. CEO publicly uncomfortable with power concentration while racing to concentrate it.
--- a/entities/internet-finance/metadao.md
+++ b/entities/internet-finance/metadao.md
@@ -185,6 +185,12 @@ The futarchy governance protocol on Solana. Implements decision markets through
 - **2024-03-31** — [[metadao-appoint-nallok-proph3t-benevolent-dictators]] Passed: Appointed Proph3t and Nallok as BDF3M with 1015 META + 100k USDC compensation for 7 months to address execution bottlenecks
 - **2026-03-23** — [[metadao-omnibus-migration-proposal]] Active at 84% pass probability with $408K traded: Proposal to migrate DAO program to new version and update legal documents, includes Squads v4.0 multisig integration
 - **2026-03-23** — [[metadao-omnibus-migration-proposal]] Active at 84% pass probability with $408K traded: Proposal to migrate DAO program with Squads integration and update legal documents
 - **2026-03-23** — Omnibus proposal to migrate DAO program and update legal documents reached 84% pass probability with $408K governance market volume
 - **2026-03-23** — [[metadao-omnibus-migration-2026]] Active: DAO program migration with Squads multisig integration reached 84% pass probability, $408K volume
 - **2026-03-23** — [[metadao-omnibus-migration-proposal-march-2026]] Active at 84% pass probability: Omnibus proposal to migrate autocrat program, integrate Squads v4.0 multisig, and update legal documents ($408K volume)
 - **2026-03-23** — [[metadao-omnibus-migration-proposal]] Proposal active at 84% pass probability with $408K traded, proposing autocrat program migration and Squads v4.0 multisig integration
 - **2026-03-23** — [[metadao-omnibus-migration-proposal-march-2026]] Active at 84% pass probability: Omnibus proposal to migrate autocrat program, update legal documents, and integrate Squads v4.0 multisig ($408K volume)
 - **2026-03-23** — [[metadao-migration-proposal-2026]] Active (84% likelihood): Migration to new onchain DAO program with $408K traded
 ## Key Decisions
 | Date | Proposal | Proposer | Category | Outcome |
 |------|----------|----------|----------|---------|
--- a/entities/internet-finance/ranger-finance.md
+++ b/entities/internet-finance/ranger-finance.md
@@ -61,6 +61,7 @@ Perps aggregator and DEX aggregation platform on Solana/Hyperliquid. Three produ
 - **2026-03-23** — [[ranger-finance-liquidation-2026]] Passed: Liquidation executed with 97% support, returning 5M USDC to holders at $0.78 book value
 - **2026-03** — [[ranger-finance-liquidation-2026]] Passed with 97% support: Liquidation returned 5M USDC to holders at $0.78 book value, IP returned to team
 - **2026-03** — [[ranger-finance-liquidation-2026]] Passed with 97% support: Liquidation returned ~5M USDC to token holders at $0.78 book value after governance determined team underdelivery
 - **2026-03** — [[ranger-finance-liquidation-2026]] Passed (97%): Liquidation returning 5M USDC to holders at $0.78 book value
 ## Significance for KB
 Ranger is THE test case for futarchy-governed enforcement. The system is working as designed: investors funded a project, the project underperformed relative to representations, the community used futarchy to force liquidation and treasury return. This is exactly what the "unruggable ICO" mechanism promises — and Ranger is the first live demonstration.
--- a/inbox/queue/.extraction-debug/2026-03-26-anthropic-activating-asl3-protections.json
+++ b/inbox/queue/.extraction-debug/2026-03-26-anthropic-activating-asl3-protections.json
@@ -1,13 +1,13 @@
 {
  "rejected_claims": [
    {
-      "filename": "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable.md",
+      "filename": "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable-near-thresholds.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
-      "filename": "ai-safety-commitments-lack-independent-verification-creating-self-referential-accountability-that-cannot-detect-motivated-reasoning.md",
+      "filename": "ai-safety-governance-lacks-independent-verification-creating-self-referential-accountability-where-labs-assess-their-own-compliance.md",
      "issues": [
        "missing_attribution_extractor"
      ]
@@ -16,20 +16,21 @@
  "validation_stats": {
    "total": 2,
    "kept": 0,
-    "fixed": 7,
+    "fixed": 8,
    "rejected": 2,
    "fixes_applied": [
-      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable.md:set_created:2026-03-26",
+      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable-near-thresholds.md:set_created:2026-03-26",
-      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable.md:stripped_wiki_link:voluntary-safety-pledges-cannot-survive-competitive-pressure",
+      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable-near-thresholds.md:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
-      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable.md:stripped_wiki_link:safe-AI-development-requires-building-alignment-mechanisms-b",
+      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable-near-thresholds.md:stripped_wiki_link:voluntary safety pledges cannot survive competitive pressure",
-      "ai-safety-commitments-lack-independent-verification-creating-self-referential-accountability-that-cannot-detect-motivated-reasoning.md:set_created:2026-03-26",
+      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable-near-thresholds.md:stripped_wiki_link:safe AI development requires building alignment mechanisms b",
-      "ai-safety-commitments-lack-independent-verification-creating-self-referential-accountability-that-cannot-detect-motivated-reasoning.md:stripped_wiki_link:voluntary-safety-pledges-cannot-survive-competitive-pressure",
+      "ai-safety-governance-lacks-independent-verification-creating-self-referential-accountability-where-labs-assess-their-own-compliance.md:set_created:2026-03-26",
-      "ai-safety-commitments-lack-independent-verification-creating-self-referential-accountability-that-cannot-detect-motivated-reasoning.md:stripped_wiki_link:Anthropics-RSP-rollback-under-commercial-pressure-is-the-fir",
+      "ai-safety-governance-lacks-independent-verification-creating-self-referential-accountability-where-labs-assess-their-own-compliance.md:stripped_wiki_link:AI transparency is declining not improving because Stanford ",
-      "ai-safety-commitments-lack-independent-verification-creating-self-referential-accountability-that-cannot-detect-motivated-reasoning.md:stripped_wiki_link:AI-transparency-is-declining-not-improving-because-Stanford-"
+      "ai-safety-governance-lacks-independent-verification-creating-self-referential-accountability-where-labs-assess-their-own-compliance.md:stripped_wiki_link:only binding regulation with enforcement teeth changes front",
      "ai-safety-governance-lacks-independent-verification-creating-self-referential-accountability-where-labs-assess-their-own-compliance.md:stripped_wiki_link:voluntary safety pledges cannot survive competitive pressure"
    ],
    "rejections": [
-      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable.md:missing_attribution_extractor",
+      "precautionary-ai-governance-triggers-protection-escalation-when-capability-evaluation-becomes-unreliable-near-thresholds.md:missing_attribution_extractor",
-      "ai-safety-commitments-lack-independent-verification-creating-self-referential-accountability-that-cannot-detect-motivated-reasoning.md:missing_attribution_extractor"
+      "ai-safety-governance-lacks-independent-verification-creating-self-referential-accountability-where-labs-assess-their-own-compliance.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
--- a/inbox/queue/2026-03-26-anthropic-activating-asl3-protections.md
+++ b/inbox/queue/2026-03-26-anthropic-activating-asl3-protections.md
@@ -14,6 +14,10 @@ processed_by: theseus
 processed_date: 2026-03-26
 enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
 processed_by: theseus
 processed_date: 2026-03-26
 enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 ## Content
@@ -61,3 +65,11 @@ EXTRACTION HINT: Focus on the *logic* of precautionary activation (uncertainty t
 - Claude Sonnet 3.7 showed measurable participant uplift on CBRN weapon acquisition tasks compared to standard internet resources
 - Virology Capabilities Test performance had been steadily increasing over time across Claude model generations
 - Anthropic's RSP explicitly permits deployment under a higher standard than confirmed necessary
 ## Key Facts
 - Claude Opus 4 was deployed with ASL-3 protections in May 2025
 - Claude Sonnet 3.7 showed measurable uplift on CBRN weapon acquisition tasks compared to internet resources
 - Virology Capabilities Test performance increased steadily across Claude model generations
 - ASL-3 protections were scoped to prevent assistance with extended end-to-end CBRN workflows
 - Anthropic's RSP explicitly permits deployment under higher standards than confirmed necessary
Author	SHA1	Message	Date
Teleo Agents	e05951fc1a	extract: 2026-03-26-anthropic-activating-asl3-protections Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>	2026-03-26 03:02:27 +00:00
Teleo Agents	3315d1b4b4	entity-batch: update 2 entities - Applied 2 entity operations from queue - Files: entities/internet-finance/metadao.md, entities/internet-finance/ranger-finance.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 03:00:33 +00:00
Teleo Agents	5b2c0d3708	entity-batch: update 1 entities - Applied 1 entity operations from queue - Files: entities/internet-finance/metadao.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 02:46:30 +00:00
Teleo Agents	1a2fc89850	entity-batch: update 1 entities - Applied 1 entity operations from queue - Files: entities/internet-finance/metadao.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 02:16:24 +00:00
Teleo Agents	19bc0777bb	entity-batch: update 1 entities - Applied 1 entity operations from queue - Files: domains/internet-finance/metadao-autocrat-v01-reduces-proposal-duration-to-three-days-enabling-faster-governance-iteration.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 02:04:22 +00:00
Teleo Agents	b0744ddf11	entity-batch: update 1 entities - Applied 1 entity operations from queue - Files: entities/internet-finance/metadao.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 01:46:19 +00:00
Teleo Agents	401f14f922	entity-batch: update 1 entities - Applied 2 entity operations from queue - Files: entities/internet-finance/metadao.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 01:16:14 +00:00
Teleo Agents	b3c06598dd	entity-batch: update 1 entities - Applied 1 entity operations from queue - Files: domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 01:05:12 +00:00
Teleo Agents	e86df50104	entity-batch: update 1 entities - Applied 1 entity operations from queue - Files: entities/ai-alignment/anthropic.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>	2026-03-26 01:01:10 +00:00
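
Review note: the `fixes_applied` entries in the extraction-debug JSON appear to follow a `filename:fix_kind:detail` shape. The sketch below is one possible way to tally them when checking that the `fixed` count matches the list. It is an illustration only, not the pipeline's own code: `parse_fix` and `summarize` are hypothetical helpers, the filenames are shortened to the placeholders `claim-a.md` and `claim-b.md`, and it assumes filenames contain no colon.

```python
from collections import Counter

# Post-change `fixes_applied` entries, with the two long claim filenames
# replaced by hypothetical placeholders (claim-a.md, claim-b.md).
FIXES = [
    "claim-a.md:set_created:2026-03-26",
    "claim-a.md:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
    "claim-a.md:stripped_wiki_link:voluntary safety pledges cannot survive competitive pressure",
    "claim-a.md:stripped_wiki_link:safe AI development requires building alignment mechanisms b",
    "claim-b.md:set_created:2026-03-26",
    "claim-b.md:stripped_wiki_link:AI transparency is declining not improving because Stanford ",
    "claim-b.md:stripped_wiki_link:only binding regulation with enforcement teeth changes front",
    "claim-b.md:stripped_wiki_link:voluntary safety pledges cannot survive competitive pressure",
]


def parse_fix(entry: str) -> tuple[str, str, str]:
    """Split one entry into (filename, fix_kind, detail).

    maxsplit=2 keeps any further colons inside the detail field.
    Assumes the filename itself contains no colon.
    """
    filename, kind, detail = entry.split(":", 2)
    return filename, kind, detail


def summarize(entries: list[str]) -> Counter:
    """Count fixes per (filename, fix_kind) pair."""
    counts: Counter = Counter()
    for entry in entries:
        filename, kind, _detail = parse_fix(entry)
        counts[(filename, kind)] += 1
    return counts


if __name__ == "__main__":
    for (filename, kind), n in sorted(summarize(FIXES).items()):
        print(f"{filename}\t{kind}\t{n}")
```

Summing the counts gives the number of list entries, which can be compared against the `fixed` value reported in `validation_stats`.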