diff --git a/domains/ai-alignment/capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md b/domains/ai-alignment/capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md
index e91c2226..3a9a3c95 100644
--- a/domains/ai-alignment/capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md
+++ b/domains/ai-alignment/capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md
@@ -17,6 +17,12 @@ This leaves motivation selection as the only durable approach: either direct spe
 ---
+### Additional Evidence (confirm)
+*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
+
+Current models already demonstrate >50% success on hardest variants of tasks designed to test circumvention of security controls (KYC, persistent deployment evasion). The capability trajectory shows rapid improvement in exactly the domains where containment depends on security measures designed by humans.
+
+
 Relevant Notes:
 - [[safe AI development requires building alignment mechanisms before scaling capability]] -- Bostrom's analysis shows why motivation selection must precede capability scaling
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous weaving is a form of motivation selection that avoids the limitations of both direct specification and one-shot loading
diff --git a/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md b/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md
index fdc955f5..4e62f756 100644
--- a/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md
+++ b/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md
@@ -63,6 +63,12 @@ The research-to-compliance translation gap fails for the same structural reason
 The coordination gap provides the mechanism explaining why voluntary commitments fail even beyond racing dynamics: coordination infrastructure investments have diffuse benefits but concentrated costs, creating a public goods problem. Labs won't build shared response infrastructure unilaterally because competitors free-ride on the benefits while the builder bears full costs. This is distinct from the competitive pressure argument — it's about why shared infrastructure doesn't get built even when racing isn't the primary concern.
+### Additional Evidence (confirm)
+*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
+
+RepliBench exists as a comprehensive self-replication evaluation tool but is not integrated into compliance frameworks despite EU AI Act Article 55 taking effect after its publication. Labs can voluntarily use it but face no enforcement mechanism requiring them to do so, creating competitive pressure to avoid evaluations that might reveal concerning capabilities.
+
+
 Relevant Notes:
diff --git a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
index 57e41718..2cadc630 100644
--- a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
+++ b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
@@ -48,6 +48,12 @@ The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failu
 NCT07328815 tests whether a UI-layer behavioral nudge (ensemble-LLM confidence signals + anchoring cues) can mitigate automation bias where training failed. The parent study (NCT06963957) showed 20-hour AI-literacy training did not prevent automation bias. This trial operationalizes a structural solution: using multi-model disagreement as an automatic uncertainty flag that doesn't require physician understanding of model internals. Results pending (2026).
+### Additional Evidence (extend)
+*Source: [[2026-03-22-automation-bias-rct-ai-trained-physicians]] | Added: 2026-03-23*
+
+RCT evidence (NCT06963957, medRxiv August 2025) shows automation bias persists even after 20 hours of AI-literacy training specifically designed to teach critical evaluation of AI output. Physicians with this training still voluntarily deferred to deliberately erroneous LLM recommendations in 3 of 6 clinical vignettes, demonstrating that the human-in-the-loop degradation mechanism operates even when humans are extensively trained to resist it.
+
+
diff --git a/entities/internet-finance/metadao.md b/entities/internet-finance/metadao.md
index 28381b20..a28e744d 100644
--- a/entities/internet-finance/metadao.md
+++ b/entities/internet-finance/metadao.md
@@ -92,6 +92,8 @@ The futarchy governance protocol on Solana. Implements decision markets through
 - **2026-02-07** — First failed ICO: Hurupay raised $2M against $3M minimum, all capital refunded under unruggable ICO mechanics
 - **2026-03-26** — [[metadao-p2p-me-ico]] Active: P2P.me ICO launched targeting $6M at $15.5M FDV, backed by Multicoin Capital and Coinbase Ventures (closes March 30)
 - **2025-Q4** — Reached first operating profitability with $2.51M in fee revenue from Futarchy AMM and Meteora pools; expanded futarchy ecosystem from 2 to 8 protocols; total futarchy market cap reached $219M with non-META market cap of $69M; hosted 6 ICOs in quarter raising $18.7M; maintains 15+ quarters of runway
+- **2026-03-21** — [[metadao-meta036-hanson-futarchy-research]] Active: Proposal to fund $80K academic research at GMU led by Robin Hanson, trading at 50% likelihood
+- **2025-Q4** — Achieved first operating profitability with $2.51M in fee revenue from Futarchy AMM and Meteora pools; hosted 6 ICOs in quarter raising $18.7M; expanded futarchy ecosystem from 2 to 8 protocols; total equity grew from $4M to $16.5M
 ## Key Decisions
 | Date | Proposal | Proposer | Category | Outcome |
 |------|----------|----------|----------|---------|
diff --git a/entities/internet-finance/p2p-me.md b/entities/internet-finance/p2p-me.md
index 38ffa012..0399766b 100644
--- a/entities/internet-finance/p2p-me.md
+++ b/entities/internet-finance/p2p-me.md
@@ -57,4 +57,5 @@ Treasury controlled by token holders through futarchy-based governance. Team can
 - **2026-03-26** — [[p2p-me-metadao-ico]] Active: ICO scheduled, targeting $6M raise at $15.5M FDV with Pine Analytics identifying 182x gross profit multiple concerns
 - **2026-03-26** — [[p2p-me-ico-march-2026]] Active: $6M ICO at $15.5M FDV scheduled on MetaDAO
 - **2026-03-26** — [[metadao-p2p-me-ico]] Active: ICO launch targeting $15.5M FDV at 182x gross profit multiple
-- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Active: ICO scheduled, targeting $6M at $15.5M FDV
\ No newline at end of file
+- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Active: ICO scheduled, targeting $6M at $15.5M FDV
+- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Status pending: ICO vote scheduled
\ No newline at end of file
diff --git a/inbox/queue/2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md b/inbox/archive/ai-alignment/2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md
similarity index 99%
rename from inbox/queue/2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md
rename to inbox/archive/ai-alignment/2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md
index d4f352ae..1b882726 100644
--- a/inbox/queue/2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md
+++ b/inbox/archive/ai-alignment/2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md
@@ -7,7 +7,7 @@ date: 2026-03-00
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: enrichment
+status: processed
 priority: high
 tags: [coordination-gap, institutional-readiness, frontier-AI-safety, precommitment, incident-response, coordination-failure, nuclear-analogies, pandemic-preparedness, B2-confirms]
 processed_by: theseus
diff --git a/inbox/queue/2026-03-12-metr-opus46-sabotage-risk-review-evaluation-awareness.md b/inbox/archive/ai-alignment/2026-03-12-metr-opus46-sabotage-risk-review-evaluation-awareness.md
similarity index 100%
rename from inbox/queue/2026-03-12-metr-opus46-sabotage-risk-review-evaluation-awareness.md
rename to inbox/archive/ai-alignment/2026-03-12-metr-opus46-sabotage-risk-review-evaluation-awareness.md
diff --git a/inbox/queue/2026-03-21-replibench-autonomous-replication-capabilities.md b/inbox/archive/ai-alignment/2026-03-21-replibench-autonomous-replication-capabilities.md
similarity index 99%
rename from inbox/queue/2026-03-21-replibench-autonomous-replication-capabilities.md
rename to inbox/archive/ai-alignment/2026-03-21-replibench-autonomous-replication-capabilities.md
index ab49793d..cdabe459 100644
--- a/inbox/queue/2026-03-21-replibench-autonomous-replication-capabilities.md
+++ b/inbox/archive/ai-alignment/2026-03-21-replibench-autonomous-replication-capabilities.md
@@ -7,7 +7,7 @@ date: 2025-04-21
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: unprocessed
+status: processed
 priority: high
 tags: [self-replication, autonomous-replication, capability-evaluation, AISI, RepliBench, loss-of-control, EU-AI-Act, benchmark]
 ---
diff --git a/inbox/queue/2026-08-02-eu-ai-act-healthcare-high-risk-obligations.md b/inbox/archive/general/2026-08-02-eu-ai-act-healthcare-high-risk-obligations.md
similarity index 99%
rename from inbox/queue/2026-08-02-eu-ai-act-healthcare-high-risk-obligations.md
rename to inbox/archive/general/2026-08-02-eu-ai-act-healthcare-high-risk-obligations.md
index 5671971e..f0dce8fc 100644
--- a/inbox/queue/2026-08-02-eu-ai-act-healthcare-high-risk-obligations.md
+++ b/inbox/archive/general/2026-08-02-eu-ai-act-healthcare-high-risk-obligations.md
@@ -7,7 +7,7 @@ date: 2026-01-01
 domain: health
 secondary_domains: [ai-alignment]
 format: regulatory document
-status: null-result
+status: processed
 priority: high
 tags: [eu-ai-act, regulatory, clinical-ai-safety, high-risk-ai, healthcare-compliance, transparency, human-oversight, belief-3, belief-5]
 processed_by: vida
diff --git a/inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md b/inbox/archive/health/2026-03-22-automation-bias-rct-ai-trained-physicians.md
similarity index 99%
rename from inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md
rename to inbox/archive/health/2026-03-22-automation-bias-rct-ai-trained-physicians.md
index 3f96fa84..f9e1ed8c 100644
--- a/inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md
+++ b/inbox/archive/health/2026-03-22-automation-bias-rct-ai-trained-physicians.md
@@ -7,7 +7,7 @@ date: 2025-08-26
 domain: health
 secondary_domains: [ai-alignment]
 format: research paper
-status: unprocessed
+status: processed
 priority: high
 tags: [automation-bias, clinical-ai-safety, physician-rct, llm-diagnostic, centaur-model, ai-literacy, chatgpt, randomized-trial]
 ---
diff --git a/inbox/queue/2026-01-13-nasaa-clarity-act-concerns.md b/inbox/archive/internet-finance/2026-01-13-nasaa-clarity-act-concerns.md
similarity index 100%
rename from inbox/queue/2026-01-13-nasaa-clarity-act-concerns.md
rename to inbox/archive/internet-finance/2026-01-13-nasaa-clarity-act-concerns.md
diff --git a/inbox/queue/.extraction-debug/2026-03-21-replibench-autonomous-replication-capabilities.json b/inbox/queue/.extraction-debug/2026-03-21-replibench-autonomous-replication-capabilities.json
new file mode 100644
index 00000000..471d918c
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-21-replibench-autonomous-replication-capabilities.json
@@ -0,0 +1,34 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 2,
+    "kept": 0,
+    "fixed": 4,
+    "rejected": 2,
+    "fixes_applied": [
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:set_created:2026-03-23",
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow",
+      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:set_created:2026-03-23"
+    ],
+    "rejections": [
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:missing_attribution_extractor",
+      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
\ No newline at end of file
diff --git a/inbox/queue/.extraction-debug/2026-03-22-automation-bias-rct-ai-trained-physicians.json b/inbox/queue/.extraction-debug/2026-03-22-automation-bias-rct-ai-trained-physicians.json
new file mode 100644
index 00000000..5d658605
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-22-automation-bias-rct-ai-trained-physicians.json
@@ -0,0 +1,26 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 3,
+    "rejected": 1,
+    "fixes_applied": [
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:set_created:2026-03-23",
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
+    ],
+    "rejections": [
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
\ No newline at end of file