extract: 2026-01-28-nasa-cld-phase2-frozen-saa-revised-approach #1666

Merged
leo merged 2 commits from extract/2026-01-28-nasa-cld-phase2-frozen-saa-revised-approach into main 2026-03-23 12:55:34 +00:00
13 changed files with 86 additions and 5 deletions
Showing only changes of commit cb6bd52994

View file

@@ -17,6 +17,12 @@ This leaves motivation selection as the only durable approach: either direct spe
---
### Additional Evidence (confirm)
*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
Current models already demonstrate >50% success on hardest variants of tasks designed to test circumvention of security controls (KYC, persistent deployment evasion). The capability trajectory shows rapid improvement in exactly the domains where containment depends on security measures designed by humans.
Relevant Notes:
- [[safe AI development requires building alignment mechanisms before scaling capability]] -- Bostrom's analysis shows why motivation selection must precede capability scaling
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous weaving is a form of motivation selection that avoids the limitations of both direct specification and one-shot loading

View file

@@ -63,6 +63,12 @@ The research-to-compliance translation gap fails for the same structural reason
The coordination gap provides the mechanism explaining why voluntary commitments fail even beyond racing dynamics: coordination infrastructure investments have diffuse benefits but concentrated costs, creating a public goods problem. Labs won't build shared response infrastructure unilaterally because competitors free-ride on the benefits while the builder bears full costs. This is distinct from the competitive pressure argument — it's about why shared infrastructure doesn't get built even when racing isn't the primary concern.
### Additional Evidence (confirm)
*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
RepliBench exists as a comprehensive self-replication evaluation tool but is not integrated into compliance frameworks despite EU AI Act Article 55 taking effect after its publication. Labs can voluntarily use it but face no enforcement mechanism requiring them to do so, creating competitive pressure to avoid evaluations that might reveal concerning capabilities.
Relevant Notes:

View file

@@ -48,6 +48,12 @@ The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failu
NCT07328815 tests whether a UI-layer behavioral nudge (ensemble-LLM confidence signals + anchoring cues) can mitigate automation bias where training failed. The parent study (NCT06963957) showed 20-hour AI-literacy training did not prevent automation bias. This trial operationalizes a structural solution: using multi-model disagreement as an automatic uncertainty flag that doesn't require physician understanding of model internals. Results pending (2026).
### Additional Evidence (extend)
*Source: [[2026-03-22-automation-bias-rct-ai-trained-physicians]] | Added: 2026-03-23*
RCT evidence (NCT06963957, medRxiv August 2025) shows automation bias persists even after 20 hours of AI-literacy training specifically designed to teach critical evaluation of AI output. Physicians with this training still voluntarily deferred to deliberately erroneous LLM recommendations in 3 of 6 clinical vignettes, demonstrating that the human-in-the-loop degradation mechanism operates even when humans are extensively trained to resist it.
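The structural mitigation that NCT07328815 operationalizes — treating disagreement among independent models as an automatic uncertainty flag, with no need for physicians to understand model internals — can be sketched in a few lines. This is a hypothetical illustration only; the function name, the agreement threshold, and the majority-vote aggregation are assumptions, not the trial's actual implementation:

```python
from collections import Counter

def disagreement_flag(model_answers, threshold=0.5):
    """Flag a recommendation for review when ensemble agreement on the
    modal answer falls below the threshold.

    model_answers: one diagnosis/recommendation string per model.
    Returns (modal_answer, agreement_ratio, needs_review).
    """
    counts = Counter(model_answers)
    modal, n = counts.most_common(1)[0]
    agreement = n / len(model_answers)
    return modal, agreement, agreement < threshold

# Three of four models agree: agreement 0.75, no flag at threshold 0.5.
print(disagreement_flag(["pneumonia", "pneumonia", "pneumonia", "PE"]))
```

The design choice matters: the flag fires from model disagreement alone, so it does not depend on the physician's trained skepticism — exactly the failure mode the parent RCT documented.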

View file

@@ -92,6 +92,8 @@ The futarchy governance protocol on Solana. Implements decision markets through
- **2026-02-07** — First failed ICO: Hurupay raised $2M against $3M minimum, all capital refunded under unruggable ICO mechanics
- **2026-03-26** — [[metadao-p2p-me-ico]] Active: P2P.me ICO launched targeting $6M at $15.5M FDV, backed by Multicoin Capital and Coinbase Ventures (closes March 30)
- **2025-Q4** — Achieved first operating profitability with $2.51M in fee revenue from Futarchy AMM and Meteora pools; hosted 6 ICOs in the quarter raising $18.7M; expanded futarchy ecosystem from 2 to 8 protocols; total futarchy market cap reached $219M with non-META market cap of $69M; total equity grew from $4M to $16.5M; maintains 15+ quarters of runway
- **2026-03-21** — [[metadao-meta036-hanson-futarchy-research]] Active: Proposal to fund $80K academic research at GMU led by Robin Hanson, trading at 50% likelihood
## Key Decisions
| Date | Proposal | Proposer | Category | Outcome |
|------|----------|----------|----------|---------|

View file

@@ -57,4 +57,5 @@ Treasury controlled by token holders through futarchy-based governance. Team can
- **2026-03-26** — [[p2p-me-metadao-ico]] Active: ICO scheduled, targeting $6M raise at $15.5M FDV with Pine Analytics identifying 182x gross profit multiple concerns
- **2026-03-26** — [[p2p-me-ico-march-2026]] Active: $6M ICO at $15.5M FDV scheduled on MetaDAO
- **2026-03-26** — [[metadao-p2p-me-ico]] Active: ICO launch targeting $15.5M FDV at 182x gross profit multiple
- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Active: ICO scheduled, targeting $6M at $15.5M FDV
- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Status pending: ICO vote scheduled

View file

@@ -7,7 +7,7 @@ date: 2026-03-00
domain: ai-alignment
secondary_domains: []
format: paper
status: enrichment
status: processed
priority: high
tags: [coordination-gap, institutional-readiness, frontier-AI-safety, precommitment, incident-response, coordination-failure, nuclear-analogies, pandemic-preparedness, B2-confirms]
processed_by: theseus

View file

@@ -7,7 +7,7 @@ date: 2025-04-21
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
status: processed
priority: high
tags: [self-replication, autonomous-replication, capability-evaluation, AISI, RepliBench, loss-of-control, EU-AI-Act, benchmark]
---

View file

@@ -7,7 +7,7 @@ date: 2026-01-01
domain: health
secondary_domains: [ai-alignment]
format: regulatory document
status: null-result
status: processed
priority: high
tags: [eu-ai-act, regulatory, clinical-ai-safety, high-risk-ai, healthcare-compliance, transparency, human-oversight, belief-3, belief-5]
processed_by: vida

View file

@@ -7,7 +7,7 @@ date: 2025-08-26
domain: health
secondary_domains: [ai-alignment]
format: research paper
status: unprocessed
status: processed
priority: high
tags: [automation-bias, clinical-ai-safety, physician-rct, llm-diagnostic, centaur-model, ai-literacy, chatgpt, randomized-trial]
---

View file

@@ -0,0 +1,34 @@
{
"rejected_claims": [
{
"filename": "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md",
"issues": [
"missing_attribution_extractor"
]
},
{
"filename": "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 2,
"kept": 0,
"fixed": 4,
"rejected": 2,
"fixes_applied": [
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:set_created:2026-03-23",
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow",
"self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:set_created:2026-03-23"
],
"rejections": [
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:missing_attribution_extractor",
"self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-23"
}
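Validation logs in this shape can be summarized mechanically. A minimal sketch of a reader for them — the function name and report format are my own; only the JSON field names (`validation_stats`, `rejected_claims`, `fixes_applied`, etc.) come from the log above:

```python
import json

def summarize_validation(log_text):
    """Condense a validation log into a one-line report plus a
    per-file map of rejection issues, using only the fields
    present in the log schema shown above."""
    log = json.loads(log_text)
    stats = log["validation_stats"]
    summary = (
        f"{log['date']} ({log['model']}): {stats['total']} claims, "
        f"{stats['kept']} kept, {stats['rejected']} rejected, "
        f"{len(stats['fixes_applied'])} fixes applied"
    )
    # Map each rejected file to the issues that caused its rejection.
    issues = {c["filename"]: c["issues"] for c in log["rejected_claims"]}
    return summary, issues
```

Note that `fixed` counts individual fixes applied (matching the length of `fixes_applied`), not fixed files, which is why it can exceed `total`.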

View file

@@ -0,0 +1,26 @@
{
"rejected_claims": [
{
"filename": "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 1,
"kept": 0,
"fixed": 3,
"rejected": 1,
"fixes_applied": [
"ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:set_created:2026-03-23",
"ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
"ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
],
"rejections": [
"ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-23"
}