extract: 2026-01-28-nasa-cld-phase2-frozen-saa-revised-approach #1666
13 changed files with 86 additions and 5 deletions
@@ -17,6 +17,12 @@ This leaves motivation selection as the only durable approach: either direct spe

---

### Additional Evidence (confirm)

*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*

Current models already demonstrate >50% success on the hardest variants of tasks designed to test circumvention of security controls (KYC, persistent deployment evasion). The capability trajectory shows rapid improvement in exactly the domains where containment depends on security measures designed by humans.

Relevant Notes:

- [[safe AI development requires building alignment mechanisms before scaling capability]] -- Bostrom's analysis shows why motivation selection must precede capability scaling
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous weaving is a form of motivation selection that avoids the limitations of both direct specification and one-shot loading
@@ -63,6 +63,12 @@ The research-to-compliance translation gap fails for the same structural reason

The coordination gap provides the mechanism explaining why voluntary commitments fail even beyond racing dynamics: coordination infrastructure investments have diffuse benefits but concentrated costs, creating a public goods problem. Labs won't build shared response infrastructure unilaterally because competitors free-ride on the benefits while the builder bears full costs. This is distinct from the competitive pressure argument — it's about why shared infrastructure doesn't get built even when racing isn't the primary concern.

### Additional Evidence (confirm)

*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*

RepliBench exists as a comprehensive self-replication evaluation tool but is not integrated into compliance frameworks despite EU AI Act Article 55 taking effect after its publication. Labs can voluntarily use it but face no enforcement mechanism requiring them to do so, creating competitive pressure to avoid evaluations that might reveal concerning capabilities.

Relevant Notes:
@@ -48,6 +48,12 @@ The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failu

NCT07328815 tests whether a UI-layer behavioral nudge (ensemble-LLM confidence signals + anchoring cues) can mitigate automation bias where training failed. The parent study (NCT06963957) showed that 20 hours of AI-literacy training did not prevent automation bias. This trial operationalizes a structural solution: using multi-model disagreement as an automatic uncertainty flag that doesn't require physician understanding of model internals. Results pending (2026).

### Additional Evidence (extend)

*Source: [[2026-03-22-automation-bias-rct-ai-trained-physicians]] | Added: 2026-03-23*

RCT evidence (NCT06963957, medRxiv August 2025) shows automation bias persists even after 20 hours of AI-literacy training specifically designed to teach critical evaluation of AI output. Physicians with this training still voluntarily deferred to deliberately erroneous LLM recommendations in 3 of 6 clinical vignettes, demonstrating that the human-in-the-loop degradation mechanism operates even when humans are extensively trained to resist it.
@@ -92,6 +92,8 @@ The futarchy governance protocol on Solana. Implements decision markets through

- **2026-02-07** — First failed ICO: Hurupay raised $2M against $3M minimum, all capital refunded under unruggable ICO mechanics
- **2026-03-26** — [[metadao-p2p-me-ico]] Active: P2P.me ICO launched targeting $6M at $15.5M FDV, backed by Multicoin Capital and Coinbase Ventures (closes March 30)
- **2025-Q4** — Reached first operating profitability with $2.51M in fee revenue from Futarchy AMM and Meteora pools; hosted 6 ICOs in the quarter raising $18.7M; expanded the futarchy ecosystem from 2 to 8 protocols; total futarchy market cap reached $219M (non-META: $69M); total equity grew from $4M to $16.5M; maintains 15+ quarters of runway
- **2026-03-21** — [[metadao-meta036-hanson-futarchy-research]] Active: Proposal to fund $80K academic research at GMU led by Robin Hanson, trading at 50% likelihood

## Key Decisions

| Date | Proposal | Proposer | Category | Outcome |
|------|----------|----------|----------|---------|
@@ -57,4 +57,5 @@ Treasury controlled by token holders through futarchy-based governance. Team can

- **2026-03-26** — [[p2p-me-metadao-ico]] Active: ICO scheduled, targeting $6M raise at $15.5M FDV with Pine Analytics identifying 182x gross profit multiple concerns
- **2026-03-26** — [[p2p-me-ico-march-2026]] Active: $6M ICO at $15.5M FDV scheduled on MetaDAO
- **2026-03-26** — [[metadao-p2p-me-ico]] Active: ICO launch targeting $15.5M FDV at 182x gross profit multiple
- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Active: ICO scheduled, targeting $6M at $15.5M FDV
- **2026-03-26** — [[p2p-me-metadao-ico-march-2026]] Status pending: ICO vote scheduled
@@ -7,7 +7,7 @@ date: 2026-03-00
domain: ai-alignment
secondary_domains: []
format: paper
-status: enrichment
+status: processed
priority: high
tags: [coordination-gap, institutional-readiness, frontier-AI-safety, precommitment, incident-response, coordination-failure, nuclear-analogies, pandemic-preparedness, B2-confirms]
processed_by: theseus
@@ -7,7 +7,7 @@ date: 2025-04-21
domain: ai-alignment
secondary_domains: []
format: paper
-status: unprocessed
+status: processed
priority: high
tags: [self-replication, autonomous-replication, capability-evaluation, AISI, RepliBench, loss-of-control, EU-AI-Act, benchmark]
---
@@ -7,7 +7,7 @@ date: 2026-01-01
domain: health
secondary_domains: [ai-alignment]
format: regulatory document
-status: null-result
+status: processed
priority: high
tags: [eu-ai-act, regulatory, clinical-ai-safety, high-risk-ai, healthcare-compliance, transparency, human-oversight, belief-3, belief-5]
processed_by: vida
@@ -7,7 +7,7 @@ date: 2025-08-26
domain: health
secondary_domains: [ai-alignment]
format: research paper
-status: unprocessed
+status: processed
priority: high
tags: [automation-bias, clinical-ai-safety, physician-rct, llm-diagnostic, centaur-model, ai-literacy, chatgpt, randomized-trial]
---
@@ -0,0 +1,34 @@
{
  "rejected_claims": [
    {
      "filename": "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 4,
    "rejected": 2,
    "fixes_applied": [
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:set_created:2026-03-23",
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow",
      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:set_created:2026-03-23"
    ],
    "rejections": [
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:missing_attribution_extractor",
      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-23"
}
@@ -0,0 +1,26 @@
{
  "rejected_claims": [
    {
      "filename": "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 1,
    "kept": 0,
    "fixed": 3,
    "rejected": 1,
    "fixes_applied": [
      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:set_created:2026-03-23",
      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
    ],
    "rejections": [
      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-23"
}
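Both validation reports in this change share the same schema: a `rejected_claims` list, a `validation_stats` object with counts and `fixes_applied`/`rejections` string lists, plus `model` and `date`. A minimal sketch of consuming a report of this shape in Python; the function name `summarize_report` and the inline example report are illustrative, not part of the pipeline:

```python
import json

def summarize_report(text: str) -> dict:
    """Summarize a validation report of the schema shown above:
    headline counts plus the distinct issues behind rejections."""
    report = json.loads(text)
    stats = report.get("validation_stats", {})
    issues = sorted({
        issue
        for claim in report.get("rejected_claims", [])
        for issue in claim.get("issues", [])
    })
    return {
        "total": stats.get("total", 0),
        "kept": stats.get("kept", 0),
        "rejected": stats.get("rejected", 0),
        "fixes_applied": len(stats.get("fixes_applied", [])),
        "issues": issues,
    }

# Hypothetical minimal report matching the schema above
example = json.dumps({
    "rejected_claims": [
        {"filename": "note.md", "issues": ["missing_attribution_extractor"]}
    ],
    "validation_stats": {
        "total": 1, "kept": 0, "fixed": 3, "rejected": 1,
        "fixes_applied": ["note.md:set_created:2026-03-23"],
        "rejections": ["note.md:missing_attribution_extractor"],
    },
    "model": "anthropic/claude-sonnet-4.5",
    "date": "2026-03-23",
})
print(summarize_report(example))
```

Note that `fixes_applied` can exceed `total`: `total` counts claims while `fixes_applied` counts individual fixes, so several fixes to one file are each listed.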