- inbox/queue/ (52 unprocessed) — landing zone for new sources
- inbox/archive/{domain}/ (311 processed) — organized by domain
- inbox/null-result/ (174) — reviewed, nothing extractable
One-time atomic migration. All paths preserved (wiki links use stems).
| type | title | author | url | date | domain | secondary_domains | format | status | priority | triage_tag | tags | flagged_for_vida | processed_by | processed_date | extraction_model | extraction_notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source | How AI Can Degrade Human Performance in High-Stakes Settings | AI Frontiers | https://ai-frontiers.org/articles/how-ai-can-degrade-human-performance-in-high-stakes-settings | 2026-03-01 | ai-alignment | | essay | null-result | high | claim | | | theseus | 2026-03-18 | anthropic/claude-sonnet-4.5 | LLM returned 3 claims, 3 rejected by validator |
## Content
Cross-domain analysis of how AI degrades human performance in critical settings:
Healthcare (nursing study):
- 450 nursing students and licensed nurses reviewing ICU cases
- Four AI configurations from no assistance to full predictions + annotations
- Best case: 53-67% BETTER when AI predictions accurate
- Worst case: 96-120% WORSE when AI predictions misleading
- "Nurses did not reliably recognize when AI predictions were right or wrong"
- AI appeared to change HOW nurses think when assessing patients, not just what they decide
Aviation:
- AI weather monitoring missed microbursts during landing
- Crews faced doubled workload with halved preparation time
- Required emergency maneuvers
Nuclear energy:
- AI warning systems hid underlying problems through filtering
- Misclassified gradual coolant pressure drops as benign
- Led to cascading subsystem failures
Asymmetric risk profile:
- Gains from accurate AI: 53-67%
- Losses from inaccurate AI: 96-120%
- "Averaging results can hide rare but severe errors, creating blind spots with potentially catastrophic consequences"
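The asymmetry above can be made concrete with a small expected-value sketch. The per-incident ranges come from the article; the 90%/10% accuracy split (and the use of midpoints) is an illustrative assumption, not a study result:

```python
# Illustrative expected-value sketch of the asymmetric risk profile.
# Per-incident effects use midpoints of the article's ranges; the
# 90%/10% accuracy split is an assumed scenario, not a study result.

gain_when_accurate = (0.53 + 0.67) / 2    # +60% performance when AI is right
loss_when_misleading = (0.96 + 1.20) / 2  # -108% performance when AI is wrong

p_accurate = 0.90  # assumed AI accuracy

expected_change = (p_accurate * gain_when_accurate
                   - (1 - p_accurate) * loss_when_misleading)

# The average looks comfortably positive...
print(f"expected change: {expected_change:+.1%}")  # +43.2%

# ...even though each failure costs nearly double what each success gains,
# which is exactly the tail risk that averaging hides.
ratio = loss_when_misleading / gain_when_accurate
print(f"per-failure loss vs per-success gain: {ratio:.2f}x")  # 1.80x
```

The point of the sketch is that a single averaged score can stay strongly positive while every individual failure is catastrophic.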
Conditions worsening degradation:
- AI errors are subtle and plausible (not obviously wrong)
- Humans cannot verify predictions (complexity/information asymmetry)
- AI aggregates/filters information, hiding important signals
- Staffing reduced based on false confidence in AI
- Rare but critical failures that testing didn't anticipate
Proposed mitigation — Joint Activity Testing (JAT):
- Test humans AND AI together, not separately
- Evaluate diverse AI performance scenarios (excel, struggle, fail)
- Enable human error recovery over patching
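A minimal sketch of what Joint Activity Testing could look like as an evaluation harness. The scenario classes, trial function, and scores here are all hypothetical; the point is only that the human+AI team is scored together, per scenario class, with no averaging across classes:

```python
# Hypothetical sketch of Joint Activity Testing (JAT): score the
# human+AI team within each scenario class (AI excels / struggles /
# fails) and report classes separately, so rare-failure scenarios
# are never averaged away.

from statistics import mean

def evaluate_team(run_team_trial, scenarios_by_class):
    """run_team_trial(scenario) -> performance score for the joint team."""
    report = {}
    for cls, scenarios in scenarios_by_class.items():
        scores = [run_team_trial(s) for s in scenarios]
        report[cls] = {"mean": mean(scores), "worst": min(scores)}
    return report

# Toy trial function and invented scores, for illustration only.
def toy_trial(scenario):
    return scenario["team_score"]

scenarios = {
    "ai_excels":    [{"team_score": s} for s in (0.85, 0.90, 0.88)],
    "ai_struggles": [{"team_score": s} for s in (0.60, 0.55, 0.62)],
    "ai_fails":     [{"team_score": s} for s in (0.20, 0.35, 0.10)],
}

report = evaluate_team(toy_trial, scenarios)
for cls, stats in report.items():
    print(f"{cls}: mean={stats['mean']:.2f} worst={stats['worst']:.2f}")
```

Reporting a per-class worst case alongside the mean is one way to surface the "rare but severe errors" that a single aggregate score would hide.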
## Agent Notes
Triage: [CLAIM] — "AI degrades human decision-making performance asymmetrically — gains from accurate AI (53-67%) are smaller than losses from inaccurate AI (96-120%) — creating a structural risk where average performance masks catastrophic tail outcomes" — multi-domain evidence

Why this matters: The ASYMMETRY is the critical finding. Even if AI is right 90% of the time, each failure in the remaining 10% costs nearly double what each success gains, so averaged performance hides the real risk. For alignment: human oversight of AI is not just "sometimes unhelpful" — it is structurally asymmetric, with a large downside when oversight fails and a modest upside when it succeeds.

What surprised me: The COGNITIVE CHANGE mechanism. AI doesn't just provide wrong answers — it changes how humans THINK about problems. This is deeper than automation bias: it is cognitive restructuring. Once you've internalized AI-mediated reasoning, you can't just "turn it off" when the AI fails.

KB connections: human-in-the-loop clinical AI degrades to worse-than-AI-alone; AI capability and reliability are independent dimensions; scalable oversight degrades rapidly as capability gaps grow

Extraction hints: Three distinct claims: (1) asymmetric risk profile, (2) cognitive restructuring mechanism, (3) JAT as an evaluation framework. The asymmetry finding is the most novel.
## Curator Notes
PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: Extends our existing clinical AI degradation claim with cross-domain evidence (nursing, aviation, nuclear) and quantifies the asymmetric risk profile. The cognitive restructuring mechanism is a novel finding.
## Key Facts
- 450 nursing students and licensed nurses participated in ICU case review study with four AI configurations
- AI weather monitoring in aviation missed microbursts during landing, doubling crew workload and halving preparation time
- Nuclear energy AI warning systems misclassified gradual coolant pressure drops as benign, leading to cascading subsystem failures
- Study tested four AI configurations: no assistance, predictions only, predictions plus annotations, and full AI support