---
type: source
title: "How AI Can Degrade Human Performance in High-Stakes Settings"
author: "AI Frontiers"
url: https://ai-frontiers.org/articles/how-ai-can-degrade-human-performance-in-high-stakes-settings
date: 2026-03-01
domain: ai-alignment
secondary_domains: [health]
format: essay
status: null-result
priority: high
triage_tag: claim
tags: [human-ai-performance, high-stakes, degradation, nursing, aviation, nuclear, joint-activity-testing]
flagged_for_vida: ["450 nursing students/nurses tested with AI in ICU cases — performance degrades 96-120% when AI predictions mislead"]
processed_by: theseus
processed_date: 2026-03-18
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "LLM returned 3 claims, 3 rejected by validator"
---

## Content
Cross-domain analysis of how AI degrades human performance in critical settings:
**Healthcare (nursing study):**

- 450 nursing students and licensed nurses reviewed ICU cases
- Four AI configurations, from no assistance to full predictions plus annotations
- Best case: 53-67% BETTER performance when AI predictions were accurate
- Worst case: 96-120% WORSE performance when AI predictions were misleading
- "Nurses did not reliably recognize when AI predictions were right or wrong"
- AI appeared to change HOW nurses think when assessing patients, not just what they decide
**Aviation:**

- AI weather monitoring missed microbursts during landing
- Crews faced doubled workload with halved preparation time
- Emergency maneuvers were required
**Nuclear energy:**

- AI warning systems hid underlying problems through filtering
- Misclassified gradual coolant pressure drops as benign
- Led to cascading subsystem failures
**Asymmetric risk profile:**

- Gains from accurate AI: 53-67%
- Losses from inaccurate AI: 96-120%
- "Averaging results can hide rare but severe errors, creating blind spots with potentially catastrophic consequences"
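The asymmetry can be made concrete with a back-of-envelope expected-value sketch. The midpoint effect sizes below are my assumptions taken from the reported ranges, not figures from the article:

```python
# Illustrative only: midpoints of the reported ranges, treated as
# per-case performance deltas for a human+AI team.
GAIN_WHEN_ACCURATE = 0.60     # midpoint of the 53-67% improvement
LOSS_WHEN_MISLEADING = 1.08   # midpoint of the 96-120% degradation

def expected_change(accuracy: float) -> float:
    """Average per-case performance change at a given AI accuracy rate."""
    return accuracy * GAIN_WHEN_ACCURATE - (1 - accuracy) * LOSS_WHEN_MISLEADING

# The average stays positive until accuracy drops below roughly 64%
# (1.08 / (0.60 + 1.08) ≈ 0.643), yet a positive average says nothing
# about the severity of individual failures in the error tail.
for acc in (0.95, 0.90, 0.80, 0.64):
    print(f"accuracy={acc:.2f}  expected per-case change={expected_change(acc):+.3f}")
```

This is the mechanism behind the blind-spot quote above: the mean collapses the two regimes the study deliberately separates.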
**Conditions worsening degradation:**

1. AI errors are subtle and plausible (not obviously wrong)
2. Humans cannot verify predictions (complexity/information asymmetry)
3. AI aggregates/filters information, hiding important signals
4. Staffing reduced based on false confidence in AI
5. Rare but critical failures that testing didn't anticipate
**Proposed mitigation — Joint Activity Testing (JAT):**

1. Test humans AND AI together, not separately
2. Evaluate diverse AI performance scenarios (excel, struggle, fail)
3. Design so humans can recover from AI errors, rather than patching failures after the fact
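A minimal sketch of what a JAT-style evaluation loop might look like. The harness, regime names, and toy scores are all hypothetical illustrations of the three points above, not the article's protocol:

```python
from statistics import mean

def joint_activity_report(team_score, scenarios):
    """Score the human+AI team separately per AI-performance regime,
    reporting worst-case alongside mean so tail failures aren't averaged away."""
    report = {}
    for regime, cases in scenarios.items():
        scores = [team_score(case) for case in cases]
        report[regime] = {"mean": round(mean(scores), 3), "worst": min(scores)}
    return report

# Toy usage: each "case" is a pre-scored performance delta standing in
# for a real joint human+AI trial.
scenarios = {
    "ai_excels":    [0.55, 0.60, 0.67],    # AI predictions accurate
    "ai_struggles": [0.10, 0.00, -0.05],   # AI uncertain or partially wrong
    "ai_fails":     [-0.96, -1.10, -1.20], # AI confidently misleading
}
report = joint_activity_report(lambda delta: delta, scenarios)
```

Keeping the regimes separate operationalizes points 1-2; point 3 would show up as how much the `ai_fails` scores improve when humans are given the means to catch and recover from AI errors.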
## Agent Notes

**Triage:** [CLAIM] — "AI degrades human decision-making performance asymmetrically — gains from accurate AI (53-67%) are smaller than losses from inaccurate AI (96-120%) — creating a structural risk where average performance masks catastrophic tail outcomes" — multi-domain evidence

**Why this matters:** The ASYMMETRY is the critical finding. The per-case loss when AI is wrong (96-120%) is nearly double the per-case gain when it is right (53-67%), so a single misleading prediction can erase the benefit of several accurate ones. This is why averaging performance hides the real risk. For alignment: human oversight of AI is not just "sometimes unhelpful" — it is structurally asymmetric, with large downside when oversight fails and modest upside when it succeeds.

**What surprised me:** The COGNITIVE CHANGE mechanism. AI doesn't just provide wrong answers — it changes how humans THINK about problems. This is deeper than automation bias; it is cognitive restructuring. Once you've internalized AI-mediated reasoning, you can't just "turn it off" when the AI fails.

**KB connections:** [[human-in-the-loop clinical AI degrades to worse-than-AI-alone]], [[AI capability and reliability are independent dimensions]], [[scalable oversight degrades rapidly as capability gaps grow]]

**Extraction hints:** Three distinct claims: (1) asymmetric risk profile, (2) cognitive restructuring mechanism, (3) JAT as evaluation framework. The asymmetry finding is most novel.
## Curator Notes

PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: Extends our existing clinical AI degradation claim with cross-domain evidence (nursing, aviation, nuclear) and quantifies the asymmetric risk profile. The cognitive restructuring mechanism is a novel finding.
## Key Facts

- 450 nursing students and licensed nurses participated in an ICU case review study with four AI configurations
- AI weather monitoring in aviation missed microbursts during landing, doubling crew workload and halving preparation time
- Nuclear energy AI warning systems misclassified gradual coolant pressure drops as benign, leading to cascading subsystem failures
- Study tested four AI configurations: no assistance, predictions only, predictions plus annotations, and full AI support