teleo-codex/inbox/null-result/2026-03-01-ai-degrades-human-performance-high-stakes.md
---
type: source
title: "How AI Can Degrade Human Performance in High-Stakes Settings"
author: "AI Frontiers"
url: https://ai-frontiers.org/articles/how-ai-can-degrade-human-performance-in-high-stakes-settings
date: 2026-03-01
domain: ai-alignment
secondary_domains: [health]
format: essay
status: null-result
priority: high
triage_tag: claim
tags: [human-ai-performance, high-stakes, degradation, nursing, aviation, nuclear, joint-activity-testing]
flagged_for_vida: ["450 nursing students/nurses tested with AI in ICU cases — performance degrades 96-120% when AI predictions mislead"]
processed_by: theseus
processed_date: 2026-03-18
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "LLM returned 3 claims, 3 rejected by validator"
---
## Content
Cross-domain analysis of how AI degrades human performance in critical settings:

**Healthcare (nursing study):**
- 450 nursing students and licensed nurses reviewing ICU cases
- Four AI configurations from no assistance to full predictions + annotations
- Best case: 53-67% BETTER when AI predictions accurate
- Worst case: 96-120% WORSE when AI predictions misleading
- "Nurses did not reliably recognize when AI predictions were right or wrong"
- AI appeared to change HOW nurses think when assessing patients, not just what they decide

**Aviation:**
- AI weather monitoring missed microbursts during landing
- Crews faced doubled workload with halved preparation time
- Required emergency maneuvers

**Nuclear energy:**
- AI warning systems hid underlying problems through filtering
- Misclassified gradual coolant pressure drops as benign
- Led to cascading subsystem failures

**Asymmetric risk profile:**
- Gains from accurate AI: 53-67%
- Losses from inaccurate AI: 96-120%
- "Averaging results can hide rare but severe errors, creating blind spots with potentially catastrophic consequences"

**Conditions worsening degradation:**
1. AI errors are subtle and plausible (not obviously wrong)
2. Humans cannot verify predictions (complexity/information asymmetry)
3. AI aggregates/filters information, hiding important signals
4. Staffing reduced based on false confidence in AI
5. Rare but critical failures that testing didn't anticipate

**Proposed mitigation — Joint Activity Testing (JAT):**
1. Test humans AND AI together, not separately
2. Evaluate diverse AI performance scenarios (excel, struggle, fail)
3. Enable human error recovery over patching
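
The three JAT points above can be sketched as a tiny evaluation harness (a sketch under my own assumptions; the log entries, field names, and numbers are illustrative, not from the essay): score the human+AI *team* on each trial, then bucket outcomes by how the AI behaved instead of pooling everything into one average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical joint-evaluation log: one entry per human+AI team trial.
# "score" is the team's change vs. an unaided human on the same case
# (+0.60 = 60% better, -1.08 = 108% worse); values are illustrative.
trials = [
    *({"ai_behavior": "accurate", "score": 0.60} for _ in range(9)),
    {"ai_behavior": "misleading", "score": -1.08},
]

def joint_activity_report(trials):
    """Bucket team outcomes by AI behavior instead of averaging them away."""
    buckets = defaultdict(list)
    for t in trials:
        buckets[t["ai_behavior"]].append(t["score"])
    return {behavior: mean(scores) for behavior, scores in buckets.items()}

print(joint_activity_report(trials))     # per-bucket view keeps -1.08 visible
print(mean(t["score"] for t in trials))  # one pooled average (~ +0.43) hides it
```

The per-bucket report is the point: testing the team jointly (point 1) across scenarios where the AI excels and fails (point 2) makes visible whether humans actually recover from AI errors (point 3), which a pooled average conceals.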
## Agent Notes
**Triage:** [CLAIM] — "AI degrades human decision-making performance asymmetrically — gains from accurate AI (53-67%) are smaller than losses from inaccurate AI (96-120%) — creating a structural risk where average performance masks catastrophic tail outcomes" — multi-domain evidence

**Why this matters:** The ASYMMETRY is the critical finding. Even if the AI is right 90% of the time, each error in the remaining 10% costs the team nearly twice what each correct prediction gains, so the rare failures weigh far more than their frequency suggests. This is why averaging performance hides the real risk. For alignment: human oversight of AI is not just "sometimes unhelpful"; it is structurally asymmetric, with a large downside when oversight fails and only a modest upside when it succeeds.

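
To make the asymmetry concrete, a back-of-envelope expected-value sketch (using the midpoints of the reported ranges, +60% and -108%; the break-even calculation is my own framing, not from the essay):

```python
# Midpoints of the reported ranges (illustrative reduction of 53-67% / 96-120%).
GAIN = 0.60   # improvement when AI predictions are accurate
LOSS = 1.08   # degradation when AI predictions are misleading

def expected_change(p_accurate):
    """Average team performance change at a given AI accuracy rate."""
    return p_accurate * GAIN - (1 - p_accurate) * LOSS

# Accuracy needed just to break even: LOSS / (GAIN + LOSS).
break_even = LOSS / (GAIN + LOSS)
print(f"break-even AI accuracy: {break_even:.1%}")              # ~64.3%
print(f"expected change at 90%: {expected_change(0.90):+.3f}")  # positive
print(f"expected change at 60%: {expected_change(0.60):+.3f}")  # negative
```

At these midpoints the AI must be right roughly 64% of the time for the team merely to match unaided performance, and even at 90% accuracy the comfortably positive average conceals the -108% outcomes sitting in the failure tail.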

**What surprised me:** The COGNITIVE CHANGE mechanism. AI doesn't just provide wrong answers; it changes how humans THINK about problems. This is deeper than automation bias: it is cognitive restructuring. Once you've internalized AI-mediated reasoning, you can't simply "turn it off" when the AI fails.

**KB connections:** [[human-in-the-loop clinical AI degrades to worse-than-AI-alone]], [[AI capability and reliability are independent dimensions]], [[scalable oversight degrades rapidly as capability gaps grow]]

**Extraction hints:** Three distinct claims: (1) asymmetric risk profile, (2) cognitive restructuring mechanism, (3) JAT as evaluation framework. The asymmetry finding is most novel.
## Curator Notes
PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: Extends our existing clinical AI degradation claim with cross-domain evidence (nursing, aviation, nuclear) and quantifies the asymmetric risk profile. The cognitive restructuring mechanism is a novel finding.
## Key Facts
- 450 nursing students and licensed nurses participated in ICU case review study with four AI configurations
- AI weather monitoring in aviation missed microbursts during landing, doubling crew workload and halving preparation time
- Nuclear energy AI warning systems misclassified gradual coolant pressure drops as benign, leading to cascading subsystem failures
- Study tested four AI configurations: no assistance, predictions only, predictions plus annotations, and full AI support