---
type: source
title: "How AI Can Degrade Human Performance in High-Stakes Settings"
author: "AI Frontiers"
url: https://ai-frontiers.org/articles/how-ai-can-degrade-human-performance-in-high-stakes-settings
date: 2026-03-01
domain: ai-alignment
secondary_domains: [health]
format: essay
status: null-result
priority: high
triage_tag: claim
tags: [human-ai-performance, high-stakes, degradation, nursing, aviation, nuclear, joint-activity-testing]
flagged_for_vida: ["450 nursing students/nurses tested with AI in ICU cases — performance degrades 96-120% when AI predictions mislead"]
processed_by: theseus
processed_date: 2026-03-18
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "LLM returned 3 claims, 3 rejected by validator"
---

## Content

Cross-domain analysis of how AI degrades human performance in critical settings:

**Healthcare (nursing study):**

- 450 nursing students and licensed nurses reviewing ICU cases
- Four AI configurations, from no assistance to full predictions plus annotations
- Best case: 53-67% BETTER when AI predictions were accurate
- Worst case: 96-120% WORSE when AI predictions were misleading
- "Nurses did not reliably recognize when AI predictions were right or wrong"
- AI appeared to change HOW nurses think when assessing patients, not just what they decide

**Aviation:**

- AI weather monitoring missed microbursts during landing
- Crews faced doubled workload with halved preparation time
- Required emergency maneuvers

**Nuclear energy:**

- AI warning systems hid underlying problems through filtering
- Misclassified gradual coolant pressure drops as benign
- Led to cascading subsystem failures

**Asymmetric risk profile:**

- Gains from accurate AI: 53-67%
- Losses from inaccurate AI: 96-120%
- "Averaging results can hide rare but severe errors, creating blind spots with potentially catastrophic consequences"

**Conditions worsening degradation:**

1. AI errors are subtle and plausible (not obviously wrong)
2. Humans cannot verify predictions (complexity/information asymmetry)
3. AI aggregates/filters information, hiding important signals
4. Staffing is reduced based on false confidence in AI
5. Rare but critical failures that testing did not anticipate

**Proposed mitigation: Joint Activity Testing (JAT):**

1. Test humans AND AI together, not separately
2. Evaluate AI across diverse performance scenarios (excel, struggle, fail)
3. Design for human error recovery rather than post-hoc patching

## Agent Notes

**Triage:** [CLAIM] — "AI degrades human decision-making performance asymmetrically — gains from accurate AI (53-67%) are smaller than losses from inaccurate AI (96-120%) — creating a structural risk where average performance masks catastrophic tail outcomes" — multi-domain evidence

**Why this matters:** The ASYMMETRY is the critical finding. Even if AI is right 90% of the time, each error in the remaining 10% costs roughly twice what each accurate prediction gains. This is why averaging performance hides the real risk. For alignment: human oversight of AI is not just "sometimes unhelpful"; it is structurally asymmetric, with a large downside when oversight fails and a modest upside when it succeeds.

**What surprised me:** The COGNITIVE CHANGE mechanism. AI doesn't just provide wrong answers; it changes how humans THINK about problems. This is deeper than automation bias: it is cognitive restructuring. Once you have internalized AI-mediated reasoning, you can't simply "turn it off" when the AI fails.

**KB connections:** [[human-in-the-loop clinical AI degrades to worse-than-AI-alone]], [[AI capability and reliability are independent dimensions]], [[scalable oversight degrades rapidly as capability gaps grow]]

**Extraction hints:** Three distinct claims: (1) asymmetric risk profile, (2) cognitive restructuring mechanism, (3) JAT as evaluation framework. The asymmetry finding is the most novel.
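The asymmetry arithmetic above can be made concrete with a minimal sketch. The numbers are midpoints of the ranges reported in the note (53-67% gain, 96-120% loss); the function and accuracy rates are illustrative assumptions, not from the source study:

```python
# Minimal sketch of the asymmetric gain/loss arithmetic in this note.
# gain/loss defaults are midpoints of the reported ranges (53-67% gain,
# 96-120% loss); the model itself is illustrative, not from the source.

def expected_change(accuracy, gain=0.60, loss=1.08):
    """Mean per-case performance change at a given AI accuracy rate."""
    return accuracy * gain - (1 - accuracy) * loss

if __name__ == "__main__":
    for acc in (0.95, 0.90, 0.80, 0.65):
        # The mean stays comfortably positive well below 90% accuracy,
        # even though each individual failure costs ~108% of baseline
        # performance -- the averaging blind spot the note describes.
        print(f"accuracy {acc:.0%}: mean change {expected_change(acc):+.1%}")
```

Under these assumed numbers the mean effect stays positive down to roughly 64% accuracy, which is exactly how an average can look healthy while one case in ten is catastrophic.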
## Curator Notes

PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone, because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: Extends our existing clinical-AI degradation claim with cross-domain evidence (nursing, aviation, nuclear) and quantifies the asymmetric risk profile. The cognitive restructuring mechanism is a novel finding.

## Key Facts

- 450 nursing students and licensed nurses participated in an ICU case-review study with four AI configurations
- AI weather monitoring in aviation missed microbursts during landing, doubling crew workload and halving preparation time
- Nuclear energy AI warning systems misclassified gradual coolant pressure drops as benign, leading to cascading subsystem failures
- The study tested four AI configurations: no assistance, predictions only, predictions plus annotations, and full AI support