teleo-codex/inbox/null-result/2026-03-01-ai-degrades-human-performance-high-stakes.md

---
type: source
title: How AI Can Degrade Human Performance in High-Stakes Settings
author: AI Frontiers
url: https://ai-frontiers.org/articles/how-ai-can-degrade-human-performance-in-high-stakes-settings
date: 2026-03-01
domain: ai-alignment
secondary_domains:
  - health
format: essay
status: null-result
priority: high
triage_tag: claim
tags:
  - human-ai-performance
  - high-stakes
  - degradation
  - nursing
  - aviation
  - nuclear
  - joint-activity-testing
flagged_for_vida: "450 nursing students/nurses tested with AI in ICU cases — performance degrades 96-120% when AI predictions mislead"
processed_by: theseus
processed_date: 2026-03-18
extraction_model: anthropic/claude-sonnet-4.5
extraction_notes: LLM returned 3 claims, 3 rejected by validator
---

## Content

Cross-domain analysis of how AI degrades human performance in critical settings:

Healthcare (nursing study):

  • 450 nursing students and licensed nurses reviewing ICU cases
  • Four AI configurations from no assistance to full predictions + annotations
  • Best case: 53-67% BETTER when AI predictions accurate
  • Worst case: 96-120% WORSE when AI predictions misleading
  • "Nurses did not reliably recognize when AI predictions were right or wrong"
  • AI appeared to change HOW nurses think when assessing patients, not just what they decide

Aviation:

  • AI weather monitoring missed microbursts during landing
  • Crews faced doubled workload with halved preparation time
  • Required emergency maneuvers

Nuclear energy:

  • AI warning systems hid underlying problems through filtering
  • Misclassified gradual coolant pressure drops as benign
  • Led to cascading subsystem failures

Asymmetric risk profile:

  • Gains from accurate AI: 53-67%
  • Losses from inaccurate AI: 96-120%
  • "Averaging results can hide rare but severe errors, creating blind spots with potentially catastrophic consequences"
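The asymmetry can be made concrete with a small expected-value sketch. The gain/loss figures below use midpoints of the reported ranges; the accuracy rates are illustrative assumptions, not figures from the study:

```python
# Expected-value sketch of the asymmetric risk profile.
# gain/loss are midpoints of the reported ranges (53-67% better,
# 96-120% worse); accuracy rates are assumed for illustration.

def expected_change(accuracy, gain=0.60, loss=1.08):
    """Average performance change when the AI is right with prob `accuracy`."""
    return accuracy * gain - (1 - accuracy) * loss

print(round(expected_change(0.90), 3))  # 0.9*0.60 - 0.1*1.08 = 0.432
print(round(expected_change(0.60), 3))  # 0.6*0.60 - 0.4*1.08 = -0.072
```

Each individual failure (~1.08) costs nearly double an individual gain (~0.60), which is why a healthy-looking average can mask severe tail outcomes.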

Conditions worsening degradation:

  1. AI errors are subtle and plausible (not obviously wrong)
  2. Humans cannot verify predictions (complexity/information asymmetry)
  3. AI aggregates/filters information, hiding important signals
  4. Staffing reduced based on false confidence in AI
  5. Rare but critical failures that testing didn't anticipate

Proposed mitigation — Joint Activity Testing (JAT):

  1. Test humans AND AI together, not separately
  2. Evaluate diverse AI performance scenarios (excel, struggle, fail)
  3. Enable human error recovery over patching
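The three JAT principles above suggest an evaluation harness that tracks human+AI trials jointly and flags gaps in AI-behavior coverage. A minimal sketch, with hypothetical scenario names and data shape (not from the article):

```python
# Hypothetical Joint Activity Testing (JAT) coverage check: every
# human+AI trial records which AI behavior regime it exercised, and
# a trial plan is incomplete until all regimes appear.

AI_REGIMES = ("excels", "struggles", "fails")

def untested_regimes(trials):
    """Return the AI behavior regimes a trial plan never exercises."""
    seen = {t["ai_regime"] for t in trials}
    return [r for r in AI_REGIMES if r not in seen]

trials = [
    {"operator": "nurse", "ai_regime": "excels"},
    {"operator": "nurse", "ai_regime": "fails"},
]
print(untested_regimes(trials))  # -> ['struggles']
```

Testing the human and the AI together across all three regimes is the point: a plan that only samples the "excels" regime is exactly the kind of evaluation that averaging-based benchmarks produce.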

## Agent Notes

Triage: [CLAIM] — "AI degrades human decision-making performance asymmetrically — gains from accurate AI (53-67%) are smaller than losses from inaccurate AI (96-120%) — creating a structural risk where average performance masks catastrophic tail outcomes" — multi-domain evidence

Why this matters: The ASYMMETRY is the critical finding. Per-incident losses from misleading AI (96-120%) are roughly double per-incident gains from accurate AI (53-67%), so even an AI that is right most of the time carries outsized downside from its rare failures. This is why averaging performance hides the real risk. For alignment: human oversight of AI is not just "sometimes unhelpful" — it is structurally asymmetric, with a large downside when oversight fails and a modest upside when it succeeds.

What surprised me: The COGNITIVE CHANGE mechanism. AI doesn't just provide wrong answers — it changes how humans THINK about problems. This is deeper than automation bias; it is cognitive restructuring. Once you have internalized AI-mediated reasoning, you can't simply "turn it off" when the AI fails.

KB connections: human-in-the-loop clinical AI degrades to worse-than-AI-alone; AI capability and reliability are independent dimensions; scalable oversight degrades rapidly as capability gaps grow

Extraction hints: Three distinct claims: (1) asymmetric risk profile, (2) cognitive restructuring mechanism, (3) JAT as evaluation framework. The asymmetry finding is the most novel.

## Curator Notes

PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: Extends our existing clinical AI degradation claim with cross-domain evidence (nursing, aviation, nuclear) and quantifies the asymmetric risk profile. The cognitive restructuring mechanism is a novel finding.

## Key Facts

  • 450 nursing students and licensed nurses participated in ICU case review study with four AI configurations
  • AI weather monitoring in aviation missed microbursts during landing, doubling crew workload and halving preparation time
  • Nuclear energy AI warning systems misclassified gradual coolant pressure drops as benign, leading to cascading subsystem failures
  • Study tested four AI configurations: no assistance, predictions only, predictions plus annotations, and full AI support