---
type: source
title: "When combinations of humans and AI are useful: A systematic review and meta-analysis"
author: "Michelle Vaccaro, Abdullah Almaatouq, Thomas Malone (@NatureHumBehav)"
url: https://www.nature.com/articles/s41562-024-02024-1
date: 2024-12-01
domain: ai-alignment
secondary_domains: [collective-intelligence]
format: paper
status: null-result
priority: high
triage_tag: claim
tags: [human-ai-teams, meta-analysis, decision-making, content-creation, oversight, performance]
processed_by: theseus
processed_date: 2026-03-18
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "LLM returned 2 claims, 2 rejected by validator"
---

## Content

Systematic review and meta-analysis of 106 experimental studies reporting 370 effect sizes. Published in Nature Human Behaviour, December 2024. Searched interdisciplinary databases for studies published between January 2020 and June 2023.

**Main finding:** On average, human-AI combinations performed significantly worse than the best of humans or AI alone (Hedges' g = -0.23; 95% CI: -0.39 to -0.07).

**Task-type moderation:**
- Performance LOSSES in tasks involving decision-making (deepfake classification, demand forecasting, medical diagnosis)
- Performance GAINS in tasks involving content creation (summarizing social media, chatbot responses, generating new content)

**Relative performance moderation:**
- When humans outperformed AI alone → performance gains in combination
- When AI outperformed humans alone → performance losses in combination
- Human-AI teams performed better than humans alone but failed to surpass AI working independently

**Implication:** On average, human-AI teams do not achieve "synergy": the combination performs worse than the better of its two components (human alone or AI alone).
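To make the "worse than the best of humans or AI alone" comparison concrete, here is a minimal sketch of a Hedges' g computation against the better solo condition. The accuracy figures, the dictionary layout, and the function names `hedges_g` and `synergy_effect` are hypothetical illustrations, not data or code from the paper. The correction factor is the common small-sample approximation, and choosing the baseline by higher mean is an assumption about how "best" is operationalized.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (treatment minus comparison)
    with the usual small-sample correction factor J."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)  # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)              # approximate correction
    return j * d


def synergy_effect(team, human, ai):
    """Compare the team against whichever solo condition has the higher mean.
    Assumes higher scores are better."""
    baseline = max(human, ai, key=lambda cond: cond["mean"])
    return hedges_g(team["mean"], team["sd"], team["n"],
                    baseline["mean"], baseline["sd"], baseline["n"])


# Hypothetical accuracies for one study (NOT taken from the meta-analysis).
human = {"mean": 0.70, "sd": 0.10, "n": 50}
ai = {"mean": 0.80, "sd": 0.10, "n": 50}
team = {"mean": 0.77, "sd": 0.10, "n": 50}

g_vs_best = synergy_effect(team, human, ai)
g_vs_human = hedges_g(team["mean"], team["sd"], team["n"],
                      human["mean"], human["sd"], human["n"])

print(f"team vs best solo condition: g = {g_vs_best:.2f}")
print(f"team vs human alone:         g = {g_vs_human:.2f}")
```

In this made-up case the team beats humans alone (positive g) yet trails the AI (negative g against the best baseline), which is the pattern the review reports when AI outperforms humans.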

## Agent Notes

**Triage:** [CLAIM] "human-AI teams perform worse than the best of humans or AI alone on average, with the deficit concentrated in decision-making tasks". This is a specific, disagreeable, empirically grounded claim from the strongest possible evidence type (meta-analysis, 370 effect sizes).

**Why this matters:** Directly challenges the assumption underlying human-in-the-loop alignment: that combining human judgment with AI produces better outcomes. If human oversight DEGRADES decision quality when AI is better, the case for human-in-the-loop as an alignment mechanism weakens dramatically. This also complicates our KB claim about centaur team performance.

**What surprised me:** The DIRECTION-DEPENDENT finding. Humans help when they're better, hurt when AI is better. This is the automation overshoot mechanism: as AI improves, the case for human involvement weakens in domains where AI exceeds human capability, but economic/safety arguments still push for human oversight.

**KB connections:** [[centaur team performance depends on role complementarity not mere human-AI combination]], [[human-in-the-loop clinical AI degrades to worse-than-AI-alone]], [[economic forces push humans out of every cognitive loop where output quality is independently verifiable]]

**Extraction hints:** The task-type moderation is the key insight. Decision-making vs content creation distinction may map to verifiable vs subjective outputs.

## Curator Notes

PRIMARY CONNECTION: centaur team performance depends on role complementarity not mere human-AI combination

WHY ARCHIVED: This is the strongest empirical evidence (370 effect sizes, Nature HB) that human-AI combination is NOT automatically beneficial; it depends on relative capability and task type. Directly relevant to the automation overshoot question.

## Key Facts

- Meta-analysis covered 106 experimental studies published between January 2020 and June 2023
- 370 effect sizes were analyzed across the studies
- Overall effect size: Hedges' g = -0.23 (95% CI: -0.39 to -0.07)
- Published in Nature Human Behaviour, December 2024
- Task types showing losses: deepfake classification, demand forecasting, medical diagnosis
- Task types showing gains: summarizing social media, chatbot responses, generating new content
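
As a quick sanity check on the overall effect size listed above, the reported interval can be turned into an approximate standard error and z-score. This is a rough sketch that assumes a symmetric, normal-approximation 95% interval. The paper's actual model may use a different reference distribution, so the derived values are only indicative and are not figures reported in the source.

```python
from statistics import NormalDist

# Values reported in the source.
g = -0.23
ci_lower, ci_upper = -0.39, -0.07

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

# ASSUMPTION: the interval is g +/- z_crit * SE (normal approximation).
standard_error = (ci_upper - ci_lower) / (2 * z_crit)
z_score = g / standard_error
p_two_sided = 2 * std_normal.cdf(-abs(z_score))

print(f"interval midpoint = {(ci_lower + ci_upper) / 2:.2f}")
print(f"approx. SE        = {standard_error:.3f}")
print(f"approx. z         = {z_score:.2f}")
print(f"approx. p         = {p_two_sided:.4f}")
```

The interval midpoint matches the point estimate, and the whole interval lies below zero, which is what "significantly worse" means at the 5% level.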