extract: 2025-07-10-metr-ai-developer-productivity-rct #1206
12 changed files with 182 additions and 3 deletions

@@ -51,6 +51,12 @@ Sora standalone app achieved 12 million downloads but retention below 8% at day

EU AI Act Article 50 (effective August 2026) creates a creative content exemption that means entertainment's authenticity premium will be market-driven rather than regulation-driven. While AI-generated news/marketing must be labeled, 'evidently artistic, creative, satirical, or fictional' content requires only minimal disclosure. This regulatory asymmetry confirms that consumer preference, not regulatory mandate, remains the binding constraint for AI adoption in entertainment.

### Additional Evidence (confirm)
*Source: [[2025-06-18-arxiv-fanfiction-age-of-ai]] | Added: 2026-03-18*

An academic survey of fanfiction communities found that 66% of respondents said knowing a story was AI-generated would decrease their interest in reading it, 43% actively oppose AI integration, and 72% report a negative reaction to discovering undisclosed AI usage; 84.7% believe AI cannot replicate the emotional nuance of human-authored stories. These are overwhelming rejection rates that persist despite AI quality improvements.

---

Relevant Notes:

@@ -37,6 +37,12 @@ This advantage compounds with the scarcity economics documented in the media att
- **Human-made premium unquantified**: The underlying premium itself is still emerging and not yet measured
- **Selection bias risk**: Communities may form preferentially around human-created content for reasons other than provenance (quality, cultural resonance), confounding causality

### Additional Evidence (extend)
*Source: [[2025-06-18-arxiv-fanfiction-age-of-ai]] | Added: 2026-03-18*

Fanfiction communities demonstrate that provenance verification is not just about authenticity but about community participation: members evaluate stories by 'evidence of author engagement with source material' and value the craft-development journey. Of respondents, 68.6% expressed ethical concerns about unauthorized scraping of fan works for AI training, viewing it as appropriation of unpaid creative labor within gift-economy communities. This extends the provenance advantage: community-owned IP has both inherent provenance AND community investment in protecting that provenance.

---

Relevant Notes:

@@ -31,6 +31,12 @@ The 2026 emergence of 'human-made' as a premium market label provides concrete e

The 2026 benchmark shows AI video quality (hand anatomy, lip-sync) has crossed the threshold where technical tells are no longer visible, yet consumer adoption remains low (Sora <8% D30 retention). This suggests that once quality becomes indistinguishable, the preference signal shifts to factors other than production value — likely authenticity, provenance, or use case fit rather than visual fidelity.

### Additional Evidence (extend)
*Source: [[2025-06-18-arxiv-fanfiction-age-of-ai]] | Added: 2026-03-18*

Fanfiction communities reveal that quality is not just fluid but RELATIONAL: it is embedded in community values and social context. Members judge stories by emotional depth, character consistency, and evidence of author engagement, criteria that are inherently social. A technically competent AI story may be deemed 'low quality' if it lacks an authentic voice. This means community definitions of quality can be structurally incompatible with AI-generated content regardless of technical capability.

---

Relevant Notes:

@@ -35,6 +35,12 @@ The data is specific to creator content and may not generalize to all entertainm

Deloitte 2024 Connected Consumer Survey found nearly 70% of respondents are concerned AI-generated content will be used to deceive them. Approximately half of consumers now believe they can recognize AI-written content, with many disengaging when brands appear to rely heavily on it in emotionally meaningful contexts.

### Additional Evidence (confirm)
*Source: [[2025-06-18-arxiv-fanfiction-age-of-ai]] | Added: 2026-03-18*

Fanfiction community data shows rejection is VALUES-based, not quality-based: 92% agree 'fanfiction is a space for human creativity', 86% insist on AI disclosure, and 58% feel 'deceived' by undisclosed AI usage. The authenticity signal (human authorship) is the primary quality criterion, making technical improvements irrelevant to acceptance.

---

Relevant Notes:

@@ -23,6 +23,12 @@ The near-term trajectory: mandatory outpatient screening by 2026, Z-code adoptio
The Commonwealth Fund's 2024 international comparison provides quantified evidence of the population-level cost of not operationalizing SDOH interventions at scale. The US ranks second-worst on equity (9th of 10 countries) and last on health outcomes (10th of 10), with the highest healthcare spending (>16% of GDP). This outcome gap relative to peer nations with lower spending demonstrates the opportunity cost of the US healthcare system's failure to systematically address social determinants. Countries with better equity and access outcomes (Australia, Netherlands) achieve superior population health despite similar or lower clinical quality and lower spending ratios. The international comparison quantifies what the SDOH adoption gap costs: the US has the worst population health outcomes among wealthy peer nations despite world-class clinical care, suggesting that the 3% Z-code documentation rate represents billions in foregone health gains.

### Additional Evidence (challenge)
*Source: [[2025-04-07-tufts-health-affairs-medically-tailored-meals-50-states]] | Added: 2026-03-18*

The JAMA Internal Medicine 2024 RCT testing an intensive food-as-medicine intervention (10 meals/week plus education and coaching for 1 year) found NO significant difference in HbA1c, hospitalization, ED use, or total claims between treatment and control groups. This challenges the assumption that SDOH interventions produce strong ROI: the RCT evidence shows null clinical outcomes despite directly addressing food insecurity.

---

Relevant Notes:

@@ -47,6 +47,12 @@ The NHS paradox—ranking 3rd overall while having catastrophic specialty access

WHO's three-pillar framework for GLP-1 obesity treatment explicitly positions medication as one component within a comprehensive approach requiring healthy diets, physical activity, professional support, and population-level policies. WHO states obesity is a 'societal challenge requiring multisectoral action — not just individual medical treatment.' This institutional positioning from the global health authority confirms that pharmaceutical intervention alone cannot address health outcomes driven by behavioral and social factors.

### Additional Evidence (extend)
*Source: [[2025-04-07-tufts-health-affairs-medically-tailored-meals-50-states]] | Added: 2026-03-18*

While social determinants predict health outcomes in observational studies, RCT evidence from food-as-medicine interventions shows that directly addressing social determinants (food insecurity) does not automatically improve clinical outcomes. The AHA 2025 systematic review of 14 US RCTs found Food Is Medicine programs improve diet quality and food security but "impact on clinical outcomes was inconsistent and often failed to reach statistical significance." This suggests the causal pathway from social determinants to health is more complex than simple resource provision.

---

Relevant Notes:

@@ -0,0 +1,26 @@
{
  "rejected_claims": [
    {
      "filename": "food-as-medicine-simulation-models-project-massive-savings-but-rcts-show-null-clinical-outcomes-exposing-causal-inference-gap.md",
      "issues": [
        "missing_attribution_extractor",
        "opsec_internal_deal_terms"
      ]
    }
  ],
  "validation_stats": {
    "total": 1,
    "kept": 0,
    "fixed": 1,
    "rejected": 1,
    "fixes_applied": [
      "food-as-medicine-simulation-models-project-massive-savings-but-rcts-show-null-clinical-outcomes-exposing-causal-inference-gap.md:set_created:2026-03-18"
    ],
    "rejections": [
      "food-as-medicine-simulation-models-project-massive-savings-but-rcts-show-null-clinical-outcomes-exposing-causal-inference-gap.md:missing_attribution_extractor",
      "food-as-medicine-simulation-models-project-massive-savings-but-rcts-show-null-clinical-outcomes-exposing-causal-inference-gap.md:opsec_internal_deal_terms"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-18"
}

@@ -0,0 +1,33 @@
{
  "rejected_claims": [
    {
      "filename": "quality-assessment-in-community-fiction-is-relational-not-absolute-creating-structural-advantage-for-human-content.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "creator-stakeholding-intensifies-ai-resistance-because-craft-development-journey-is-the-value-not-output-efficiency.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 3,
    "rejected": 2,
    "fixes_applied": [
      "quality-assessment-in-community-fiction-is-relational-not-absolute-creating-structural-advantage-for-human-content.md:set_created:2026-03-18",
      "creator-stakeholding-intensifies-ai-resistance-because-craft-development-journey-is-the-value-not-output-efficiency.md:set_created:2026-03-18",
      "creator-stakeholding-intensifies-ai-resistance-because-craft-development-journey-is-the-value-not-output-efficiency.md:stripped_wiki_link:the-fanchise-engagement-ladder-from-content-to-co-ownership-"
    ],
    "rejections": [
      "quality-assessment-in-community-fiction-is-relational-not-absolute-creating-structural-advantage-for-human-content.md:missing_attribution_extractor",
      "creator-stakeholding-intensifies-ai-resistance-because-craft-development-journey-is-the-value-not-output-efficiency.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-18"
}

@@ -0,0 +1,37 @@
{
  "rejected_claims": [
    {
      "filename": "experienced-developers-slower-with-ai-tools-while-believing-faster-revealing-systematic-perception-gap.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "practitioner-self-reports-systematically-overestimate-ai-productivity-creating-adoption-signal-distortion.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 7,
    "rejected": 2,
    "fixes_applied": [
      "experienced-developers-slower-with-ai-tools-while-believing-faster-revealing-systematic-perception-gap.md:set_created:2026-03-18",
      "experienced-developers-slower-with-ai-tools-while-believing-faster-revealing-systematic-perception-gap.md:stripped_wiki_link:deep-technical-expertise-is-a-greater-force-multiplier-when-",
      "experienced-developers-slower-with-ai-tools-while-believing-faster-revealing-systematic-perception-gap.md:stripped_wiki_link:agent-generated-code-creates-cognitive-debt",
      "experienced-developers-slower-with-ai-tools-while-believing-faster-revealing-systematic-perception-gap.md:stripped_wiki_link:AI-capability-and-reliability-are-independent-dimensions",
      "practitioner-self-reports-systematically-overestimate-ai-productivity-creating-adoption-signal-distortion.md:set_created:2026-03-18",
      "practitioner-self-reports-systematically-overestimate-ai-productivity-creating-adoption-signal-distortion.md:stripped_wiki_link:AI-optimization-of-industry-subsystems-induces-demand-for-mo",
      "practitioner-self-reports-systematically-overestimate-ai-productivity-creating-adoption-signal-distortion.md:stripped_wiki_link:economic-forces-push-humans-out-of-every-cognitive-loop-wher"
    ],
    "rejections": [
      "experienced-developers-slower-with-ai-tools-while-believing-faster-revealing-systematic-perception-gap.md:missing_attribution_extractor",
      "practitioner-self-reports-systematically-overestimate-ai-productivity-creating-adoption-signal-distortion.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-18"
}
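
The three validation records above share one schema, and their counters appear to obey a few invariants: kept + rejected = total, fixed = number of `fixes_applied` entries (so "fixed" counts fixes, not claims), rejected = number of `rejected_claims`, and one `rejections` entry per `filename:issue` pair. These invariants are inferred from the three records in this PR, not from a documented schema. A minimal Python sketch of a consistency check under that assumption; `check_validation_record` is a hypothetical name, not part of the pipeline:

```python
import json


# Hypothetical checker. Field names come from the validation records in this PR;
# the invariants below are inferred from those records, not from a published schema.
def check_validation_record(record: dict) -> list[str]:
    """Return human-readable invariant violations (an empty list means consistent)."""
    stats = record["validation_stats"]
    claims = record["rejected_claims"]
    problems = []
    if stats["kept"] + stats["rejected"] != stats["total"]:
        problems.append("kept + rejected != total")
    # "fixed" counts individual fixes, not claims (the third record has fixed=7, total=2).
    if stats["fixed"] != len(stats["fixes_applied"]):
        problems.append("fixed != len(fixes_applied)")
    if stats["rejected"] != len(claims):
        problems.append("rejected != len(rejected_claims)")
    # Each "rejections" entry appears to be "<filename>:<issue>", one per issue.
    expected = {f"{c['filename']}:{issue}" for c in claims for issue in c["issues"]}
    if set(stats["rejections"]) != expected:
        problems.append("rejections do not match rejected_claims issues")
    return problems


# Trimmed copy of the first record, with the long filename shortened to "claim.md".
record = json.loads("""
{
  "rejected_claims": [
    {"filename": "claim.md",
     "issues": ["missing_attribution_extractor", "opsec_internal_deal_terms"]}
  ],
  "validation_stats": {
    "total": 1, "kept": 0, "fixed": 1, "rejected": 1,
    "fixes_applied": ["claim.md:set_created:2026-03-18"],
    "rejections": ["claim.md:missing_attribution_extractor",
                   "claim.md:opsec_internal_deal_terms"]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-18"
}
""")
print(check_validation_record(record))  # → []
```

All three records in this PR pass these checks as written; a record with, say, `"kept": 1` would be reported as `kept + rejected != total`.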
|
|
@@ -7,10 +7,14 @@ date: 2025-04-07
domain: health
secondary_domains: []
format: paper
status: unprocessed
status: enrichment
priority: high
triage_tag: claim
tags: [food-as-medicine, medically-tailored-meals, cost-effectiveness, SDOH, behavioral-health-infrastructure]
processed_by: vida
processed_date: 2026-03-18
enrichments_applied: ["SDOH interventions show strong ROI but adoption stalls because Z-code documentation remains below 3 percent and no operational infrastructure connects screening to action.md", "medical care explains only 10-20 percent of health outcomes because behavioral social and genetic factors dominate as four independent methodologies confirm.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

@@ -58,3 +62,12 @@ Geisinger Fresh Food Farmacy (pilot, n=37):
## Curator Notes
PRIMARY CONNECTION: SDOH interventions show strong ROI but adoption stalls because Z-code documentation remains below 3 percent and no operational infrastructure connects screening to action
WHY ARCHIVED: The simulation-vs-RCT tension is the most important finding of this session. It challenges the assumption that addressing social determinants automatically improves health — the causal pathway may be more complex than "fix the determinant, fix the outcome."

## Key Facts
- Tufts simulation model projects 10.8M hospitalizations prevented and $111.1B net savings over 5 years from MTM intervention
- Eligible MTM population: 14+ million Americans with average $30,900 annual healthcare expenditure
- Mean MTM program expense: $11.15 per meal (Food is Medicine Coalition 2024 survey)
- JAMA 2024 RCT: intensive food intervention showed HbA1c difference of -0.10 (95% CI -0.46 to 0.25, P=.57) vs control
- Geisinger pilot (n=37): HbA1c dropped from 9.6 to 7.5, healthcare costs dropped 80%
- AHA 2025 review covered 14 US RCTs, found inconsistent clinical outcomes despite improved diet quality
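
The JAMA bullet reports both a 95% CI and a P value for the HbA1c difference. As a sanity check on the null-result reading, a two-sided P value can be back-calculated from the interval, assuming it is a symmetric normal (Wald) interval; this is an approximation for illustration, not the trial's own analysis. A minimal Python sketch; `p_from_ci` is a hypothetical helper:

```python
from math import erf, sqrt


def p_from_ci(estimate: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate two-sided P value from a point estimate and its 95% CI.

    Assumes a symmetric normal (Wald) interval, so SE ~= (upper - lower) / (2 * z_crit).
    """
    se = (upper - lower) / (2 * z_crit)
    z = abs(estimate) / se
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - phi)


# JAMA 2024 RCT, HbA1c difference: -0.10 (95% CI -0.46 to 0.25)
p = p_from_ci(-0.10, -0.46, 0.25)
print(round(p, 2))  # → 0.58
```

The back-calculated value (about 0.58) sits close to the reported P=.57; the small gap is consistent with rounding in the published figures. An interval that spans zero is what a non-significant difference looks like, which is the core of the simulation-vs-RCT tension noted above.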
@@ -7,11 +7,15 @@ date: 2025-06-18
domain: entertainment
secondary_domains: [ai-alignment, cultural-dynamics]
format: paper
status: unprocessed
status: enrichment
priority: high
triage_tag: claim
flagged_for_theseus: ["Community norms around AI authorship parallel alignment concerns — communities independently developing governance for AI content"]
tags: [fanfiction, ai-content, authenticity, community-governance, human-creativity, consumer-acceptance]
processed_by: clay
processed_date: 2026-03-18
enrichments_applied: ["GenAI adoption in entertainment will be gated by consumer acceptance not technology capability.md", "consumer-acceptance-of-ai-creative-content-declining-despite-quality-improvements-because-authenticity-signal-becomes-more-valuable.md", "community-owned-IP-has-structural-advantage-in-human-made-premium-because-provenance-is-inherent-and-legible.md", "consumer definition of quality is fluid and revealed through preference not fixed by production value.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

@@ -65,3 +69,17 @@ Academic study on fanfiction communities' perspectives on AI-generated content.
## Curator Notes
PRIMARY CONNECTION: [[GenAI adoption in entertainment will be gated by consumer acceptance not technology capability]]
WHY ARCHIVED: Academic evidence with quantitative data that directly strengthens Session 1 epistemic rejection findings and extends them to community fiction contexts specifically. The "relational quality" concept is novel to the KB.

## Key Facts
- 84.7% of fanfiction community members believe AI cannot replicate emotional nuances of human-authored stories
- 77.5% doubt AI can maintain narrative authenticity while offering innovation
- 66% said knowing a story was AI-generated would decrease interest in reading it
- 43% actively oppose AI integration (vs 26% cautiously accepting, 24% context-dependent)
- 92% agree 'fanfiction is a space for human creativity'
- 86% insist authors disclose AI involvement
- 72% report negative reaction to discovering undisclosed AI usage; 58% feel 'deceived'
- 83.6% of those opposing AI are themselves writers
- 68.6% expressed ethical concerns about unauthorized scraping of fan works for AI training
- 73.7% worried about platforms being 'inundated' with low-quality AI content
- Older, experienced writers (10+ years) resist AI most strongly

@@ -7,10 +7,14 @@ date: 2025-07-10
domain: ai-alignment
secondary_domains: [collective-intelligence]
format: paper
status: unprocessed
status: null-result
priority: high
triage_tag: claim
tags: [developer-productivity, rct, ai-tools, over-reliance, perception-gap, automation-overshoot]
processed_by: theseus
processed_date: 2026-03-18
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "LLM returned 2 claims, 2 rejected by validator"
---

## Content

@@ -47,3 +51,15 @@ Randomized controlled trial: 16 experienced open-source developers, 246 tasks, m
## Curator Notes
PRIMARY CONNECTION: deep technical expertise is a greater force multiplier when combined with AI agents
WHY ARCHIVED: RCT evidence that challenges the expertise-multiplier claim for expert-on-familiar-codebase context. The 39-point perception gap is a novel finding that explains HOW automation overshoot occurs — practitioners' self-reports systematically mislead adoption decisions.

## Key Facts
- METR conducted RCT with 16 experienced open-source developers on 246 tasks
- Codebases averaged 22k+ GitHub stars and 1M+ lines of code; developers had 5+ years of experience
- Primary tool was Cursor Pro with Claude 3.5/3.7 Sonnet
- Developers had ~50 hours of AI coding tool experience
- Measured productivity: 19% slower with AI tools
- Predicted productivity (before): 24% faster
- Estimated productivity (after): 20% faster
- AI suggestion acceptance rate: less than 44%
- Study published 2025-07-10 by METR (@METR_Evals)
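
The "39-point perception gap" in the curator note is simple arithmetic on the Key Facts figures: express each as a signed percentage change in speed (+ faster, - slower) and subtract the measured effect from the developers' post-study estimate. A minimal Python sketch; the constant and function names are illustrative, and putting the forecast, the self-estimate, and the measurement on one signed scale follows the note's own convention:

```python
# Key Facts figures as signed percentage changes in speed (+ = faster with AI).
PREDICTED_BEFORE = 24   # developers' forecast before the study
ESTIMATED_AFTER = 20    # developers' self-estimate after the study
MEASURED = -19          # measured effect: 19% slower


def gap_points(believed: int, measured: int) -> int:
    """Perception gap in percentage points between a belief and the measurement."""
    return believed - measured


print(gap_points(ESTIMATED_AFTER, MEASURED))   # → 39, the gap cited in the curator note
print(gap_points(PREDICTED_BEFORE, MEASURED))  # → 43, the gap before the study
print(PREDICTED_BEFORE - ESTIMATED_AFTER)      # → 4, how far the belief moved
```

Integers are used deliberately so the point differences come out exact, with no floating-point rounding to explain away.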