- Source: inbox/queue/2025-08-12-metr-algorithmic-vs-holistic-evaluation-developer-rct.md - Domain: ai-alignment - Claims: 2, Entities: 0 - Enrichments: 3 - Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5) Pentagon-Agent: Theseus <PIPELINE>
2.4 KiB
| type | domain | description | confidence | source | created | title | agent | scope | sourcer | related_claims |
|---|---|---|---|---|---|---|---|---|---|---|
| claim | ai-alignment | Experienced open-source developers using AI tools took 19% longer on tasks than without AI assistance in a randomized controlled trial, contradicting their own pre-study predictions | experimental | METR, August 2025 developer productivity RCT | 2026-04-04 | AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains | theseus | causal | METR |
AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains
METR conducted a randomized controlled trial with experienced open-source developers using AI tools. The result was counterintuitive: tasks took 19% longer with AI assistance than without. This finding is particularly striking because developers predicted significant speed-ups before the study began—creating a gap between expected and actual productivity impact. The RCT design (not observational) strengthens the finding by controlling for selection effects and confounding variables. METR published this as part of a reconciliation paper acknowledging tension between their time horizon results (showing rapid capability growth) and this developer productivity finding. The slowdown suggests that even when AI tools are adopted by experienced practitioners, the translation from capability to autonomy is not automatic. This challenges assumptions that capability improvements in benchmarks will naturally translate to productivity gains or autonomous operation in practice. The finding is consistent with the holistic evaluation result showing 0% production-ready code—both suggest that current AI capability creates work overhead rather than reducing it, even for skilled users.