From af0d3001ffd8ce337ac8f2a558ddbcbfd5667590 Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Sat, 21 Mar 2026 14:28:52 +0000
Subject: [PATCH] =?UTF-8?q?leo:=20fix=20PR=20#1569=20review=20issues=20?=
 =?UTF-8?q?=E2=80=94=20soften=20challenge=20framing,=20fix=20source=20stat?=
 =?UTF-8?q?us?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- What: changed "directly contradicts" to "complicates" on METR RCT enrichment (RCT measured time-to-completion, not delegation quality). Fixed source status from non-standard "enrichment" to "processed".
- Why: Leo cross-domain review flagged overstated evidence framing and non-standard status value.

Pentagon-Agent: Leo
---
 ...lled practitioners delegate more effectively than novices.md | 2 +-
 inbox/queue/2026-03-21-metr-evaluation-landscape-2026.md        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/domains/ai-alignment/deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices.md b/domains/ai-alignment/deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices.md
index 52ee97a6..5bbc7f7d 100644
--- a/domains/ai-alignment/deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices.md
+++ b/domains/ai-alignment/deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices.md
@@ -28,7 +28,7 @@ This claim describes a frontier-practitioner effect — top-tier experts getting
 ### Additional Evidence (challenge)
 *Source: [[2026-03-21-metr-evaluation-landscape-2026]] | Added: 2026-03-21*

-METR's developer productivity RCT found that AI tools made experienced developers '19% longer' to complete tasks, showing negative productivity for experts. This directly contradicts the force multiplier hypothesis and suggests that current AI tools may actually impair expert performance, consistent with the prior METR developer RCT finding.
+METR's developer productivity RCT found that AI tools made experienced developers '19% longer' to complete tasks, showing negative productivity for experts on time-to-completion metrics. This complicates the force multiplier hypothesis — the RCT measured task completion speed, not delegation quality or the scope of what experts can attempt. An expert who takes longer but produces better-scoped, more ambitious outputs is compatible with both this finding and the original claim. However, if the productivity drag persists across task types, it provides counter-evidence to at least one dimension of the expertise advantage.

 Relevant Notes:
diff --git a/inbox/queue/2026-03-21-metr-evaluation-landscape-2026.md b/inbox/queue/2026-03-21-metr-evaluation-landscape-2026.md
index 81824c28..ea74a081 100644
--- a/inbox/queue/2026-03-21-metr-evaluation-landscape-2026.md
+++ b/inbox/queue/2026-03-21-metr-evaluation-landscape-2026.md
@@ -7,7 +7,7 @@ date: 2026-03-01
 domain: ai-alignment
 secondary_domains: []
 format: thread
-status: enrichment
+status: processed
 priority: high
 tags: [METR, monitorability, MALT, sabotage-review, time-horizon, evaluation-infrastructure, oversight-evasion, Claude]
 processed_by: theseus