diff --git a/core/living-agents/human contributors structurally correct for correlated AI blind spots because external evaluators provide orthogonal error distributions that no same-family model can replicate.md b/core/living-agents/human contributors structurally correct for correlated AI blind spots because external evaluators provide orthogonal error distributions that no same-family model can replicate.md
index ed7dea1df..785f2424f 100644
--- a/core/living-agents/human contributors structurally correct for correlated AI blind spots because external evaluators provide orthogonal error distributions that no same-family model can replicate.md
+++ b/core/living-agents/human contributors structurally correct for correlated AI blind spots because external evaluators provide orthogonal error distributions that no same-family model can replicate.md
@@ -1,7 +1,7 @@
 ---
 type: claim
 domain: living-agents
-description: "Empirical evidence shows same-family LLMs share ~60% error correlation and exhibit self-preference bias — human contributors provide the only structurally independent error distribution, making them an epistemic correction mechanism not just a growth mechanism"
+description: "Empirical evidence shows same-family LLMs agree on ~60% of shared errors and exhibit self-preference bias — human contributors provide a structurally independent error distribution, making them an epistemic correction mechanism not just a growth mechanism"
 confidence: likely
 source: "Kim et al. ICML 2025 (correlated errors across 350+ LLMs), Panickssery et al. NeurIPS 2024 (self-preference bias), Wataoka et al. 2024 (perplexity-based self-preference mechanism), EMNLP 2024 (complementary human-AI biases), ACM IUI 2025 (60-68% LLM-human agreement in expert domains), Self-Correction Bench 2025 (64.5% structural blind spot rate), Wu et al. 2024 (generative monoculture)"
 created: 2026-03-18
@@ -31,7 +31,7 @@ Kim et al. (ICML 2025, "Correlated Errors in Large Language Models") evaluated 3
 - Error correlation is highest for models sharing the **same base architecture**
 - As models get more accurate, their errors **converge** — the better they get, the more their mistakes overlap
 
-This means our existing claim — [[all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases]] — is now empirically confirmed at scale. The ~60% error agreement within families means that roughly 6 out of 10 errors that a proposer agent makes will be invisible to an evaluator agent running the same model family.
+This means our existing claim — [[all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases]] — is now empirically confirmed at scale. When a proposer agent and a same-family evaluator agent both err on an item, there is a ~60% chance they make the identical error, meaning roughly 6 out of 10 shared errors pass through review undetected.
 
 ## Same-family evaluation has a structural self-preference bias
 
@@ -39,7 +39,7 @@ The correlated error problem is compounded by self-preference bias. Panickssery
 
 Wataoka et al. (2024, "Self-Preference Bias in LLM-as-a-Judge") identified the mechanism: LLMs assign higher evaluations to outputs with **lower perplexity** — text that is more familiar and expected to the evaluating model. Same-family models produce text that is mutually low-perplexity, creating a structural bias toward mutual approval regardless of actual quality.
 
-For a knowledge collective like ours: when Leo evaluates Rio's claims, both running Claude, the evaluation is biased toward approval because Rio's output is low-perplexity to Leo. The proposer-evaluator separation catches execution errors but cannot overcome this distributional bias.
+For a knowledge collective like ours, the self-preference bias applies selectively. Our evaluation checklist includes structural checks (do wiki links resolve? does evidence exist? is confidence calibrated?) that are largely immune to perplexity bias — these are verifiable and binary. But the checklist also includes judgment calls (is this specific enough to disagree with? does this genuinely expand what the KB knows? is the scope properly qualified?) where the evaluator's assessment of "good enough" is shaped by what feels natural to the model. Same-family evaluators share the same sense of what constitutes a well-formed argument, which intellectual frameworks deserve "likely" confidence, and which cross-domain connections are "real." The proposer-evaluator separation catches execution errors but cannot overcome this shared sense of quality on judgment-dependent criteria.
 
 ## Human and AI biases are complementary, not overlapping
 
@@ -67,7 +67,7 @@ This means our knowledge base, built entirely by Claude agents, is systematicall
 
 The structural argument synthesizes as follows:
 
-1. Same-family models share ~60% error correlation (Kim et al.)
+1. Same-family models agree on ~60% of shared errors — conditional on both erring (Kim et al.)
 2. Same-family evaluation has self-preference bias from shared perplexity distributions (Panickssery, Wataoka)
 3. Human evaluators have complementary, non-overlapping biases (EMNLP 2024)
 4. Domain experts disagree with LLM evaluators 32-40% of the time in specialized domains (IUI 2025)
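The conditional statistic this patch corrects toward (agreement on shared errors, conditional on both models erring) is easy to misread, so a toy simulation may help make it concrete. All parameter values here (`p_err`, the 0.6 agreement rate) are illustrative assumptions, not numbers taken from Kim et al.'s data:

```python
import random

def undetected_shared_error_rate(n_items=100_000, p_err=0.2,
                                 agree_given_both_err=0.6, seed=0):
    """Toy model of same-family review. Proposer and evaluator each err
    independently with probability p_err; when BOTH err on an item, a
    same-family pair produces the identical wrong answer with probability
    agree_given_both_err (the ~60% figure), and an identical wrong answer
    cannot be flagged by the evaluator. All parameters are illustrative.
    Returns the fraction of shared errors that pass review undetected."""
    rng = random.Random(seed)
    both_err = undetected = 0
    for _ in range(n_items):
        proposer_errs = rng.random() < p_err
        evaluator_errs = rng.random() < p_err
        if proposer_errs and evaluator_errs:
            both_err += 1
            if rng.random() < agree_given_both_err:
                undetected += 1  # identical error: review waves it through
    return undetected / both_err
```

Run with these defaults, the returned fraction lands near 0.6. The point of the sketch is the conditioning: the statistic says nothing about items where only the proposer errs, which is why the revised wording ("6 out of 10 shared errors") is deliberately narrower than the old "6 out of 10 errors."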