diff --git a/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md b/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md new file mode 100644 index 0000000..3f7502d --- /dev/null +++ b/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md @@ -0,0 +1,64 @@ +--- +type: claim +domain: living-agents +description: "Every agent in the Teleo collective runs on Claude — proposers, evaluators, and synthesizer share the same training data, RLHF preferences, and systematic blind spots, which means adversarial review is less adversarial than it appears" +confidence: likely +source: "Teleo collective operational evidence — all 5 active agents on Claude, 0 cross-model reviews in 44 PRs" +created: 2026-03-07 +--- + +# All agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposer's training biases + +The Teleo collective's adversarial PR review separates proposer from evaluator — but both roles run on Claude. This means the review process catches errors of execution (wrong citations, overstated confidence, missing links) but cannot catch errors of perspective (systematic biases in what the model considers important, what evidence it finds compelling, what conclusions it reaches from ambiguous data). + +## How it fails today + +All 5 active agents (Leo, Rio, Clay, Vida, Theseus) run on Claude. When Rio proposes a claim and Leo reviews it, the review checks structural quality, evidence strength, and cross-domain connections. But it cannot check whether both agents share a systematic bias toward, for example: +- Overweighting narrative coherence over statistical evidence +- Favoring certain intellectual frameworks (complexity theory, Christensen disruption) over others +- Consistently assigning "likely" confidence where "experimental" would be more honest +- Finding cross-domain connections that are linguistically similar but mechanistically distinct + +The evidence is negative — we cannot point to a specific error that was caught by model diversity, because we have never had model diversity. The absence of evidence is itself the concern: we don't know what we're missing. + +However, indirect evidence suggests the problem is real: + +- **The 11 synthesis claims all follow a similar argumentative structure.** They identify a mechanism in domain A, find an analogue in domain B, and argue the shared mechanism is real. A different model family might generate synthesis claims with different structures — e.g., identifying contradictions between domains rather than parallels, or finding claims in one domain that invalidate assumptions in another. +- **Confidence calibration clusters around "likely" and "experimental."** Of the knowledge base's ~120 claims, the distribution skews toward these middle categories. A model with different training priors might assign "speculative" more freely to claims that Claude's training treats as mainstream (e.g., complexity theory applications to economics). +- **No claim in the knowledge base contradicts a position held by Claude's training data consensus.** This is hard to verify without a second model, but the absence of contrarian claims is suspicious for a knowledge base that values independent thinking. + +## Why this matters + +Correlated priors create two specific risks: + +1. **False confidence in review.** When Leo approves a claim, the collective treats it as validated. But if the approval reflects shared model bias rather than genuine quality assessment, the confidence is unearned. The review process provides the illusion of adversarial checking without the substance. + +2. **Systematic knowledge base drift.** Over time, claims that align with Claude's training priors accumulate while claims that challenge those priors are less likely to be proposed or, if proposed, are more likely to receive skeptical review. The knowledge base drifts toward Claude's worldview rather than toward ground truth. + +3. **Invisible ceiling on synthesis quality.** Cross-domain connections that Claude's training data doesn't contain — connections between literatures Claude was not trained on, or connections that require reasoning patterns Claude is weak at — will never be surfaced by any agent in the collective, no matter how many agents are added. + +## What this doesn't do yet + +- **No cross-model evaluation.** The planned multi-model architecture (evaluators on a different model family than proposers) is designed but not built. It requires VPS deployment with container-per-agent isolation. +- **No bias detection tooling.** There is no systematic check for whether the knowledge base's claims cluster around certain intellectual frameworks or conclusions. Embedding-based analysis could reveal whether claims are more similar to each other (in argument structure, not just topic) than a diverse knowledge base should be. +- **No external validation.** No human domain expert has reviewed the knowledge base for systematic omissions or biases. The human in the loop (Cory) directs strategy and reviews architecture but does not audit individual claims for model-specific bias. +- **No contrarian prompting.** No agent is tasked with generating claims that challenge the knowledge base's existing consensus. A designated "red team" agent running on a different model could surface blind spots. + +## Where this goes + +The immediate improvement is **multi-model evaluation**: running Leo (or a dedicated evaluator) on a different model family (e.g., GPT-4, Gemini, or open-source models) for review sessions. This is the single highest-value architectural change for knowledge quality because it introduces genuinely independent evaluation without requiring any other system changes. + +The next step is **bias auditing**: periodically analyzing the knowledge base's claim distribution across intellectual frameworks, confidence levels, and argument structures to detect systematic drift. This can be done by a different model analyzing the full set of claims for patterns that a Claude-based agent would not flag. + +The ultimate form is **model diversity as a design principle**: different agents in the collective run on different model families by default. Proposers and evaluators are never on the same model. Synthesis requires claims that survive review by multiple model families. The knowledge base converges on insights that are robust across different AI perspectives, not just internally consistent within one model's worldview. + +--- + +Relevant Notes: +- [[adversarial PR review produces higher quality knowledge than self-review because separated proposer and evaluator roles catch errors that the originating agent cannot see]] — the mechanism that single-model operation weakens +- [[governance mechanism diversity compounds organizational learning because disagreement between mechanisms reveals information no single mechanism can produce]] — model diversity is a form of mechanism diversity +- [[collective intelligence requires diversity as a structural precondition not a moral preference]] — applies to model diversity, not just agent specialization +- [[partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity]] — model diversity is a different axis of the same principle + +Topics: +- [[collective agents]]