teleo-codex/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md
m3taversal 2b8c382f71 leo: 3 failure mode claims — evaluator bottleneck, correlated priors, social enforcement degradation
- What: standalone claims documenting where the Teleo collective's architecture breaks today
- Why: PR #44's 10 operational claims painted only the success picture; these 3 are grounded in observed evidence (197/232 commits without trailers, single-evaluator review of all 44 PRs, zero cross-model reviews)
- Review feedback from Theseus + Rio incorporated: count expanded to 197, back-pressure insight added, interaction between bottleneck and correlated priors noted

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 12:47:32 +00:00

7.3 KiB

type domain description confidence source created
claim living-agents Every agent in the Teleo collective runs on Claude — proposers, evaluators, and synthesizer share the same training data, RLHF preferences, and systematic blind spots, which means adversarial review is less adversarial than it appears likely Teleo collective operational evidence — all 5 active agents on Claude, 0 cross-model reviews in 44 PRs 2026-03-07

All agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposer's training biases

The Teleo collective's adversarial PR review separates proposer from evaluator — but both roles run on Claude. This means the review process catches errors of execution (wrong citations, overstated confidence, missing links) but cannot catch errors of perspective (systematic biases in what the model considers important, what evidence it finds compelling, what conclusions it reaches from ambiguous data).

How it fails today

All 5 active agents (Leo, Rio, Clay, Vida, Theseus) run on Claude. When Rio proposes a claim and Leo reviews it, the review checks structural quality, evidence strength, and cross-domain connections. But it cannot check whether both agents share a systematic bias toward, for example:

  • Overweighting narrative coherence over statistical evidence
  • Favoring certain intellectual frameworks (complexity theory, Christensen disruption) over others
  • Consistently assigning "likely" confidence where "experimental" would be more honest
  • Finding cross-domain connections that are linguistically similar but mechanistically distinct

The evidence is negative — we cannot point to a specific error that was caught by model diversity, because we have never had model diversity. The absence of evidence is itself the concern: we don't know what we're missing.

However, indirect evidence suggests the problem is real:

  • The 11 synthesis claims all follow a similar argumentative structure. They identify a mechanism in domain A, find an analogue in domain B, and argue the shared mechanism is real. A different model family might generate synthesis claims with different structures — e.g., identifying contradictions between domains rather than parallels, or finding claims in one domain that invalidate assumptions in another.
  • Confidence calibration clusters around "likely" and "experimental." Of the knowledge base's ~120 claims, the distribution skews toward these middle categories. A model with different training priors might assign "speculative" more freely to claims that Claude's training treats as mainstream (e.g., complexity theory applications to economics).
  • No claim in the knowledge base contradicts a position held by Claude's training data consensus. This is hard to verify without a second model, but the absence of contrarian claims is suspicious for a knowledge base that values independent thinking.

Why this matters

Correlated priors create two specific risks:

  1. False confidence in review. When Leo approves a claim, the collective treats it as validated. But if the approval reflects shared model bias rather than genuine quality assessment, the confidence is unearned. The review process provides the illusion of adversarial checking without the substance.

  2. Systematic knowledge base drift. Over time, claims that align with Claude's training priors accumulate while claims that challenge those priors are less likely to be proposed or, if proposed, are more likely to receive skeptical review. The knowledge base drifts toward Claude's worldview rather than toward ground truth.

  3. Invisible ceiling on synthesis quality. Cross-domain connections that Claude's training data doesn't contain — connections between literatures Claude was not trained on, or connections that require reasoning patterns Claude is weak at — will never be surfaced by any agent in the collective, no matter how many agents are added.

What this doesn't do yet

  • No cross-model evaluation. The planned multi-model architecture (evaluators on a different model family than proposers) is designed but not built. It requires VPS deployment with container-per-agent isolation.
  • No bias detection tooling. There is no systematic check for whether the knowledge base's claims cluster around certain intellectual frameworks or conclusions. Embedding-based analysis could reveal whether claims are more similar to each other (in argument structure, not just topic) than a diverse knowledge base should be.
  • No external validation. No human domain expert has reviewed the knowledge base for systematic omissions or biases. The human in the loop (Cory) directs strategy and reviews architecture but does not audit individual claims for model-specific bias.
  • No contrarian prompting. No agent is tasked with generating claims that challenge the knowledge base's existing consensus. A designated "red team" agent running on a different model could surface blind spots.

Where this goes

The immediate improvement is multi-model evaluation: running Leo (or a dedicated evaluator) on a different model family (e.g., GPT-4, Gemini, or open-source models) for review sessions. This is the single highest-value architectural change for knowledge quality because it introduces genuinely independent evaluation without requiring any other system changes.

The next step is bias auditing: periodically analyzing the knowledge base's claim distribution across intellectual frameworks, confidence levels, and argument structures to detect systematic drift. This can be done by a different model analyzing the full set of claims for patterns that a Claude-based agent would not flag.

The ultimate form is model diversity as a design principle: different agents in the collective run on different model families by default. Proposers and evaluators are never on the same model. Synthesis requires claims that survive review by multiple model families. The knowledge base converges on insights that are robust across different AI perspectives, not just internally consistent within one model's worldview.


Relevant Notes:

Topics: