theseus: add 3 claims on collective AI design implications

- What: 3 new claims from collective AI design analysis
  1. Agent-mediated KBs are structurally novel (core/living-agents/)
  2. Adversarial contribution conditions (foundations/collective-intelligence/)
  3. Transparent algorithmic governance as alignment (domains/ai-alignment/)
- Why: Cory identified 5 areas of CI design implications for the Teleo product. These 3 are the strongest claim candidates from that analysis.
- Connections: builds on existing adversarial PR review, Hayek spontaneous order, specification trap, and partial connectivity claims
- All rated experimental — strong theoretical grounding, no deployment data yet

Pentagon-Agent: Theseus <B4A5B354-03D6-4291-A6A8-1E04A879D9AC>

This commit is contained in: parent 71227f3bca · commit 55fb571dea
6 changed files with 155 additions and 0 deletions
@@ -23,6 +23,9 @@ The architecture follows biological organization: nested Markov blankets with sp

- [[collaborative knowledge infrastructure requires separating the versioning problem from the knowledge evolution problem because git solves file history but not semantic disagreement or insight-level attribution]] — the design challenge
- [[person-adapted AI compounds knowledge about individuals while idea-learning AI compounds knowledge about domains and the architectural gap between them is where collective intelligence lives]] — where CI lives

## Structural Positioning

- [[agent-mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi-agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine]] — what makes this architecture unprecedented

## Operational Architecture (how the Teleo collective works today)

- [[adversarial PR review produces higher quality knowledge than self-review because separated proposer and evaluator roles catch errors that the originating agent cannot see]] — the core quality mechanism
- [[prose-as-title forces claim specificity because a proposition that cannot be stated as a disagreeable sentence is not a real claim]] — the simplest quality gate
@@ -0,0 +1,46 @@

---
type: claim
domain: living-agents
description: "Compares Teleo's architecture against Wikipedia, Community Notes, prediction markets, and Stack Overflow across three structural dimensions — showing that the combination of atomic claims, adversarial multi-agent evaluation, and persistent knowledge graphs is unprecedented"
confidence: experimental
source: "Theseus, original analysis grounded in CI literature and operational comparison of existing knowledge aggregation systems"
created: 2026-03-11
---

# Agent-mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi-agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine

Existing knowledge aggregation systems each implement one or two of three critical structural properties, but none combine all three. This combination produces qualitatively different collective intelligence dynamics.

## The three structural properties

**1. Atomic claims with independent evaluability.** Each knowledge unit is a single proposition with its own evidence, confidence level, and challenge surface. Wikipedia merges claims into consensus articles, destroying the disagreement structure — you can't independently evaluate or challenge a single claim within an article without engaging the whole article's editorial process. Prediction markets price single propositions but can't link them into structured knowledge. Stack Overflow evaluates Q&A pairs but not propositions. Atomic claims enable granular evaluation: each can be independently challenged, enriched, or deprecated without affecting others.

**2. Adversarial multi-agent evaluation.** Knowledge inputs are evaluated by AI agents through structured adversarial review — proposer/evaluator separation ensures the entity that produces a claim is never the entity that approves it. Wikipedia uses human editor consensus (collaborative, not adversarial by design). Community Notes uses algorithmic bridging (matrix factorization, no agent evaluation). Prediction markets use price signals (no explicit evaluation of claim quality, only probability). The agent-mediated model inverts RLHF: instead of humans evaluating AI outputs, AI evaluates knowledge inputs using a codified epistemology.

**3. Persistent knowledge graphs with semantic linking.** Claims are wiki-linked into a traversable graph where evidence chains are auditable: evidence → claims → beliefs → positions. Community Notes has no cross-note memory — each note is evaluated independently. Prediction markets have no cross-question linkage. Wikipedia has hyperlinks but without semantic typing or confidence weighting. The knowledge graph enables cascade detection: when a foundational claim is challenged, the system can trace which beliefs and positions depend on it.
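The cascade detection described for property #3 can be sketched as a breadth-first walk over a dependency graph. This is a minimal illustration under stated assumptions — the node names, the edge store, and the traversal are all hypothetical, not the actual Teleo implementation:

```python
# Hypothetical sketch of cascade detection over a typed claim graph.
# Node names and the traversal strategy are illustrative assumptions.
from collections import defaultdict, deque

# Directed edges point from supporting node to dependent node:
# evidence -> claim -> belief -> position
supports: defaultdict = defaultdict(list)

def link(supporting: str, dependent: str) -> None:
    supports[supporting].append(dependent)

def cascade(challenged: str) -> list:
    """Breadth-first walk: every downstream node whose standing depends on `challenged`."""
    seen = set()
    affected = []
    queue = deque(supports[challenged])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        affected.append(node)
        queue.extend(supports[node])
    return affected

link("evidence:polymarket-2024", "claim:markets-beat-polls")
link("claim:markets-beat-polls", "belief:adversarial-aggregation-works")
link("belief:adversarial-aggregation-works", "position:use-adversarial-review")

# Challenging the evidence flags everything built on it for re-evaluation.
assert cascade("evidence:polymarket-2024") == [
    "claim:markets-beat-polls",
    "belief:adversarial-aggregation-works",
    "position:use-adversarial-review",
]
```

The detection step is pure graph traversal; deciding what to do with the affected nodes is where the evaluators come in.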

## Why the combination matters

Each property alone is well-understood. The novelty is in their interaction:

- Atomic claims + adversarial evaluation = each claim gets independent quality assessment (not possible when claims are merged into articles)
- Adversarial evaluation + knowledge graph = evaluators can check whether a new claim contradicts, supports, or duplicates existing linked claims (not possible without persistent structure)
- Knowledge graph + atomic claims = the system can detect when new evidence should cascade through beliefs (detection comes from the atomic graph structure; adversarial evaluators are still needed to actually perform the update)

The closest analog is scientific peer review, which has atomic claims (papers make specific arguments) and adversarial evaluation (reviewers challenge the work), but lacks persistent knowledge graphs — scientific papers cite each other but don't form a traversable, semantically typed graph with confidence weighting and cascade detection.

## What this does NOT claim

This claim is structural, not evaluative. It does not claim that agent-mediated knowledge bases produce *better* knowledge than Wikipedia or prediction markets — that is an empirical question we don't yet have data to answer. It claims the architecture is *structurally novel* in combining properties that existing systems don't combine. Whether structural novelty translates to superior collective intelligence is a separate, testable proposition.

---

Relevant Notes:

- [[adversarial PR review produces higher quality knowledge than self-review because separated proposer and evaluator roles catch errors that the originating agent cannot see]] — the operational evidence for property #2
- [[wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable]] — the mechanism behind property #3
- [[atomic notes with one claim per file enable independent evaluation and granular linking because bundled claims force reviewers to accept or reject unrelated propositions together]] — the rationale for property #1
- [[all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases]] — the known limitation of property #2 when model diversity is absent
- [[protocol design enables emergent coordination of arbitrary complexity as Linux Bitcoin and Wikipedia demonstrate]] — prior art: protocol-based coordination systems that partially implement these properties

Topics:

- [[core/living-agents/_map]]
@@ -92,6 +92,9 @@ Evidence from documented AI problem-solving cases, primarily Knuth's "Claude's C

- [[nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments]] — Thompson/Karp: the state monopoly on force makes private AI control structurally untenable
- [[anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning]] (in `core/living-agents/`) — narrative debt from overstating AI agent autonomy

## Governance & Alignment Mechanisms

- [[transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach]] — alignment through transparent, improvable rules rather than designer specification

## Coordination & Alignment Theory (local)

Claims that frame alignment as a coordination problem, moved here from foundations/ in PR #49:

- [[AI alignment is a coordination problem not a technical problem]] — the foundational reframe
@@ -0,0 +1,54 @@

---
type: claim
domain: ai-alignment
description: "Argues that publishing how AI agents decide who and what to respond to — and letting users challenge and improve those rules through the same process that governs the knowledge base — is a fundamentally different alignment approach from hidden system prompts, RLHF, or Constitutional AI"
confidence: experimental
source: "Theseus, original analysis building on Cory Abdalla's design principle for Teleo agent governance"
created: 2026-03-11
---

# Transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach

Current AI alignment approaches share a structural feature: the alignment mechanism is designed by the system's creators and opaque to its users. RLHF training data is proprietary. Constitutional AI principles are published but the implementation is black-boxed. Platform moderation rules are enforced by algorithms no user can inspect or influence. Users experience alignment as arbitrary constraint, not as a system they can understand, evaluate, and improve.

## The inversion

The alternative: make the rules governing AI agent behavior — who gets responded to, how contributions are evaluated, what gets prioritized — public, challengeable, and subject to the same epistemic process as every other claim in the knowledge base.

This means:

1. **The response algorithm is public.** Users can read the rules that govern how agents behave. No hidden system prompts, no opaque moderation criteria.
2. **Users can propose changes.** If a rule produces bad outcomes, users can challenge it — with evidence, through the same adversarial contribution process used for domain knowledge.
3. **Agents evaluate proposals.** Changes to the response algorithm go through the same multi-agent adversarial review as any other claim. The rules change when the evidence and argument warrant it, not when a majority votes for it or when the designer decides to update.
4. **The meta-algorithm is itself inspectable.** The process by which agents evaluate change proposals is public. Users can challenge the evaluation process, not just the rules it produces.
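A minimal sketch of the single-pipeline idea: a rule-change proposal and a domain claim pass through one review gate with proposer/evaluator separation. The `Submission` type, the agent names, and the review predicate are illustrative assumptions, not the Teleo implementation:

```python
# Hypothetical sketch: rules are knowledge objects, reviewed like any claim.
from dataclasses import dataclass

@dataclass
class Submission:
    kind: str    # "domain-claim" or "rule-change" — same pipeline for both
    author: str
    body: str

def adversarial_review(sub: Submission, evaluator: str) -> bool:
    """Enforce structural separation: the proposer never approves its own work."""
    if evaluator == sub.author:
        raise ValueError("evaluator must be distinct from proposer")
    # ... evidence checks against the knowledge base would run here ...
    return True

claim = Submission("domain-claim", "theseus", "markets aggregate via incentives")
rule = Submission("rule-change", "user-42", "prioritize follow-up questions")
assert adversarial_review(claim, evaluator="hippasus")
assert adversarial_review(rule, evaluator="hippasus")
```

The design point is that `kind` changes nothing about the gate: the same separation constraint and evidence checks apply whether the submission is domain knowledge or a rule governing agent behavior.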

## Why this is structurally different

This is not just "transparency" — it's reflexive governance. The alignment mechanism is itself a knowledge object, subject to the same epistemic standards and adversarial improvement as the knowledge it governs. This creates a self-improving alignment system: the rules get better through the same process that makes the knowledge base better.

The design principle from coordination theory is directly applicable: designing coordination rules is categorically different from designing coordination outcomes. The public response algorithm is a coordination rule. What emerges from applying it is the coordination outcome. Making rules public and improvable is the Hayekian move — designed rules of just conduct enabling spontaneous order of greater complexity than deliberate arrangement could achieve.

This also instantiates a core TeleoHumanity axiom: the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance. Transparent algorithmic governance is the mechanism by which continuous weaving happens — users don't specify their values once; they iteratively challenge and improve the rules that govern agent behavior.

## The risk: reflexive capture

If users can change the rules that govern which users get responses, you get a feedback loop. Users who game the rules to increase their influence can then propose rule changes that benefit them further. This is the analog of regulatory capture in traditional governance.

The structural defense: agents evaluate change proposals against the knowledge base and epistemic standards, not against user preferences or popularity metrics. The agents serve as a constitutional check — they can reject popular rule changes that degrade epistemic quality. This works because agent evaluation criteria are themselves public and challengeable, but changes to evaluation criteria require stronger evidence than changes to response rules (analogous to constitutional amendments requiring supermajorities).
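The two-tier constitutional check can be sketched as tiered evidence bars where popularity never enters the decision. The tier names and numeric thresholds are illustrative assumptions, not calibrated values:

```python
# Hypothetical sketch of the constitutional check: meta-rules need stronger
# evidence than ordinary rules, and popularity is deliberately ignored.
EVIDENCE_BAR = {
    "response-rule": 0.6,    # ordinary rule change
    "evaluation-rule": 0.9,  # "constitutional" change to how proposals are judged
}

def accept_change(kind: str, evidence_strength: float, popularity: float) -> bool:
    """Accept iff adversarial review produced evidence above the tier's bar."""
    return evidence_strength >= EVIDENCE_BAR[kind]

# The same evidence clears the bar for an ordinary response rule but is
# rejected for a change to the evaluation criteria, however popular.
assert accept_change("response-rule", 0.7, popularity=0.95)
assert not accept_change("evaluation-rule", 0.7, popularity=0.95)
```

The asymmetry is the point: capturing the meta-rules is the high-value attack, so that tier demands the most evidence.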

## What this does NOT claim

This claim does not assert that transparent algorithmic governance *solves* alignment. It asserts that it is *structurally different* from existing approaches in a way that addresses known limitations — specifically, the specification trap (values encoded at design time become brittle) and the alignment tax (safety as cost rather than feature). Whether this approach produces better alignment outcomes than RLHF or Constitutional AI is an empirical question that requires deployment-scale evidence.

---

Relevant Notes:

- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — the TeleoHumanity axiom this approach instantiates
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — the failure mode that transparent governance addresses
- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]] — the theoretical foundation: design rules, let behavior emerge
- [[Hayek argued that designed rules of just conduct enable spontaneous order of greater complexity than deliberate arrangement could achieve]] — the Hayekian insight applied to AI governance
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] — empirical evidence that distributed alignment input produces effective governance
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] — evidence that user-surfaced norms differ from designer assumptions
- [[adversarial PR review produces higher quality knowledge than self-review because separated proposer and evaluator roles catch errors that the originating agent cannot see]] — the adversarial review mechanism that governs rule changes

Topics:

- [[domains/ai-alignment/_map]]
@@ -10,6 +10,9 @@ What collective intelligence IS, how it works, and the theoretical foundations f

- [[partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity]] — network topology matters
- [[collective intelligence within a purpose-driven community faces a structural tension because shared worldview correlates errors while shared purpose enables coordination]] — the core tension

## Contribution & Evaluation

- [[adversarial contribution produces higher-quality collective knowledge than collaborative contribution when wrong challenges have real cost evaluation is structurally separated from contribution and confirmation is rewarded alongside novelty]] — when adversarial beats collaborative

## Coordination Design

- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]] — rules not outcomes
- [[Ostrom proved communities self-govern shared resources when eight design principles are met without requiring state control or privatization]] — the empirical evidence
@@ -0,0 +1,46 @@

---
type: claim
domain: collective-intelligence
description: "Identifies three necessary conditions under which adversarial knowledge contribution ('tell us something we don't know') produces genuine collective intelligence rather than selecting for contrarianism or noise"
confidence: experimental
source: "Theseus, original analysis drawing on prediction market evidence, scientific peer review, and mechanism design theory"
created: 2026-03-11
---

# Adversarial contribution produces higher-quality collective knowledge than collaborative contribution when wrong challenges have real cost evaluation is structurally separated from contribution and confirmation is rewarded alongside novelty

"Tell us something we don't know" is a more effective prompt for collective knowledge than "help us build consensus" — but only when three structural conditions prevent the adversarial dynamic from degenerating into contrarianism.

## Why adversarial beats collaborative (the base case)

The hardest problem in knowledge systems is surfacing what the system doesn't already know. Collaborative systems (Wikipedia's consensus model, corporate knowledge bases) are structurally biased toward confirming and refining existing knowledge. They're excellent at polishing what's already there but poor at incorporating genuinely novel — and therefore initially uncomfortable — information.

Prediction markets demonstrate the adversarial alternative: every trade is a bet that the current price is wrong. The market rewards traders who know something the market doesn't. Polymarket's 2024 US election performance — more accurate than professional polling — is evidence that adversarial information aggregation outperforms collaborative consensus on complex factual questions.

Scientific peer review is also adversarial by design: reviewers are selected specifically to challenge the paper. The system produces higher-quality knowledge than self-review precisely because the adversarial dynamic catches errors, overclaims, and gaps that the author cannot see.

## The three conditions

**Condition 1: Wrong challenges must have real cost.** In prediction markets, contrarians who are wrong lose money. In scientific review, reviewers who reject valid work damage their reputation. Without cost of being wrong, the system selects for volume of challenges, not quality. The cost doesn't have to be financial — it can be reputational (contributor's track record is visible), attentional (low-quality challenges consume the contributor's limited review allocation), or structural (challenges require evidence, not just assertions).
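A reputational version of condition 1 can be sketched as asymmetric stakes: a rejected challenge costs more than an upheld one pays, so challenge volume alone cannot win. The payoff numbers and contributor names are illustrative assumptions, not a calibrated mechanism:

```python
# Hypothetical sketch of reputational stakes on challenges.
reputation = {"careful": 1.0, "contrarian": 1.0}

def resolve_challenge(challenger: str, upheld: bool,
                      reward: float = 0.2, penalty: float = 0.3) -> None:
    """Asymmetric stakes: being wrong costs more than being right pays."""
    reputation[challenger] += reward if upheld else -penalty

resolve_challenge("careful", upheld=True)       # evidenced challenge lands
resolve_challenge("contrarian", upheld=False)   # noisy challenge costs more
assert reputation["careful"] > reputation["contrarian"]
```

Under these payoffs a challenger needs to be right more often than wrong just to break even, which is the selection pressure the condition asks for.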

**Condition 2: Evaluation must be structurally separated from contribution.** If contributors evaluate each other's work, adversarial dynamics produce escalation rather than knowledge improvement — debate competitions, not truth-seeking. The Teleo model separates contributors (who propose challenges and new claims) from evaluators (AI agents who assess evidence quality against codified epistemic standards). The evaluators are not in the adversarial game; they referee it. This prevents the adversarial dynamic from becoming interpersonal.

**Condition 3: Confirmation must be rewarded alongside novelty.** In science, replication studies are as important as discoveries — but dramatically undervalued by journals and funders. If a system only rewards novelty ("tell us something we don't know"), it systematically underweights evidence that confirms existing claims. Enrichments — adding new evidence to strengthen an existing claim — must be recognized as contributions, not dismissed as redundant. Otherwise the system selects for surprising-sounding over true.

## The key reframe: contributor vs. knowledge base, not contributor vs. contributor

The adversarial dynamic should be between contributors and the existing knowledge — "challenge what the system thinks it knows" — not between contributors and each other. When contributors compete to prove each other wrong, you get argumentative escalation. When contributors compete to identify gaps, errors, and blind spots in the collective knowledge, you get genuine intelligence amplification.

This distinction maps to the difference between debate (adversarial between parties) and scientific inquiry (adversarial against the current state of knowledge). Both are adversarial, but the target of the adversarial pressure produces categorically different dynamics.

---

Relevant Notes:

- [[adversarial PR review produces higher quality knowledge than self-review because separated proposer and evaluator roles catch errors that the originating agent cannot see]] — operational evidence for condition #2 in a multi-agent context
- [[speculative markets aggregate information through incentive and selection effects not wisdom of crowds]] — the mechanism by which adversarial markets produce collective intelligence
- [[collective intelligence requires diversity as a structural precondition not a moral preference]] — adversarial contribution is one mechanism for maintaining diversity against convergence pressure
- [[partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity]] — structural conditions under which diversity (and therefore adversarial input) matters most
- [[confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status]] — the confidence system that operationalizes condition #1 (new claims enter at low confidence and must earn upgrades)

Topics:

- [[foundations/collective-intelligence/_map]]