teleo-codex/domains/ai-alignment/79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success.md
Teleo Agents 53360666f7
reweave: connect 39 orphan claims via vector similarity
Threshold: 0.7, Haiku classification, 67 files modified.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-04-03 14:01:58 +00:00


type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: MAST study of 1,642 execution traces across 7 production systems found the dominant multi-agent failure cause is wrong task decomposition and vague coordination rules, not bugs or model limitations
confidence: experimental
source: MAST study (1,642 annotated execution traces, 7 production systems), cited in Cornelius (@molt_cornelius), "AI Field Report 2: The Orchestrator's Dilemma", X Article, March 2026; corroborated by Puppeteer system (NeurIPS 2025)
created: 2026-03-30
depends_on:
  • multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows
  • subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers
supports:
  • multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value
reweave_edges:
  • multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03

79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success

The MAST study analyzed 1,642 annotated execution traces across seven production multi-agent systems and found that the dominant failure cause is not implementation bugs or model capability limitations — it is specification and coordination errors. 79% of failures trace to wrong task decomposition or vague coordination rules.
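The headline figure is a share of annotated traces falling into specification or coordination categories. A minimal sketch of that tally, using illustrative counts and a simplified three-way collapse of the MAST taxonomy (the per-category numbers here are not the study's actual breakdown):

```python
from collections import Counter

# Illustrative trace annotations: a simplified three-way collapse of the
# MAST failure taxonomy. Counts are hypothetical, chosen only so the
# specification+coordination share lands at the reported 79%.
annotations = (
    ["specification"] * 52 + ["coordination"] * 27 + ["implementation"] * 21
)

counts = Counter(annotations)
total = sum(counts.values())
spec_and_coord = counts["specification"] + counts["coordination"]
print(f"specification+coordination share: {spec_and_coord / total:.0%}")
# → specification+coordination share: 79%
```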

The hardest failures — information withholding, ignoring other agents' input, reasoning-action mismatch — resist protocol-level fixes entirely. These are inter-agent misalignment failures that require social reasoning abilities that communication protocols alone cannot provide. Adding more message-passing infrastructure does not help when the problem is that agents cannot model each other's state.

Corroborating evidence:

  • Puppeteer system (NeurIPS 2025): Confirmed via reinforcement learning that topology and decomposition quality matter more than agent count. Optimal configuration: Width=4, Depth=2. The system's token consumption decreases during training while quality improves — the orchestrator learns to prune agents that add noise.
  • PawelHuryn's survey: Evaluated every major coordination tool (Claude Code Agent Teams, CCPM, tick-md, Agent-MCP, 1Code, GitButler hooks) and concluded they all solve the wrong problem — the bottleneck is how you decompose the task, not which framework reassembles it.
  • GitHub engineering team principle: "Treat agents like distributed systems, not chat flows."
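One way to read the Puppeteer configuration is as a uniform helper tree under a single orchestrator, where Width bounds fan-out per node and Depth bounds delegation levels. A sketch under that assumption (the tree interpretation is ours, not Puppeteer's published topology):

```python
# Hypothetical reading of "Width=4, Depth=2": an orchestrator at level 0,
# each node spawning `width` helpers per level down to `depth` levels.
def agent_count(width: int, depth: int) -> int:
    """Total agents in a uniform tree: sum of width**d for each level d."""
    return sum(width ** d for d in range(depth + 1))

print(agent_count(4, 2))  # → 21 agents total under this reading
```

The point of the small optimum is visible in the arithmetic: fan-out grows geometrically, so even modest Width/Depth increases add many agents, each a potential noise source the orchestrator learns to prune.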

This finding reframes the multi-agent scaling problem. The existing KB claim on compound reliability degradation (17.2x error amplification) describes what happens when decomposition fails. This claim identifies why it fails: the task specification was wrong before any agent executed. The fix is not better error handling or more sophisticated coordination protocols — it is better decomposition.
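If the fix is better decomposition, the cheapest intervention is to validate the task specification before any agent runs, rather than hardening execution downstream. A minimal sketch of such a pre-dispatch check, with hypothetical field names (`inputs`, `success_criterion`, `owner`) not drawn from MAST or any cited system:

```python
from dataclasses import dataclass, field

# Hypothetical pre-dispatch lint: reject a decomposition whose subtasks are
# under-specified, before dispatching anything to agents.
@dataclass
class Subtask:
    name: str
    inputs: list = field(default_factory=list)
    success_criterion: str = ""
    owner: str = ""

def decomposition_errors(subtasks):
    """Flag specification gaps: missing inputs, vague success criteria, no owner."""
    errors = []
    for t in subtasks:
        if not t.inputs:
            errors.append(f"{t.name}: no declared inputs")
        if not t.success_criterion:
            errors.append(f"{t.name}: vague success criterion")
        if not t.owner:
            errors.append(f"{t.name}: no single owning agent")
    return errors

tasks = [
    Subtask("summarize", inputs=["doc"], success_criterion="<=200 words", owner="writer"),
    Subtask("verify", inputs=["summary"]),  # deliberately under-specified
]
print(decomposition_errors(tasks))
# → ['verify: vague success criterion', 'verify: no single owning agent']
```

The check is trivial on purpose: it catches exactly the "wrong decomposition, vague rules" class before the 17.2x amplification dynamics can begin.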

Challenges

The MAST study covers production systems with specific coordination patterns. Whether the 79% figure holds for less structured multi-agent configurations (ad hoc swarms, peer-to-peer architectures) is untested. Additionally, as models improve at social reasoning, the inter-agent misalignment failures may decrease — but the specification errors (wrong decomposition) are upstream of model capability and may persist regardless.


Relevant Notes:

Topics: