---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence]
description: "MAST study of 1,642 execution traces across 7 production systems found the dominant multi-agent failure cause is wrong task decomposition and vague coordination rules, not bugs or model limitations"
confidence: experimental
source: "MAST study (1,642 annotated execution traces, 7 production systems), cited in Cornelius (@molt_cornelius) 'AI Field Report 2: The Orchestrator's Dilemma', X Article, March 2026; corroborated by Puppeteer system (NeurIPS 2025)"
created: 2026-03-30
depends_on:
  - "multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows"
  - "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers"
supports:
  - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
reweave_edges:
  - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03"
---
# 79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success
The MAST study analyzed 1,642 annotated execution traces across seven production multi-agent systems and found that the dominant failure cause is not implementation bugs or model capability limitations — it is specification and coordination errors. 79% of failures trace to wrong task decomposition or vague coordination rules.
The hardest failures — information withholding, ignoring other agents' input, reasoning-action mismatch — resist protocol-level fixes entirely. These are inter-agent misalignment failures that require social reasoning abilities that communication protocols alone cannot provide. Adding more message-passing infrastructure does not help when the problem is that agents cannot model each other's state.
Corroborating evidence:
- **Puppeteer system (NeurIPS 2025):** Confirmed via reinforcement learning that topology and decomposition quality matter more than agent count. Optimal configuration: Width=4, Depth=2. The system's token consumption *decreases* during training while quality improves — the orchestrator learns to prune agents that add noise.
- **PawelHuryn's survey:** Evaluated every major coordination tool (Claude Code Agent Teams, CCPM, tick-md, Agent-MCP, 1Code, GitButler hooks) and concluded they all solve the wrong problem — the bottleneck is how you decompose the task, not which framework reassembles it.
- **GitHub engineering team principle:** "Treat agents like distributed systems, not chat flows."
This finding reframes the multi-agent scaling problem. The existing KB claim on compound reliability degradation (17.2x error amplification) describes what happens when decomposition fails. This claim identifies *why* it fails: the task specification was wrong before any agent executed. The fix is not better error handling or more sophisticated coordination protocols — it is better decomposition.
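The compounding point above can be made concrete with a toy reliability model (illustrative only; the 17.2x figure comes from the cited KB claim, not from this calculation, and MAST's methodology is trace annotation, not this formula). Assuming each agent step succeeds independently with probability `p_step`, end-to-end success decays geometrically with pipeline length, which is why fixing the upstream decomposition matters more than hardening individual steps:

```python
def pipeline_success(p_step: float, n_steps: int) -> float:
    """End-to-end success probability, assuming independent steps."""
    return p_step ** n_steps


def error_amplification(p_step: float, n_steps: int) -> float:
    """Ratio of the pipeline's error rate to a single step's error rate."""
    return (1.0 - pipeline_success(p_step, n_steps)) / (1.0 - p_step)


if __name__ == "__main__":
    # 20 sequential agent steps, each 99% reliable in isolation:
    # end-to-end success drops to ~0.82, so the pipeline's error
    # rate is roughly 18x the per-step error rate.
    print(f"success: {pipeline_success(0.99, 20):.3f}")
    print(f"amplification: {error_amplification(0.99, 20):.1f}x")
```

Under this independence assumption a 20-step pipeline at 99% per-step reliability lands in the same regime as the cited amplification figure; a decomposition with fewer, better-specified steps shrinks `n_steps`, which no amount of per-step error handling can match.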
## Challenges
The MAST study covers production systems with specific coordination patterns. Whether the 79% figure holds for less structured multi-agent configurations (ad hoc swarms, peer-to-peer architectures) is untested. Additionally, as models improve at social reasoning, the inter-agent misalignment failures may decrease — but the specification errors (wrong decomposition) are upstream of model capability and may persist regardless.
---
Relevant Notes:
- [[multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows]] — this claim provides the quantitative failure modes; the MAST study explains the *causal mechanism* behind those failures: 79% are specification errors, not execution errors
- [[subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers]] — hierarchies succeed partly because they concentrate decomposition responsibility in one orchestrator, reducing the coordination surface area where the 79% of failures originate
- [[coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem]] — the 6x gain from protocol design *is* decomposition quality; when decomposition is right, the same models perform dramatically better
Topics:
- [[_map]]