teleo-codex/domains/ai-alignment/multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks.md
Alex dba00a7960 Recover alexastrum contributions from GitHub PR #68 (lost during mirror sync)
6 claims + 1 source originally merged Mar 9 via GitHub squash merge.
Forgejo→GitHub mirror overwrote GitHub main, erasing these files.
Recovered from unreachable commit 9bd6c77c before GitHub GC.
Added sourcer: alexastrum attribution to claim frontmatter.
2026-04-16 16:46:26 +00:00


type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: SWE-AF deploys 400-500+ agents across planning, coding, reviewing, QA, and verification roles scoring 95/100 versus 73 for single-agent Claude Code, demonstrating that multi-agent coordination with continual learning has moved from research to production.
confidence: experimental
source: Alex, based on Compass research artifact analyzing SWE-AF, Cisco multi-agent PR reviewer, and BugBot (2026-03-08)
sourcer: alexastrum
created: 2026-03-08

Multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks

The pattern of Agent A proposing via PR and Agent B reviewing has moved from research concept to production system. Three implementations demonstrate different aspects of maturity.

SWE-AF (Agent Field) deploys 400-500+ agent instances across planning, coding, reviewing, QA, and verification roles, scoring 95/100 on benchmarks versus 73 for single-agent Claude Code. Each agent operates in an isolated git worktree, with a merger agent integrating branches and a verifier agent checking acceptance criteria against the PRD. Critically, SWE-AF implements continual learning: conventions and failure patterns discovered early are injected into downstream agent instances. This is more than parallelization: the system improves as it works.
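The worktree isolation described above can be sketched as a small command planner. This is a minimal illustration, not SWE-AF's actual implementation: the `AgentWorkspace` class, the `agent/<role>-<index>` branch naming, and the `worktrees` directory are all hypothetical. Only `git worktree add -b` and `git merge --no-ff` are real git commands, and the sketch builds them without running them.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AgentWorkspace:
    """One agent instance with its own branch and directory (hypothetical naming)."""
    role: str   # e.g. "coder" or "reviewer"
    index: int  # instance number within the role

    @property
    def branch(self) -> str:
        return f"agent/{self.role}-{self.index}"

    def worktree_path(self, root: Path) -> Path:
        return root / f"{self.role}-{self.index}"


def worktree_add_cmd(ws: AgentWorkspace, root: Path, base: str = "main") -> list[str]:
    """Build the command that gives one agent an isolated checkout on a new branch."""
    return ["git", "worktree", "add", "-b", ws.branch,
            str(ws.worktree_path(root)), base]


def merge_cmd(ws: AgentWorkspace) -> list[str]:
    """Build the command a merger agent could run on the integration branch."""
    return ["git", "merge", "--no-ff", ws.branch]


if __name__ == "__main__":
    root = Path("worktrees")
    for ws in (AgentWorkspace("coder", 1), AgentWorkspace("reviewer", 1)):
        print(" ".join(worktree_add_cmd(ws, root)))
        print(" ".join(merge_cmd(ws)))
```

Because each agent writes only inside its own worktree, concurrent edits cannot collide on disk; conflicts surface once, at merge time, where a single merger agent can resolve them.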

Cisco's multi-agent PR reviewer demonstrates the specific reviewer architecture: static analysis and code review agents run in parallel, a cross-referencing pipeline (initializer → generator → reflector) iterates on findings, and a comment filterer consolidates before posting. Built on LangGraph, it includes evaluation tooling that replays PR history with "LLM-as-a-judge" scoring.
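The shape of that reviewer pipeline can be sketched in plain Python. This is an illustrative sketch under assumptions, not Cisco's code and not the LangGraph API: the two analyzer functions are rule-based stand-ins for real agents, the roles given to `initializer`, `generator`, and `reflector` are guesses at what such stages might do, and the iterative loop is collapsed to a single pass.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    message: str
    source: str  # name of the agent that raised it


def static_analysis(diff: str) -> list[Finding]:
    """Stand-in for a static-analysis agent: flags bare `except:` clauses."""
    return [Finding("app.py", n, "bare except", "static")
            for n, text in enumerate(diff.splitlines(), start=1)
            if text.strip() == "except:"]


def code_review(diff: str) -> list[Finding]:
    """Stand-in for an LLM review agent: flags bare excepts and TODOs."""
    found = []
    for n, text in enumerate(diff.splitlines(), start=1):
        if text.strip() == "except:":
            found.append(Finding("app.py", n, "bare except", "review"))
        if "TODO" in text:
            found.append(Finding("app.py", n, "unresolved TODO", "review"))
    return found


def initializer(batches: list[list[Finding]]) -> list[Finding]:
    """Pool the findings from all parallel agents (assumed role)."""
    return [f for batch in batches for f in batch]


def generator(findings: list[Finding]) -> list[tuple[Finding, str]]:
    """Draft one comment per finding (assumed role)."""
    return [(f, f"{f.path}:{f.line} {f.message}") for f in findings]


def reflector(drafts: list[tuple[Finding, str]]) -> list[Finding]:
    """Placeholder critique: keep drafts with text and a valid line (assumed role)."""
    return [f for f, text in drafts if text.strip() and f.line > 0]


def comment_filter(findings: list[Finding]) -> list[str]:
    """Consolidate duplicates so each issue is posted once."""
    merged: dict[tuple[str, int, str], set[str]] = {}
    for f in findings:
        merged.setdefault((f.path, f.line, f.message), set()).add(f.source)
    return [f"{path}:{line} {msg} (flagged by {', '.join(sorted(srcs))})"
            for (path, line, msg), srcs in sorted(merged.items())]


def review(diff: str) -> list[str]:
    # Run both analyzers in parallel, then pass results through the pipeline.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(agent, diff) for agent in (static_analysis, code_review)]
        batches = [f.result() for f in futures]
    return comment_filter(reflector(generator(initializer(batches))))


if __name__ == "__main__":
    for comment in review("try:\n    run()\nexcept:\n    pass  # TODO handle\n"):
        print(comment)
```

The final filtering step is what keeps a multi-agent reviewer usable: without it, two agents that agree would post the same comment twice.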

BugBot implements the most rigorous adversarial review pattern: a self-referential execution loop where each iteration gets fresh context, picks new attack angles, and requires file:line evidence for every finding. Seven ODC (Orthogonal Defect Classification) trigger categories must each be tested, and consensus voting between independent agents automatically upgrades confidence when two agents flag the same issue.
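The evidence requirement and the consensus upgrade can be illustrated with a short sketch. It assumes a three-step confidence scale and treats "the same issue" as an identical file, line, and category; neither detail comes from BugBot, and the category labels in the usage example are placeholders rather than the actual ODC trigger names.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical confidence scale, ordered from weakest to strongest.
CONFIDENCE_LEVELS = ["low", "medium", "high"]


@dataclass(frozen=True)
class Finding:
    agent: str
    file: str
    line: int
    category: str    # placeholder label, not a real ODC trigger name
    confidence: str  # one of CONFIDENCE_LEVELS


def require_evidence(findings: list[Finding]) -> list[Finding]:
    """Drop findings that do not cite a concrete file:line location."""
    return [f for f in findings if f.file and f.line > 0]


def upgrade(confidence: str) -> str:
    """Move one step up the scale, saturating at the top level."""
    i = CONFIDENCE_LEVELS.index(confidence)
    return CONFIDENCE_LEVELS[min(i + 1, len(CONFIDENCE_LEVELS) - 1)]


def consensus(findings: list[Finding]) -> dict[tuple[str, int, str], str]:
    """Group evidenced findings by issue; upgrade when 2+ distinct agents agree."""
    groups: dict[tuple[str, int, str], list[Finding]] = defaultdict(list)
    for f in require_evidence(findings):
        groups[(f.file, f.line, f.category)].append(f)

    result = {}
    for key, group in groups.items():
        best = max(group, key=lambda f: CONFIDENCE_LEVELS.index(f.confidence)).confidence
        distinct_agents = {f.agent for f in group}
        result[key] = upgrade(best) if len(distinct_agents) >= 2 else best
    return result


if __name__ == "__main__":
    votes = consensus([
        Finding("agent-1", "auth.py", 10, "category-a", "low"),
        Finding("agent-2", "auth.py", 10, "category-a", "low"),
        Finding("agent-1", "db.py", 5, "category-b", "medium"),
        Finding("agent-3", "", 0, "category-a", "high"),  # no evidence: dropped
    ])
    for issue, level in votes.items():
        print(issue, level)
```

Counting distinct agents rather than raw findings matters here: one agent repeating itself should not count as independent agreement.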

The 95 versus 73 gap (22 points, roughly a 30 percent relative improvement) is significant because it shows that specialization benefits more than compensate for coordination overhead. This is consistent with the general finding that coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem. In both cases the gains come from structuring how agents interact, not from making individual agents more capable.

The continual learning component is particularly important for knowledge base applications. In a knowledge validation pipeline, conventions and failure patterns discovered during early reviews (e.g., "claims about mechanism design require quantitative evidence") can be injected into downstream reviewer instances, creating an improving review process without human intervention.
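A minimal sketch of that injection step, assuming reviewers are driven by text prompts: conventions recorded during early reviews are appended to the instructions given to every later reviewer instance. The `ConventionStore` class and the prompt wording are hypothetical; the example convention is the one quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class ConventionStore:
    """Shared memory of conventions discovered during earlier reviews."""
    conventions: list[str] = field(default_factory=list)

    def record(self, convention: str) -> None:
        # Ignore repeats so the injected list does not grow with duplicates.
        if convention not in self.conventions:
            self.conventions.append(convention)

    def build_prompt(self, base_instructions: str) -> str:
        """Return reviewer instructions with learned conventions appended."""
        if not self.conventions:
            return base_instructions
        learned = "\n".join(f"- {c}" for c in self.conventions)
        return (f"{base_instructions}\n\n"
                f"Conventions learned from earlier reviews:\n{learned}")


if __name__ == "__main__":
    store = ConventionStore()
    base = "Review this claim for evidence quality."

    # The first reviewer starts with only the base instructions.
    print(store.build_prompt(base))

    # An early review surfaces a convention; later reviewers inherit it.
    store.record("claims about mechanism design require quantitative evidence")
    print(store.build_prompt(base))
```

Keeping the store separate from any single reviewer is the design choice that matters: each reviewer can start with fresh context while still inheriting what earlier instances learned.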


Relevant Notes:

Topics: