teleo-codex/domains/ai-alignment/multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks.md

---
type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: SWE-AF deploys 400-500+ agents across planning, coding, reviewing, QA, and verification roles, scoring 95/100 versus 73 for single-agent Claude Code, demonstrating that multi-agent coordination with continual learning has moved from research to production.
confidence: experimental
source: Alex — based on Compass research artifact analyzing SWE-AF, Cisco multi-agent PR reviewer, and BugBot (2026-03-08)
sourcer: alexastrum
created: 2026-03-08
sourced_from: inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---

Multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks

The pattern of Agent A proposing via PR and Agent B reviewing has moved from research concept to production system. Three implementations demonstrate different aspects of maturity.

SWE-AF (Agent Field) deploys 400-500+ agent instances across planning, coding, reviewing, QA, and verification roles, scoring 95/100 on benchmarks versus 73 for single-agent Claude Code. Each agent operates in an isolated git worktree, with a merger agent integrating branches and a verifier agent checking acceptance criteria against the PRD. Critically, SWE-AF implements continual learning: conventions and failure patterns discovered early are injected into downstream agent instances. This is not just parallelization — the system gets smarter as it works.
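The lesson-injection loop described above can be sketched as follows. This is a hypothetical shape, not SWE-AF's actual code: `run_agent` stands in for one agent instance executing in its own worktree, and the prompt-threading of accumulated conventions is an assumption based on the description.

```python
def run_waves(task_waves, run_agent):
    """Hypothetical SWE-AF-style dispatch loop: each agent's prompt
    includes every convention discovered by earlier agents, so later
    waves inherit lessons from earlier ones."""
    lessons: list[str] = []  # conventions / failure patterns found so far
    outputs = []
    for wave in task_waves:
        for task in wave:
            prompt = task
            if lessons:
                prompt += "\nConventions so far: " + "; ".join(lessons)
            # run_agent would execute inside an isolated git worktree
            output, new_lessons = run_agent(prompt)
            outputs.append(output)
            lessons.extend(l for l in new_lessons if l not in lessons)
    return outputs, lessons
```

The key design point is that `lessons` is shared state that only grows, so the marginal agent is always at least as well-briefed as its predecessors.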

Cisco's multi-agent PR reviewer demonstrates the specific reviewer architecture: static analysis and code review agents run in parallel, a cross-referencing pipeline (initializer → generator → reflector) iterates on findings, and a comment filterer consolidates before posting. Built on LangGraph, it includes evaluation tooling that replays PR history with "LLM-as-a-judge" scoring.
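A minimal sketch of that cross-referencing shape, with plain Python stubs rather than the real LangGraph graph (`call(stage, diff, findings)` is a hypothetical stand-in for one LLM agent invocation):

```python
def review_pipeline(diff, call, max_iters=3, limit=10):
    """Hypothetical initializer -> generator -> reflector loop with a
    final comment filterer, modeled on the described architecture."""
    findings = call("initialize", diff, [])          # seed findings
    for _ in range(max_iters):
        findings = call("generate", diff, findings)  # expand / refine
        rejected = call("reflect", diff, findings)   # self-critique pass
        if not rejected:
            break                                    # findings are stable
        findings = [f for f in findings if f not in rejected]
    # comment filterer: dedupe and cap before posting to the PR
    seen, posted = set(), []
    for f in findings:
        if f not in seen:
            seen.add(f)
            posted.append(f)
    return posted[:limit]
```

The reflector acts as a veto stage: the loop only terminates early once a reflection pass rejects nothing.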

BugBot implements the most rigorous adversarial review pattern: a self-referential execution loop where each iteration gets fresh context, picks new attack angles, and requires file:line evidence for every finding. Seven ODC trigger categories must each be tested, and consensus voting between independent agents auto-upgrades confidence when two agents flag the same issue.
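The evidence and voting rules lend themselves to a direct sketch (hypothetical data model; BugBot's real finding schema is not shown in this note):

```python
from collections import Counter
from typing import NamedTuple

class Finding(NamedTuple):
    file: str        # file:line evidence is mandatory
    line: int
    category: str    # e.g. one of the seven ODC trigger categories
    confidence: str  # "low" or "high"

def consensus(findings_by_agent):
    """Reject findings lacking file:line evidence; auto-upgrade
    confidence when two or more independent agents flag the same issue."""
    evidenced = [f for agent in findings_by_agent for f in agent
                 if f.file and f.line > 0]
    votes = Counter((f.file, f.line, f.category) for f in evidenced)
    merged, seen = [], set()
    for f in evidenced:
        key = (f.file, f.line, f.category)
        if key in seen:
            continue
        seen.add(key)
        conf = "high" if votes[key] >= 2 else f.confidence
        merged.append(Finding(f.file, f.line, f.category, conf))
    return merged
```

Because the vote key is (file, line, category), agreement must be on the same concrete location, not merely the same kind of bug.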

The 95 vs 73 performance gap is significant because it demonstrates that specialization benefits more than compensate for coordination overhead. This is consistent with the general finding that coordination protocol design produces larger capability gains than model scaling: in one comparison, the same AI model performed 6x better with structured exploration than with human coaching on the same problem. The gains come from structuring how agents interact, not from making individual agents more capable.

The continual learning component is particularly important for knowledge base applications. In a knowledge validation pipeline, conventions and failure patterns discovered during early reviews (e.g., "claims about mechanism design require quantitative evidence") can be injected into downstream reviewer instances, creating an improving review process without human intervention.
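In code, a discovered convention could become an executable check applied to every later claim. This is a sketch under assumed claim and rule shapes, not an implementation from the source:

```python
rules = []  # (name, predicate) pairs discovered during early reviews

def learn_rule(name, predicate):
    """Register a convention so all downstream reviewers enforce it."""
    rules.append((name, predicate))

def review_claim(claim):
    """Return the names of every learned convention the claim violates."""
    return [name for name, pred in rules if not pred(claim)]

# the convention named in the text, as a crude executable check
# (digit presence as a proxy for quantitative evidence is an assumption)
learn_rule(
    "mechanism-design claims require quantitative evidence",
    lambda c: c["domain"] != "mechanism-design"
              or any(ch.isdigit() for ch in c["evidence"]),
)
```

Each review pass can both apply `rules` and extend it via `learn_rule`, which is what makes the process self-improving without human intervention.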


Relevant Notes:

Topics: