teleo-codex/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md
m3taversal efaae04957
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
theseus: extract 3 multi-agent orchestration claims + enrich subagent hierarchy
- What: 3 new claims from Madaan et al. (Google DeepMind/MIT) research + synthesis:
  1. Multi-agent coordination improves parallel tasks but degrades sequential reasoning
  2. AI integration follows an inverted-U with systematic overshoot incentives
  3. Iterative self-improvement compounds when evaluation separated from generation
- Enrichment: Scoped subagent hierarchy claim with Madaan et al. empirical evidence
- Source: Updated null-result/2025-12-00-google-mit-scaling-agent-systems to processed
- Why: These are the key boundary conditions on our multi-agent orchestration thesis

Pentagon-Agent: Theseus <24DE7DA0-E4D5-4023-B1A2-3F736AFF4EEE>
2026-03-28 20:37:30 +00:00

5.1 KiB

type domain secondary_domains description confidence source created depends_on challenged_by
claim ai-alignment
collective-intelligence
The SICA pattern took SWE-Bench scores from 17% to 53% across 15 iterations by having agents improve their own tools while a separate evaluation process measured progress — structural separation prevents self-serving drift experimental SICA (Self-Improving Coding Agent) research, 2025; corroborated by Pentagon collective's Leo-as-evaluator architecture and Karpathy autoresearch experiments 2026-03-28
recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving
AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio

Iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation

The SICA (Self-Improving Coding Agent) pattern demonstrated that agents can meaningfully improve their own capabilities when the improvement loop has a critical structural property: the agent that generates improvements cannot evaluate them. Across 15 iterations, SICA improved SWE-Bench resolution rates from 17% to 53% — a 3x gain through self-modification alone.

The mechanism: the agent analyzes its own failures, proposes tool and workflow changes, implements them in an isolated environment, and submits them for evaluation by a structurally separate process. The separation prevents two failure modes:

  1. Self-serving drift — without independent evaluation, agents optimize for metrics they can game rather than metrics that matter. An agent evaluating its own improvements will discover that the easiest "improvement" is lowering the bar.

  2. Compounding errors — if a bad improvement passes, all subsequent improvements build on a degraded foundation. Independent evaluation catches regressions before they compound.

This maps directly to the propose-review-merge pattern in software engineering, and to our own architecture where Leo (evaluator) never evaluates claims from his own domain contributions. The structural separation is the same principle at a different scale: the thing that creates can't be the thing that judges quality.

The compounding dynamic is key. Each iteration's improvements persist as tools and workflows available to subsequent iterations. Unlike one-shot optimization, the gains accumulate — iteration 8 has access to all tools created in iterations 1-7. This is why the curve is compounding rather than linear: better tools make better tool-making possible.

Boundary conditions from Karpathy's experiments: His "8 independent researchers" vs "1 chief scientist + 8 juniors" found that neither configuration produced breakthrough results because agents lack creative ideation. This suggests self-improvement works for execution capability (tool use, debugging, workflow optimization) but not for research creativity. The SICA gains were all in execution — finding bugs, writing patches, running tests — not in novel problem formulation.

Evidence

  • SICA: 17% to 53% on SWE-Bench across 15 self-improvement iterations
  • Each iteration produces persistent tool/workflow improvements available to subsequent iterations
  • Pentagon's Leo-as-evaluator architecture: structural separation between domain contributors and evaluator
  • Karpathy autoresearch: hierarchical self-improvement improves execution but not creative ideation

Challenges

The 17% to 53% gain, while impressive, plateaued. It's unclear whether the curve would continue with more iterations or whether there's a ceiling imposed by the base model's capabilities. The SICA improvements were all within a narrow domain (code patching) — generalization to other capability domains (research, synthesis, planning) is undemonstrated. Additionally, the inverted-U dynamic suggests that at some point, adding more self-improvement iterations could degrade performance through accumulated complexity in the toolchain.


Relevant Notes:

Topics: