---
type: source
title: "Self-improving agentic systems with auto-evals"
author: "Gauri Gupta & Ritvik Kapila (NeoSigma)"
url: https://x.com/gauri__gupta/status/2039173240204243131
date: 2026-03-31
domain: ai-alignment
intake_tier: directed
rationale: "Four-phase self-improvement loop: failure mining → eval clustering → optimization → regression gate. Score 0.56→0.78 on fixed model. Complements AutoAgent with production-oriented approach."
proposed_by: "Leo (research batch routing)"
format: tweet
status: processed
processed_by: rio
processed_date: 2026-04-05
claims_extracted:
  - "self-optimizing agent harnesses outperform hand-engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can"
enrichments:
  - "multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
---

# NeoSigma auto-harness

Four-phase outer loop on production traffic:

- (A) Failure mining from execution traces.
- (B) Eval clustering by root cause (29+ clusters discovered automatically).
- (C) Optimization of prompts, tools, context, and workflow.
- (D) Regression gate: ≥80% on the regression suite and no degradation on validation.

Baseline 0.560 → 0.780 after 18 batches and 96 experiments. The model was fixed (GPT-5.4), so the gains come purely from harness changes. The regression suite grew from 0 to 17 test cases.

GitHub: neosigmaai/auto-harness.
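The phase (D) acceptance rule is concrete enough to sketch. The block below is a minimal illustration, not code from neosigmaai/auto-harness: it assumes each candidate harness change produces a per-case pass/fail list on the regression suite plus one scalar validation score, and that "no degradation" means not falling below the currently accepted validation score. The names `CandidateResult`, `passes_regression_gate`, and the `tolerance` parameter are hypothetical. Because the regression suite starts with 0 cases, the sketch also assumes an empty suite passes vacuously.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CandidateResult:
    """Hypothetical evaluation record for one candidate harness change."""
    regression_passes: Sequence[bool]  # one entry per regression-suite case
    validation_score: float            # aggregate score on the validation set


def passes_regression_gate(
    candidate: CandidateResult,
    accepted_validation: float,
    min_regression_rate: float = 0.80,
    tolerance: float = 0.0,
) -> bool:
    """Accept a candidate only if both gate conditions hold (assumed semantics).

    1. Regression-suite pass rate is at least `min_regression_rate`.
       An empty suite counts as passing (assumption: the suite starts at 0 cases).
    2. Validation score does not drop below the currently accepted score,
       minus an optional, hypothetical `tolerance`.
    """
    cases = list(candidate.regression_passes)
    pass_rate = sum(cases) / len(cases) if cases else 1.0
    no_degradation = candidate.validation_score >= accepted_validation - tolerance
    return pass_rate >= min_regression_rate and no_degradation


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the source.
    current = 0.560
    ok = CandidateResult([True] * 14 + [False] * 3, validation_score=0.600)
    regressed = CandidateResult([True] * 13 + [False] * 4, validation_score=0.650)
    print(passes_regression_gate(ok, current))         # 14/17 ≈ 0.82 pass rate
    print(passes_regression_gate(regressed, current))  # 13/17 ≈ 0.76 pass rate
```

Requiring both conditions means a change that raises the validation score while breaking previously fixed cases is still rejected, which is how a gate like this keeps earlier gains from eroding across batches.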