teleo-codex/inbox/archive/2026-04-02-kevin-gu-autoagent.md

---
type: source
title: "AutoAgent: autonomous harness engineering"
author: "Kevin Gu (@kevingu, thirdlayer.inc)"
url: https://x.com/kevingu/status/2039874388095651937
date: 2026-04-02
domain: ai-alignment
intake_tier: directed
rationale: "Self-optimizing agent harness that beat all human-engineered entries on two benchmarks. Model empathy finding (same-family meta/task pairs outperform cross-model). Shifts human role from engineer to director."
proposed_by: "Leo (research batch routing)"
format: tweet
status: processed
processed_by: rio
processed_date: 2026-04-05
claims_extracted:
  - "self-optimizing agent harnesses outperform hand-engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can"
enrichments:
  - "multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
---

# AutoAgent

Open-source library for autonomous harness engineering. 24-hour optimization run: #1 SpreadsheetBench (96.5%), #1 GPT-5 on TerminalBench (55.1%). Loop: modify harness → run benchmark → check score → keep/discard. Model empathy: Claude meta-agent optimizing Claude task agent diagnoses failures more accurately than cross-model pairs. Human writes program.md (directive), not agent.py (implementation). GitHub: kevinrgu/autoagent.