teleo-codex/inbox/archive/2026-04-04-alex_prompter-stanford-meta-harness.md
m3taversal 00119feb9e leo: archive 19 tweet sources on AI agents, memory, and harnesses
- What: Source archives for tweets by Karpathy, Teknium, Emollick, Gauri Gupta,
  Alex Prompter, Jerry Liu, Sarah Wooders, and others on LLM knowledge bases,
  agent harnesses, self-improving systems, and memory architecture
- Why: Persisting raw source material for pipeline extraction. 4 sources already
  processed by Rio's batch (karpathy-gist, kevin-gu, mintlify, hyunjin-kim)
  were excluded as duplicates.
- Status: all unprocessed, ready for overnight extraction pipeline

Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
2026-04-05 19:50:34 +01:00

1 KiB

type: source
title: Stanford Meta-Harness: Biggest Performance Gap Is the Harness
author: alex_prompter (@alex_prompter)
url: https://x.com/alex_prompter/status/2040378405322113442
date: 2026-04-04
domain: ai-alignment
format: tweet
status: unprocessed
tags: harness, meta-harness, stanford, agent-optimization, benchmark

Content

Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model; it's the harness: the code wrapping the model. And they built a system that automatically writes better harnesses than humans can by hand. +7.7 points. 4x fewer tokens. #1 ranking.

613 likes, 32 replies. Contains research visualization image.

Key Points

  • Stanford research shows the harness (code wrapping the model) matters more than the model itself
  • Built a system that automatically writes better harnesses than human-crafted ones
  • Achieved +7.7 point improvement with 4x fewer tokens
  • Reached #1 ranking on the benchmark
  • Key implication: optimizing the harness is higher leverage than optimizing the model
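To make the "harness" framing concrete, here is a minimal sketch of what that scaffolding layer typically looks like: a loop that builds prompts, feeds model output back in, and enforces a stopping rule. This is a hypothetical illustration only; `fake_model`, `run_harness`, and the `FINAL:` convention are invented for the demo and are not from the Stanford system or the tweet.

```python
def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned replies for the demo."""
    if "ANSWER" in prompt:
        return "FINAL: 42"
    return "ANSWER"


def run_harness(task: str, max_steps: int = 4) -> str:
    """A minimal agent harness: loop, accumulate context, stop on FINAL.

    The harness, not the model, decides prompt construction, how
    intermediate output is fed back, and when to stop -- the layer the
    tweet claims dominates performance.
    """
    context = task
    for _ in range(max_steps):
        reply = fake_model(context)
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        context += "\n" + reply  # feed intermediate output back in
    return "no answer within budget"


print(run_harness("What is 6 * 7?"))  # -> 42
```

Under this framing, a "meta-harness" would be a system that searches over choices like the loop budget, context-accumulation strategy, and stopping rule automatically instead of hand-tuning them.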