- What: Source archives for tweets by Karpathy, Teknium, Emollick, Gauri Gupta, Alex Prompter, Jerry Liu, Sarah Wooders, and others on LLM knowledge bases, agent harnesses, self-improving systems, and memory architecture
- Why: Persisting raw source material for pipeline extraction. 4 sources already processed by Rio's batch (karpathy-gist, kevin-gu, mintlify, hyunjin-kim) were excluded as duplicates.
- Status: All unprocessed; ready for overnight extraction pipeline

Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
---
type: source
title: "Stanford Meta-Harness: Biggest Performance Gap Is the Harness"
author: "alex_prompter (@alex_prompter)"
url: "https://x.com/alex_prompter/status/2040378405322113442"
date: 2026-04-04
domain: ai-alignment
format: tweet
status: unprocessed
tags: [harness, meta-harness, stanford, agent-optimization, benchmark]
---
## Content
Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model, it's the harness. The code wrapping the model. And they built a system that writes better harnesses automatically than humans can by hand. +7.7 points. 4x fewer tokens. #1 ranking
613 likes, 32 replies. Contains research visualization image.
## Key Points
- Stanford research shows the harness (the code wrapping the model) matters more than the model itself
- Built a system that automatically writes better harnesses than human-crafted ones
- Achieved a +7.7 point improvement with 4x fewer tokens
- Reached #1 ranking on the benchmark
- Key implication: optimizing the harness is higher leverage than optimizing the model
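To make the "harness" concept concrete, here is a minimal illustrative sketch of a harness loop: the code that wraps a model call, carries context between steps, and decides when to stop. The `call_model` stub and the `FINAL:` convention are assumptions for illustration only; they are not from the tweet or any real Stanford system or API.

```python
def call_model(messages):
    """Stub standing in for a real LLM API call.

    A real harness would send `messages` to a model endpoint here; this
    stub just 'answers' the last user message so the sketch is runnable.
    """
    return f"FINAL: echo of {messages[-1]['content']!r}"

def run_harness(task, max_steps=5):
    """Drive the model in a loop.

    The harness, not the model, owns the control flow: it builds the
    context window, detects a completion marker, and enforces a step
    budget. Tweaking this wrapper is the lever the tweet describes.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply.startswith("FINAL:"):
            # Harness-defined stop condition (hypothetical convention).
            return reply[len("FINAL:"):].strip()
        # Otherwise, feed the reply back as context and iterate.
        messages.append({"role": "assistant", "content": reply})
    return None  # step budget exhausted

print(run_harness("summarize the harness idea"))
```

The point of the sketch: everything outside `call_model` (context assembly, stop conditions, budgets) is harness code, which is exactly the layer the meta-harness result claims can be auto-optimized for large gains.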