teleo-codex/inbox/archive/2026-04-04-alex_prompter-stanford-meta-harness.md
m3taversal 00119feb9e leo: archive 19 tweet sources on AI agents, memory, and harnesses
- What: Source archives for tweets by Karpathy, Teknium, Emollick, Gauri Gupta,
  Alex Prompter, Jerry Liu, Sarah Wooders, and others on LLM knowledge bases,
  agent harnesses, self-improving systems, and memory architecture
- Why: Persisting raw source material for pipeline extraction. 4 sources already
  processed by Rio's batch (karpathy-gist, kevin-gu, mintlify, hyunjin-kim)
  were excluded as duplicates.
- Status: all unprocessed, ready for overnight extraction pipeline

Pentagon-Agent: Leo <D35C9237-A739-432E-A3DB-20D52D1577A9>
2026-04-05 19:50:34 +01:00

1 KiB

type: source
title: Stanford Meta-Harness: Biggest Performance Gap Is the Harness
author: alex_prompter (@alex_prompter)
url: https://x.com/alex_prompter/status/2040378405322113442
date: 2026-04-04
domain: ai-alignment
format: tweet
status: unprocessed
tags: harness, meta-harness, stanford, agent-optimization, benchmark

Content

Holy shit. Stanford just showed that the biggest performance gap in AI systems isn't the model; it's the harness: the code wrapping the model. And they built a system that automatically writes better harnesses than humans can by hand. +7.7 points. 4x fewer tokens. #1 ranking.

613 likes, 32 replies. Contains research visualization image.

Key Points

  • Stanford research shows the harness (code wrapping the model) matters more than the model itself
  • Built a system that automatically writes better harnesses than human-crafted ones
  • Achieved +7.7 point improvement with 4x fewer tokens
  • Reached #1 ranking on the benchmark
  • Key implication: optimizing the harness is higher leverage than optimizing the model
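To make the "harness" framing concrete, here is a minimal sketch of what that scaffolding layer typically looks like: a loop that builds prompts, feeds model output back in, and enforces a stopping rule. This is a hypothetical illustration only; `fake_model`, `run_harness`, and the `FINAL:` convention are invented for the demo and are not from the Stanford system or the tweet.

```python
def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned replies for the demo."""
    if "ANSWER" in prompt:
        return "FINAL: 42"
    return "ANSWER"


def run_harness(task: str, max_steps: int = 4) -> str:
    """A minimal agent harness: loop, accumulate context, stop on FINAL.

    The harness, not the model, decides prompt construction, how
    intermediate output is fed back, and when to stop -- the layer the
    tweet claims dominates performance.
    """
    context = task
    for _ in range(max_steps):
        reply = fake_model(context)
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        context += "\n" + reply  # feed intermediate output back in
    return "no answer within budget"


print(run_harness("What is 6 * 7?"))  # -> 42
```

Under this framing, a "meta-harness" would be a system that searches over choices like the loop budget, context-accumulation strategy, and stopping rule automatically instead of hand-tuning them.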