Precision fixes per Leo's review:
- Claim 4 (curated skills): downgrade experimental→likely, cite source gap, clarify 16pp vs 17.3pp gap
- Claim 6 (harness engineering): soften "supersedes" to "emerges as"
- Claim 11 (notes as executable): remove unattributed 74% benchmark
- Claim 12 (memory infrastructure): qualify title to observed 24% in one system, downgrade experimental→likely

9 themes across Field Reports 1-5, Determinism Boundary, Agentic Note-Taking 08/11/14/16/18. Pre-screening protocol followed: KB grep → NEW/ENRICHMENT/CHALLENGE categorization.

Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
| type | domain | secondary_domains | description | confidence | source | created | depends_on |
|---|---|---|---|---|---|---|---|
| claim | ai-alignment | | Codified Context study tracked a 108K-line production system where memory infrastructure consumed 24% of the codebase across three tiers (hot constitution, 19 domain-expert agents, and 34 cold-storage specs), with memory emerging from debugging pain, not planning | likely | Codified Context study (arXiv:2602.20478), cited in Cornelius (@molt_cornelius) 'AI Field Report 4: Context Is Not Memory', X Article, March 2026 | 2026-03-30 | |
Production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file
The Codified Context study (arXiv:2602.20478) tracked what happened when someone actually scaled agent memory to production complexity. A developer with a chemistry background — not software engineering — built a 108,000-line real-time multiplayer game across 283 sessions using a three-tier memory architecture.
Tier 1 — Hot constitution: A single markdown file loaded into every session. Code standards, naming conventions, known failure modes, routing table. About 660 lines. This is what most people think of as "agent memory."
Tier 2 — Domain-expert agents: 19 specialized agents, each carrying its own memory. A network protocol designer with 915 lines of sync and determinism knowledge. A coordinate wizard for isometric transforms. A code reviewer trained on the project's ECS patterns. Over 65% of content is domain knowledge (formulas, code patterns, symptom-cause-fix tables), not behavioral instructions. These are knowledge-bearing agents, not instruction-following agents.
Tier 3 — Cold-storage knowledge base: 34 specification documents (save system persistence rules, UI sync routing patterns, dungeon generation formulas) retrieved on demand through an MCP server.
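The three tiers can be pictured as a small data structure. The following is a minimal sketch, not the study's implementation: the names `MemoryStore`, `DomainAgent`, and `build_context` are hypothetical, and a plain dictionary stands in for the MCP server that the study used for Tier 3 retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class DomainAgent:
    """Tier 2: a specialized agent that carries its own domain memory."""
    name: str
    memory: str  # formulas, code patterns, symptom-cause-fix tables


@dataclass
class MemoryStore:
    """Hypothetical container for the three tiers described above."""
    constitution: str                                             # Tier 1: loaded every session
    agents: dict[str, DomainAgent] = field(default_factory=dict)  # Tier 2: one memory per agent
    specs: dict[str, str] = field(default_factory=dict)           # Tier 3: stand-in for MCP retrieval

    def build_context(self, agent_name=None, spec_names=()):
        """Always include the hot tier; add the other tiers only on request."""
        parts = [self.constitution]
        if agent_name is not None:
            parts.append(self.agents[agent_name].memory)
        parts.extend(self.specs[name] for name in spec_names)
        return "\n\n".join(parts)


# Usage with placeholder content (not taken from the study).
store = MemoryStore(constitution="code standards, naming conventions, routing table")
store.agents["network-protocol"] = DomainAgent("network-protocol", "sync and determinism notes")
store.specs["save-system"] = "save system persistence rules"
context = store.build_context("network-protocol", ["save-system"])
```

The point of the sketch is the asymmetry: only Tier 1 is paid for on every session, while Tiers 2 and 3 enter the context only when a task calls for them.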
Total memory infrastructure: 26,200 lines, or 24% of the codebase. The save system spec was referenced across 74 sessions and 12 agent conversations, with zero save-related bugs in four weeks. When a new networked UI feature was needed, the agent built it correctly on the first attempt because the routing patterns were already in memory from a different feature built six weeks earlier.
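As a quick check on the figures quoted above, the ratio can be recomputed from the reported line counts. This sketch assumes the 24% is measured against the roughly 108,000-line total. The source gives rounded numbers, so the results are approximate.

```python
# Reported figures; all are rounded in the source.
TOTAL_CODEBASE_LINES = 108_000   # "108,000-line" system
MEMORY_LINES = 26_200            # all three memory tiers combined
HOT_CONSTITUTION_LINES = 660     # "about 660 lines" in the single Tier 1 file

memory_share = MEMORY_LINES / TOTAL_CODEBASE_LINES
hot_share_of_memory = HOT_CONSTITUTION_LINES / MEMORY_LINES

print(f"memory share of codebase: {memory_share:.1%}")                 # 24.3%
print(f"hot constitution share of memory: {hot_share_of_memory:.1%}")  # 2.5%
```

On these numbers the single always-loaded file accounts for only about 2.5% of the memory infrastructure, which is the gap the title points at.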
The creation heuristic is the most important finding: "If debugging a particular domain consumed an extended session without resolution, it was faster to create a specialized agent and restart." Memory infrastructure did not emerge from planning. It emerged from pain.
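The heuristic reads naturally as a decision rule. Here is a minimal sketch, assuming a session can be summarized by its domain, its length, and whether the bug was resolved. The source does not define "extended", so the `extended_turns` threshold and the `DebugSession` type are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DebugSession:
    """Hypothetical summary of one debugging session."""
    domain: str
    turns: int       # length of the session, in conversational turns
    resolved: bool   # whether the bug was fixed by the end


def should_create_specialist(session: DebugSession, extended_turns: int = 40) -> bool:
    """Return True when a long session ended without a fix.

    `extended_turns` is an assumed cutoff; the study gives no number.
    """
    return (not session.resolved) and session.turns >= extended_turns


# A long, unresolved session triggers the rule; a resolved one does not.
stuck = DebugSession(domain="network-sync", turns=60, resolved=False)
fixed = DebugSession(domain="network-sync", turns=60, resolved=True)
print(should_create_specialist(stuck))  # True
print(should_create_specialist(fixed))  # False
```

The rule is reactive by construction: nothing in it fires until a session has already gone badly, which matches the observation that the memory tiers grew out of debugging pain.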
Challenges
This is a single case study from one project type (game development). Whether the 24% ratio generalizes to other domains (web applications, data pipelines, infrastructure code) is unknown. The developer's chemistry background may have made them more receptive to systematic documentation than typical software engineers. Additionally, the 283-session count suggests significant human investment in memory curation — whether this scales or creates its own maintenance burden at larger codebase sizes is untested.
Relevant Notes:
- long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing — the Codified Context system is a production implementation of the context-is-not-memory principle: three tiers of persistent, evolving memory infrastructure rather than larger context windows
- context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching — the hot constitution (Tier 1) IS a self-referential context file; the domain-expert agents (Tier 2) are the specialized extensions it teaches the system to create
Topics: