---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence]
description: "Context is stateless (all information arrives at once) while memory is stateful (accumulates, changes, contradicts over time) — a million-token context window is input capacity the model mostly cannot use, not memory"
confidence: likely
source: "Cornelius (@molt_cornelius), 'AI Field Report 4: Context Is Not Memory', X Article, March 2026; corroborated by ByteDance OpenViking (95% token reduction via tiered architecture), Tsinghua/Alibaba MemPO (25% accuracy gain via learned memory management), EverMemOS (92.3% vs 87.9% human ceiling)"
created: 2026-03-30
depends_on:
  - "effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale"
---
# Long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing
Context and memory are structurally different, not points on the same spectrum. Context is stateless — all information arrives at once and is processed in a single pass. Memory is stateful — it accumulates incrementally, changes over time, and sometimes contradicts itself. A million-token context window is a million tokens of input capacity, not a million tokens of memory.
This distinction is validated by three independent architectural experiments that all moved away from context-as-memory toward purpose-built memory systems:
**ByteDance OpenViking** — a context database using a virtual filesystem protocol (viking://) where agents navigate context like a hard drive. Tiered loading (L0: 50-token abstract, L1: 500-token overview, L2: full document) reduces average token consumption per retrieval by 95% compared to traditional vector search. After ten sessions, reported accuracy improves 20-30% with no human intervention because the system extracts and persists what it learned.
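The tiered-loading idea can be sketched as follows. This is a minimal illustration of the L0/L1/L2 escalation described above, not OpenViking's actual API; the `TieredDoc` class, the `need` parameter, and the example URI are assumptions:

```python
from dataclasses import dataclass


@dataclass
class TieredDoc:
    uri: str          # e.g. a viking://-style address
    l0_abstract: str  # ~50-token abstract
    l1_overview: str  # ~500-token overview
    l2_full: str      # full document


def load(doc: TieredDoc, need: str) -> str:
    """Escalate only as far as the task requires, instead of paying L2 cost per hit."""
    if need == "skim":
        return doc.l0_abstract
    if need == "summarize":
        return doc.l1_overview
    return doc.l2_full


doc = TieredDoc(
    uri="viking://reports/field-report-4",
    l0_abstract="Context is input capacity, not memory.",
    l1_overview="Field Report 4 argues context and memory are structurally different ...",
    l2_full="<full multi-thousand-token document body>",
)
assert load(doc, "skim") == doc.l0_abstract
```

Because most retrievals stop at L0 or L1, only the minority that genuinely need full text pay the L2 cost, which is the mechanism behind the reported 95% average reduction.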
**Tsinghua/Alibaba MemPO** — reinforcement-learning-trained memory management where the agent learns three actions: summarize, reason, or act. The system discovers when to compress and what to retain. Result: 25% accuracy improvement with 73% fewer tokens. The advantage widens as complexity increases — at ten parallel objectives, hand-coded memory baselines collapse to near-zero while learned memory management holds.
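The three-action loop can be sketched with a toy policy. The hand-written heuristic below only stands in for MemPO's learned RL policy; the function name and thresholds are invented for illustration:

```python
def choose_action(memory_tokens: int, budget: int, has_open_question: bool) -> str:
    """Pick one of the three memory-management actions: summarize, reason, or act.

    A trained system learns this decision; this heuristic merely shows the shape
    of the action space.
    """
    if memory_tokens > budget:
        return "summarize"  # compress before memory overflows the token budget
    if has_open_question:
        return "reason"     # think over what is currently retained
    return "act"            # otherwise take the next external step


assert choose_action(memory_tokens=9000, budget=4000, has_open_question=True) == "summarize"
assert choose_action(memory_tokens=1000, budget=4000, has_open_question=True) == "reason"
assert choose_action(memory_tokens=1000, budget=4000, has_open_question=False) == "act"
```

The claim in the text is that *learning* this decision, rather than hand-coding it as above, is what keeps performance from collapsing as objectives multiply.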
**EverMemOS** — brain-inspired architecture where conversations become episodic traces (MemCells), traces consolidate into thematic patterns (MemScenes), and retrieval reconstructs context by navigating the scene graph. On the LoCoMo benchmark: 92.3% accuracy, exceeding the human ceiling of 87.9%. A memory architecture modeled on neuroscience outperformed human recall.
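The episodic-to-thematic consolidation can be sketched as a small graph structure. Class and method names below only mirror the terms in the text; the API is invented and does not reproduce EverMemOS internals:

```python
from collections import defaultdict


class SceneGraph:
    """Episodic traces (cells) grouped into thematic scenes, retrieved by graph walk."""

    def __init__(self) -> None:
        self.scenes: dict[str, list[str]] = defaultdict(list)  # theme -> traces
        self.links: dict[str, set[str]] = defaultdict(set)     # theme -> related themes

    def consolidate(self, theme: str, trace: str, related: tuple[str, ...] = ()) -> None:
        self.scenes[theme].append(trace)
        for r in related:
            self.links[theme].add(r)
            self.links[r].add(theme)

    def retrieve(self, theme: str, hops: int = 1) -> list[str]:
        """Reconstruct context by navigating the scene graph out to `hops` neighbors."""
        seen, frontier = {theme}, {theme}
        for _ in range(hops):
            frontier = {n for t in frontier for n in self.links[t]} - seen
            seen |= frontier
        return [trace for t in seen for trace in self.scenes[t]]


g = SceneGraph()
g.consolidate("travel", "booked flight to Lisbon", related=("budget",))
g.consolidate("budget", "monthly cap is $500")
assert "monthly cap is $500" in g.retrieve("travel", hops=1)
```

Retrieval here is reconstruction (walking links between scenes) rather than similarity search over a flat store, which is the architectural point the benchmark result supports.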
Bigger context windows create three failure modes that memory architectures avoid: **context poisoning** (incorrect information persists and becomes ground truth), **context distraction** (the model repeats past behavior instead of reasoning fresh), and **context confusion** (irrelevant material crowds out what matters).
## Challenges
The three memory architectures cited are each optimized for a different use case (filesystem navigation, RL-trained compression, conversational recall); no single system combines all three approaches. Conflict resolution also remains universally broken: even the best memory system achieves only 6% accuracy on multi-hop conflict resolution (correcting a fact and propagating that correction through derived conclusions). And the hardest memory problems are barely being studied: a 48-author survey found that 75 of 194 papers address the simplest cell in the memory taxonomy (explicit factual recall), while parametric working memory has only two.
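A toy example makes the multi-hop problem concrete. The dictionaries, names, and provenance scheme below are entirely illustrative; real systems mostly lack exactly this invalidation step:

```python
# Base facts, plus conclusions derived from them with recorded provenance.
facts = {"employer": "Acme"}
derived = {"office_city": ("employer", "Acme HQ is in Berlin, so office_city=Berlin")}


def correct(key: str, new_value: str) -> list[str]:
    """Overwrite a base fact and invalidate conclusions derived from the old value.

    The first line is the easy single-hop update; the loop is the multi-hop step
    where cited systems score only 6%.
    """
    facts[key] = new_value
    stale = [d for d, (src, _) in derived.items() if src == key]
    for d in stale:
        del derived[d]
    return stale


assert correct("employer", "Globex") == ["office_city"]
assert "office_city" not in derived  # stale conclusion no longer asserted
```

Without the provenance links, the corrected fact and the stale conclusion coexist silently, which is exactly the conflict-resolution failure described above.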
---
Relevant Notes:
- [[effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale]] — if context windows are >99% ineffective for complex reasoning, memory architectures that bypass context limitations become essential
- [[user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect]] — memory enables learning from signals across sessions; without it, each question is answered in isolation
Topics:
- [[_map]]
|