
type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: Context is stateless (all information arrives at once) while memory is stateful (accumulates, changes, contradicts over time) — a million-token context window is input capacity the model mostly cannot use, not memory
confidence: likely
source: Cornelius (@molt_cornelius), 'AI Field Report 4: Context Is Not Memory', X Article, March 2026; corroborated by ByteDance OpenViking (95% token reduction via tiered architecture), Tsinghua/Alibaba MemPO (25% accuracy gain via learned memory management), EverMemOS (92.3% vs 87.9% human ceiling)
created: 2026-03-30
depends_on: effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale

Long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing

Context and memory are structurally different, not points on the same spectrum. Context is stateless — all information arrives at once and is processed in a single pass. Memory is stateful — it accumulates incrementally, changes over time, and sometimes contradicts itself. A million-token context window is a million tokens of input capacity, not a million tokens of memory.
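The structural difference can be sketched in a few lines of Python; the function and class names here are invented for illustration, with a plain function standing in for a stateless model call:

```python
from dataclasses import dataclass, field

def answer_from_context(context: str, question: str) -> str:
    """Stateless: the whole input arrives at once and nothing persists
    between calls (a stand-in for a single model invocation)."""
    return f"answer to {question!r} from {len(context)} chars of context"

@dataclass
class Memory:
    """Stateful: facts accumulate incrementally, and later writes can
    revise or contradict earlier ones."""
    facts: dict = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value  # later state supersedes earlier state

m = Memory()
m.remember("deploy_branch", "main")
m.remember("deploy_branch", "release")  # the store changed over time
```

Calling `answer_from_context` twice with the same input yields the same output every time; nothing carries over. The `Memory` instance, by contrast, is defined by what carried over.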

This distinction is corroborated by three independent architectural experiments, all of which moved away from context-as-memory toward purpose-built memory systems:

ByteDance OpenViking — a context database using a virtual filesystem protocol (viking://) where agents navigate context like a hard drive. Tiered loading (L0: 50-token abstract, L1: 500-token overview, L2: full document) reduces average token consumption per retrieval by 95% compared to traditional vector search. After ten sessions, reported accuracy improves by 20-30% with no human intervention, because the system extracts and persists what it learned.
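A minimal sketch of that tiered-loading idea. The viking:// URI scheme is from OpenViking; the document store, tier contents, and escalation rule below are invented for illustration:

```python
# Toy store keyed by viking:// URIs; each document carries three tiers.
DOCS = {
    "viking://domains/ai-alignment/context-vs-memory": {
        0: "Claim: long context is not memory.",         # L0: ~50-token abstract
        1: "Context is stateless input; memory is ...",  # L1: ~500-token overview
        2: "<full document text>",                       # L2: full document
    },
}

def retrieve(uri: str, detail: int) -> str:
    """Return the cheapest tier that satisfies the requested detail
    level, so most retrievals never pay full-document token cost."""
    tiers = DOCS[uri]
    return tiers[min(detail, max(tiers))]
```

Most lookups stop at the 50-token abstract; only requests that genuinely need depth escalate to the full document, which is where the large average token savings come from.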

Tsinghua/Alibaba MemPO — reinforcement-learning-trained memory management where the agent learns three actions: summarize, reason, or act. The system discovers when to compress and what to retain. Result: 25% accuracy improvement with 73% fewer tokens. The advantage widens as complexity increases — at ten parallel objectives, hand-coded memory baselines collapse to near-zero while learned memory management holds.
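The three-action interface can be sketched as follows. Note the hedge: the real system learns this policy with reinforcement learning, whereas the simple thresholding below is an invented stand-in:

```python
def memory_action(buffer_tokens: int, budget: int, goal_open: bool) -> str:
    """Pick one of the three actions the MemPO-style agent learns:
    summarize (compress the working buffer), reason (deliberate over
    retained state), or act (take an external step)."""
    if buffer_tokens > budget:
        return "summarize"  # compress when the buffer exceeds budget
    if goal_open:
        return "reason"     # deliberate while objectives remain open
    return "act"            # otherwise act in the environment
```

The interesting result is not the interface but what training discovers: which thresholds and retention rules hold up at ten parallel objectives, where hand-coded heuristics like this one collapse.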

EverMemOS — brain-inspired architecture where conversations become episodic traces (MemCells), traces consolidate into thematic patterns (MemScenes), and retrieval reconstructs context by navigating the scene graph. On the LoCoMo benchmark: 92.3% accuracy, exceeding the human ceiling of 87.9%. A memory architecture modeled on neuroscience outperformed human recall.
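A toy version of that consolidation-and-retrieval loop. MemCell and MemScene are EverMemOS terms, but the trace data and grouping by an explicit topic label (rather than learned thematic similarity) are invented:

```python
from collections import defaultdict

def consolidate(memcells: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group episodic traces (topic, text) into MemScene-like clusters."""
    scenes: dict[str, list[str]] = defaultdict(list)
    for topic, text in memcells:
        scenes[topic].append(text)
    return dict(scenes)

def recall(scenes: dict[str, list[str]], topic: str) -> list[str]:
    """Reconstruct context by navigating to the relevant scene rather
    than rescanning every raw trace."""
    return scenes.get(topic, [])
```

Retrieval cost scales with the number of scenes touched, not the number of raw traces stored, which is the point of consolidating episodes into thematic patterns.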

Bigger context windows create three failure modes that memory architectures avoid: context poisoning (incorrect information persists and becomes ground truth), context distraction (the model repeats past behavior instead of reasoning fresh), and context confusion (irrelevant material crowds out what matters).
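Context poisoning, the first of these, is easy to demonstrate with a toy append-only context next to a keyed store (all names invented for illustration):

```python
context_log: list[str] = []  # append-only, like a growing context window
memory: dict[str, str] = {}  # keyed store where corrections supersede

def observe(key: str, value: str) -> None:
    context_log.append(f"{key}={value}")  # the old value persists forever
    memory[key] = value                   # the new value replaces the old

observe("api_version", "v1")  # outdated fact
observe("api_version", "v2")  # correction
```

After the correction, the append-only log still contains `api_version=v1` alongside `api_version=v2`, leaving the model to arbitrate between them on every pass; the keyed store holds only the corrected value.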

Challenges

The three memory architectures cited are each optimized for different use cases (filesystem navigation, RL-trained compression, conversational recall). No single system combines all three approaches. Additionally, conflict resolution remains universally broken — even the best memory system achieves only 6% accuracy on multi-hop conflict resolution (correcting a fact and propagating the correction through derived conclusions). The hardest memory problems are barely being studied: a 48-author survey found 75 of 194 papers study the simplest cell in the memory taxonomy (explicit factual recall), while parametric working memory has two papers.
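A toy illustration of why multi-hop conflict resolution is hard: a derived conclusion cached at ingestion time does not update when its base fact is corrected, so propagation has to be explicit. The dependency rule here is invented:

```python
facts = {"capital_of_X": "Alpha"}
cached = {"hq_city": facts["capital_of_X"]}  # derived conclusion, cached once

facts["capital_of_X"] = "Beta"               # correction to the base fact
# cached["hq_city"] still says "Alpha": the correction did not propagate.

def propagate(facts: dict) -> dict:
    """Recompute derived conclusions from the corrected base facts."""
    return {"hq_city": facts["capital_of_X"]}

repaired = propagate(facts)
```

With one explicit dependency this is trivial; with conclusions derived from conclusions across thousands of traces, tracking what depends on the corrected fact is the part current systems fail at.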


Relevant Notes:

Topics: