| type | domain | secondary_domains | description | confidence | source | created | depends_on | related | reweave_edges |
|---|---|---|---|---|---|---|---|---|---|
| claim | ai-alignment | | Hermes Agent's architecture demonstrates that loading only skill names and summaries by default, with full content loaded on relevance detection, makes 40 skills cost approximately the same tokens as 200 skills — a design principle where knowledge base growth does not proportionally increase inference cost | likely | Nous Research Hermes Agent architecture (Substack deep dive, 2026); 3,575-character hard cap on prompt memory; auxiliary model compression with lineage preservation in SQLite; 26K+ GitHub stars, largest open-source agent framework | 2026-04-05 | | | |
Progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance-gated expansion avoids the linear cost of full context loading
Agent systems face a scaling dilemma: more knowledge should improve performance, but loading more knowledge into context increases token cost linearly and degrades attention quality. Progressive disclosure resolves this by loading knowledge at multiple tiers of specificity, expanding to full detail only when relevance is detected.
The design principle
Hermes Agent (Nous Research, 26K+ GitHub stars) implements this through a tiered loading architecture:
- Tier 0 — Always loaded: A 3,575-character prompt memory file (MEMORY.md) contains the agent's core identity, preferences, and active context. Hard-capped to prevent growth.
- Tier 1 — Names only: All available skills are listed by name and one-line summary. The agent sees what it knows how to do without paying the token cost of the full procedures.
- Tier 2 — Relevance-gated expansion: When the agent detects that a skill is relevant to the current task, the full skill content loads into context. Only the relevant skills pay full token cost.
- Tier 3 — Session search: Historical context is stored in SQLite with FTS5 indexing. Retrieved on demand, not loaded by default. An auxiliary model compresses session history while preserving lineage information.
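Tier 3 can be sketched with SQLite's built-in FTS5 support. This is a minimal illustration, not Hermes's actual schema: the table name, columns, and example rows are invented here; only the pattern (full-text-indexed history, retrieved on demand rather than loaded by default) comes from the source.

```python
import sqlite3

# Hypothetical session store; Hermes's real tables and columns differ.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(ts, summary)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("2026-04-01", "debugged relevance gate false negatives"),
        ("2026-04-02", "compressed session history with auxiliary model"),
    ],
)

# Retrieved on demand: none of the rows above sit in the prompt by default.
rows = db.execute(
    "SELECT ts, summary FROM sessions WHERE sessions MATCH ?", ("relevance",)
).fetchall()
print(rows)
```

Only matching history enters context, so the session archive can grow without growing the base prompt.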
The result: 40 skills and 200 skills have approximately the same base token cost, because most skills exist only as names in the prompt. Growth in the knowledge base does not proportionally increase inference cost. The system scales with relevance, not with total knowledge.
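The near-flat scaling falls out of simple arithmetic. The token figures below are invented for illustration (not measured from Hermes): assume a one-line summary costs ~15 tokens and a full skill body ~2,000.

```python
def flat_cost(n_skills: int, body_tokens: int = 2000) -> int:
    # Naive loading: every full skill body sits in context.
    return n_skills * body_tokens

def progressive_cost(
    n_skills: int,
    summary_tokens: int = 15,
    relevant: int = 2,
    body_tokens: int = 2000,
) -> int:
    # Tier 1: every skill pays only its name+summary line.
    # Tier 2: only the few gated-relevant skills pay full body cost.
    return n_skills * summary_tokens + relevant * body_tokens

# Flat loading scales linearly with the knowledge base (5x here):
print(flat_cost(40), flat_cost(200))              # 80000 400000
# Progressive disclosure scales with relevance, not total knowledge:
print(progressive_cost(40), progressive_cost(200))  # 4600 7000
```

Growing the skill library 5x multiplies the flat-loaded prompt by 5 but the progressively disclosed prompt by only ~1.5, and the gap widens as skill bodies get longer relative to their summaries.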
Why this matters architecturally
This is the practical implementation of the context≠memory distinction. Naive approaches treat context window size as the memory constraint — load everything, hope attention handles it. Progressive disclosure treats context as a precious resource to be allocated based on relevance, with the full knowledge base available but not loaded.
The 3,575-character hard cap on prompt memory is an engineering decision that embodies a principle: the always-on context should be minimal and curated, not a growing dump of everything the agent has learned. Compression via auxiliary model allows the system to preserve information while respecting the cap.
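The cap-enforcement control flow might look like the sketch below. Only the control flow is the point: the real system compresses with an auxiliary model that preserves lineage, whereas the `compress` placeholder here just drops the oldest lines.

```python
HARD_CAP = 3575  # characters; the cap Hermes applies to prompt memory

def compress(text: str, limit: int) -> str:
    # Placeholder for the auxiliary compression model: drop oldest
    # entries until the text fits. The real system summarizes instead.
    lines = text.splitlines()
    while lines and len("\n".join(lines)) > limit:
        lines.pop(0)
    return "\n".join(lines)

def write_memory(memory: str) -> str:
    # Invariant: the always-on memory file never exceeds the hard cap.
    if len(memory) > HARD_CAP:
        memory = compress(memory, HARD_CAP)
    assert len(memory) <= HARD_CAP
    return memory

# A memory file that has grown past the cap gets compressed on write.
oversized = "\n".join(f"entry {i}: " + "x" * 80 for i in range(100))
capped = write_memory(oversized)
print(len(oversized), len(capped))
```

The invariant lives at the write path, so no code elsewhere can grow the always-on context past the cap.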
Challenges
The "flat scaling" claim is based on Hermes's architecture design and reported behavior, not a controlled experiment comparing flat-loaded vs progressively-disclosed knowledge bases on identical tasks. The token cost savings are real (fewer tokens in prompt), but whether performance is equivalent — whether the agent makes equally good decisions with names-only vs full-content loading — has not been systematically measured.
Relevance detection is the critical bottleneck. If the system fails to detect that a skill is relevant, it won't load the full content, and the agent operates without knowledge it has but didn't access. False negatives in relevance detection trade token efficiency for capability loss. The quality of the relevance gate determines whether progressive disclosure is genuinely "flat scaling" or "cheaper at the cost of sometimes being wrong."
The 3,575-character cap is specific to Hermes and may not generalize. Different agent architectures, task domains, and model capabilities may require different cap sizes. The principle (hard cap on always-on context) is likely general; the specific number is engineering judgment.
Relevant Notes:
- memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds — progressive disclosure operates primarily within the procedural memory space, loading methodology on demand rather than storing it all in active context
- long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing — progressive disclosure is the architectural mechanism that implements the context≠memory distinction in practice: the knowledge base grows (memory) while the active context stays flat (not-memory)
- current AI models use less than one percent of their advertised context capacity effectively because attention degradation and information density combine to create a sharp effectiveness frontier well inside the nominal window — the >99% shortfall in effective context use is exactly what progressive disclosure addresses: load less, use it better
Topics: