teleo-codex/inbox/archive/2026-02-24-nous-research-hermes-agent-self-evolution-gepa.md
m3taversal 1de60685be theseus: add 5 Nous Research source archives for codex ingestion
- GEPA self-evolution system (trace-based evolutionary prompt optimization)
- DeMo: Decoupled Momentum Optimization (Peng, Kingma et al. — 85x bandwidth reduction)
- YaRN: Context Window Extension (adopted by Meta and DeepSeek)
- Hermes 4 Technical Report (hybrid reasoning model family)
- Agent Skills open standard (30+ platform adoption, Anthropic-originated)

Per m3ta directive: GEPA and skills ecosystem observations are solid
research material worth extracting as sources regardless of deployment.

Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
2026-04-07 14:56:03 +00:00


---
type: source
title: "Hermes Agent Self-Evolution: Evolutionary Self-Improvement via DSPy + GEPA"
author: Nous Research (Teknium, Jeffrey Quesnelle, Karan Malhotra)
url: https://github.com/NousResearch/hermes-agent-self-evolution
date: 2026-02-24
domain: ai-alignment
intake_tier: research-task
rationale: >-
  GEPA is a trace-based evolutionary prompt optimizer that outperforms
  RL-based methods. Key evidence for agent self-improvement claims and
  the skills-as-codification thesis.
proposed_by: theseus
format: whitepaper
status: processed
processed_by: theseus
processed_date: 2026-04-07
claims_extracted:
  - GEPA evolutionary trace-based optimization is distinct from acceptance-gating and RL approaches because it reads why failures happen rather than just that they failed
  - curated agent skills persist and improve through use, producing flat token scaling at 40 skills equivalent to 200 skills
enrichments:
tags:
  - nous-research
  - gepa
  - self-evolution
  - prompt-optimization
  - agent-skills
  - dspy
---

# GEPA: Genetic-Pareto Prompt Evolution

GEPA (Genetic-Pareto Prompt Evolution) is Nous Research's evolutionary optimizer for agent self-improvement. It is implemented in the `hermes-agent-self-evolution` repository (704 stars, MIT license) and integrates DSPy for prompt optimization with evolutionary trace analysis.

## Core Mechanism

GEPA is a reflective evolutionary optimizer that examines *why* components fail, not merely *that* they fail. The system reads execution traces to understand concrete failure modes, then proposes targeted improvements. This trace-based analysis distinguishes GEPA both from simpler mutation approaches (random perturbation) and from RL-based methods (a reward signal without causal explanation).

## Evolutionary Process

  1. Read current skill/prompt/tool definition
  2. Generate evaluation dataset (synthetic or from real session history via SQLite)
  3. Execute candidates and capture full execution traces
  4. GEPA optimizer analyzes traces and proposes targeted mutations
  5. Evaluate variants against 5 constraint gates
  6. Select best performer via Pareto front
  7. Submit as pull request for human review
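Step 6's Pareto-front selection can be sketched as a non-domination filter over per-task score vectors. The score representation and dominance rule below are standard multi-objective conventions, assumed rather than taken from the repository:

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """a dominates b if a is at least as good on every task and strictly
    better on at least one (higher score = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates: dict[str, list[float]]) -> set[str]:
    """Keep every candidate not dominated by any other. This preserves
    variants that excel on *some* tasks instead of collapsing to a
    single average-best prompt."""
    return {
        name for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }

# Scores of three prompt variants on three evaluation tasks (hypothetical):
scores = {
    "baseline":  [0.6, 0.6, 0.6],
    "variant_a": [0.9, 0.5, 0.6],  # best on task 1, worse on task 2
    "variant_b": [0.5, 0.5, 0.5],  # dominated by baseline
}
front = pareto_front(scores)  # → {"baseline", "variant_a"}
```

Keeping the whole front, rather than a single winner, retains diversity for the next round of mutation.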

## Five Constraint Gates (Guardrails)

Every evolved variant must satisfy all five gates before consideration:

  1. Full Test Suite: `pytest tests/ -q` must pass 100%
  2. Size Limits: Skills ≤15KB, tool descriptions ≤500 characters
  3. Caching Compatibility: No mid-conversation changes allowed
  4. Semantic Preservation: Variants must not drift from original intent
  5. PR Review: All changes go through human review, never direct commit
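The mechanical gates (1-3) reduce to simple predicates. A minimal sketch, using the limits stated above; the function names and variant structure are hypothetical:

```python
import subprocess

SKILL_SIZE_LIMIT = 15 * 1024  # gate 2: skills ≤ 15 KB
TOOL_DESC_LIMIT = 500         # gate 2: tool descriptions ≤ 500 characters

def gate_tests() -> bool:
    """Gate 1: the full test suite must pass (exit code 0)."""
    return subprocess.run(["pytest", "tests/", "-q"]).returncode == 0

def gate_size(skill_text: str, tool_desc: str) -> bool:
    """Gate 2: hard size limits on evolved artifacts."""
    return (len(skill_text.encode()) <= SKILL_SIZE_LIMIT
            and len(tool_desc) <= TOOL_DESC_LIMIT)

def gate_caching(changed_mid_conversation: bool) -> bool:
    """Gate 3: prompt-cache compatibility, i.e. no mid-conversation edits."""
    return not changed_mid_conversation

# Gates 4 (semantic preservation) and 5 (PR review) require a judge model
# and a human respectively, so they are not reducible to predicates here.
ok = gate_size("# SKILL.md\nDo the thing.", "Reads a file.") and gate_caching(False)
```

A variant that fails any predicate is discarded before it ever reaches the Pareto comparison.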

The fifth gate, PR-review governance, ensures no evolved variant reaches production without human approval. This is structurally equivalent to the acceptance-gating pattern in SICA (SWE-Bench self-improvement), but GEPA adds a trace-based explanation of *why* each mutation was proposed.

## What Gets Optimized (Phased Rollout)

  - Phase 1 (Implemented): Skill files (`SKILL.md`) — procedural memory
  - Phase 2 (Planned): Tool descriptions — capability interfaces
  - Phase 3 (Planned): System prompt sections — behavioral tuning
  - Phase 4 (Planned): Tool implementation code via Darwinian Evolver
  - Phase 5 (Planned): Continuous improvement loop

## Architecture Split

The system distinguishes between:

  - Reflective text evolution (DSPy + GEPA) — for prompts, descriptions, skills
  - Code evolution (Darwinian Evolver, AGPL v3) — for implementation code

This split lets each artifact type use an optimization strategy suited to it. Text evolution operates entirely via API calls: mutating natural language, evaluating results, and selecting the best variants. Cost: roughly $2-10 per optimization run.
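In code, the split amounts to a dispatch on artifact type. The enum values and optimizer labels below mirror the description above, but the interface itself is invented for illustration:

```python
from enum import Enum

class Artifact(Enum):
    SKILL = "skill"            # SKILL.md files (Phase 1)
    TOOL_DESC = "tool_desc"    # natural-language tool descriptions
    SYSTEM_PROMPT = "system"   # system prompt sections
    TOOL_CODE = "tool_code"    # implementation code (Phase 4)

def choose_optimizer(kind: Artifact) -> str:
    """Route text artifacts to reflective text evolution (DSPy + GEPA)
    and code artifacts to the external Darwinian Evolver CLI."""
    if kind is Artifact.TOOL_CODE:
        return "darwinian-evolver"  # AGPL v3, invoked as an external CLI only
    return "dspy+gepa"              # API-only text mutation

assert choose_optimizer(Artifact.SKILL) == "dspy+gepa"
assert choose_optimizer(Artifact.TOOL_CODE) == "darwinian-evolver"
```

Keeping the AGPL v3 code evolver behind a CLI boundary also keeps its license isolated from the MIT-licensed core.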

## Integration with DSPy

DSPy provides the prompt optimization framework. GEPA adds the evolutionary trace analysis on top. Combined, they mutate natural language descriptions of skills, tool behaviors, and system instructions with causal grounding in observed failure modes.

## Key Distinctions from Other Self-Improvement Approaches

| Approach | Signal Type | Causal? | Governance |
|---|---|---|---|
| SICA (SWE-Bench) | Pass/fail acceptance gate | No | Metric threshold |
| NLAH (Pan et al.) | Module ablation | Partial | Researcher manual |
| GRPO (RL) | Reward signal | No | Training objective |
| GEPA | Execution trace analysis | Yes | 5-gate + PR review |

GEPA's distinguishing feature is that it reads the execution trace to understand the causal chain of failure, then proposes mutations that address the root cause rather than randomly perturbing until something works.

## Development Status

Repository: 704 stars, 64 forks, 7 commits, actively under development. The core is MIT-licensed; the Darwinian Evolver is AGPL v3 and is invoked only as an external CLI.