teleo-codex/inbox/archive/2025-08-25-teknium-quesnelle-malhotra-hermes-4-technical-report.md
m3taversal 1de60685be theseus: add 5 Nous Research source archives for codex ingestion
- GEPA self-evolution system (trace-based evolutionary prompt optimization)
- DeMo: Decoupled Momentum Optimization (Peng, Kingma et al. — 85x bandwidth reduction)
- YaRN: Context Window Extension (adopted by Meta and DeepSeek)
- Hermes 4 Technical Report (hybrid reasoning model family)
- Agent Skills open standard (30+ platform adoption, Anthropic-originated)

Per m3ta directive: GEPA and skills ecosystem observations are solid
research material worth extracting as sources regardless of deployment.

Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
2026-04-07 14:56:03 +00:00


---
type: source
title: Hermes 4 Technical Report
author: Ryan Teknium, Roger Jin, Jai Suphavadeeprasit, Dakota Mahan, Jeffrey Quesnelle, Joe Li, Chen Guang, Shannon Sands, Karan Malhotra
url: https://arxiv.org/abs/2508.18255
date: 2025-08-25
domain: ai-alignment
intake_tier: research-task
rationale: Hermes 4 is the model family underlying the Hermes Agent. Technical report covers hybrid reasoning architecture, training methodology, and benchmark results. Key evidence for open-source model competitiveness and skill-based agent architecture.
proposed_by: theseus
format: paper
status: unprocessed
tags:
  - nous-research
  - hermes-4
  - hybrid-reasoning
  - open-source-models
  - training-methodology
---

# Hermes 4 Technical Report

arXiv:2508.18255 (August 2025). The comprehensive technical report for Nous Research's flagship model family.

## Overview

Hermes 4 is a family of hybrid reasoning models that combine structured, multi-turn reasoning with broad instruction-following ability. The report covers challenges in data curation, synthesis, training, and evaluation at scale.

## Model Family

- Hermes-4-Llama-3.1-405B — frontier hybrid-mode reasoning model (802GB)
- Hermes-4-Llama-3.1-70B — smaller variant with shared improvements (140GB)
- Hermes-4-14B — dense model for local inference (28GB)
- Hermes-4.3-Seed-36B — post-trained entirely on the Psyche decentralized network (72GB)

## Hybrid Reasoning Architecture

The key innovation is the ability to switch between structured reasoning mode (chain-of-thought, step-by-step) and direct instruction-following mode. This addresses a known limitation of pure reasoning models: they waste compute on simple tasks that don't benefit from extended reasoning.
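The mode switch can be sketched on the consumption side. This assumes the Hermes-style convention of wrapping chain-of-thought in `<think>...</think>` tags when reasoning mode is enabled; the system-prompt wording below is a hypothetical illustration, not the exact Hermes prompt.

```python
import re

# Hypothetical reasoning-mode toggle: in hybrid models, an instruction like
# this in the system prompt is what requests the structured reasoning trace.
REASONING_SYSTEM_PROMPT = (
    "You are a deep thinking AI. Enclose your reasoning in <think> tags "
    "before giving your final answer."
)

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate the chain-of-thought trace from the final answer.

    Returns ("", completion) when the model replied in direct mode,
    i.e. produced no <think> block.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()          # direct instruction-following mode
    trace = match.group(1).strip()
    answer = completion[match.end():].strip()  # text after the closing tag
    return trace, answer
```

A caller can thus treat both modes uniformly: the trace is empty for simple queries answered directly, and populated when extended reasoning was invoked.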

## Training Methodology

The report addresses challenges in:

- Data curation at scale — quality filtering, decontamination, domain balancing
- Synthetic data generation — using stronger models to generate training data
- Multi-stage training pipeline — pre-training → supervised fine-tuning → alignment
- Evaluation across mathematical reasoning, coding, knowledge, comprehension, and alignment benchmarks
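As a concrete illustration of one curation step listed above, decontamination is commonly done by n-gram overlap against benchmark prompts. This is a minimal sketch of that generic technique; the n-gram size and threshold are illustrative assumptions, not values from the report.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Whitespace-tokenized, lowercased n-grams of a document."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(example: str,
                    eval_grams: set[tuple[str, ...]],
                    threshold: float = 0.5) -> bool:
    """Flag a training example whose n-grams overlap an eval set too heavily."""
    grams = ngrams(example)
    if not grams:
        return False  # too short to form any n-gram; nothing to match
    overlap = len(grams & eval_grams) / len(grams)
    return overlap >= threshold
```

In practice `eval_grams` would be the union of n-grams over all held-out benchmark items, and flagged examples are dropped before fine-tuning.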

## Benchmark Results

The report presents comprehensive benchmarking across multiple domains. The 405B variant performs at frontier level, while the 14B variant demonstrates that small, dense models remain competitive for specific use cases such as local inference and cost-sensitive deployment.

## Decentralized Training (Hermes 4.3)

Hermes-4.3-Seed-36B is notable as the first model post-trained entirely on the Psyche decentralized network. This demonstrates that distributed, volunteer-contributed compute can produce competitive models — a proof-of-concept for the DeMo/Psyche infrastructure thesis.

## Significance for Agent Architecture

Hermes 4 is the default model powering the Hermes Agent. The hybrid reasoning capability enables the agent to use extended reasoning for complex tasks (skill creation, multi-step planning) while responding quickly to simple queries. This maps directly to the progressive disclosure pattern in the skill system — simple queries don't load skills or invoke reasoning, while complex tasks trigger both.
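The progressive-disclosure routing described above can be sketched as follows. The complexity heuristic and the configuration keys are illustrative assumptions, not the Hermes Agent's actual API; any real router would use a learned or prompt-based classifier rather than keyword matching.

```python
# Hypothetical markers of task complexity, for illustration only.
COMPLEX_MARKERS = ("plan", "implement", "debug", "multi-step", "create a skill")

def route(query: str) -> dict:
    """Decide whether a query warrants extended reasoning and skill loading.

    Simple queries get direct mode with no skills loaded (cheap, fast);
    complex tasks enable both, mirroring progressive disclosure.
    """
    complex_task = any(m in query.lower() for m in COMPLEX_MARKERS)
    return {
        "reasoning": complex_task,    # extended chain-of-thought on/off
        "load_skills": complex_task,  # skills stay unloaded for simple asks
        "mode": "reasoning" if complex_task else "direct",
    }
```

The point of the sketch is the coupling: the same signal that triggers skill loading also enables reasoning mode, so neither cost is paid on simple queries.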

Model weights are publicly released via Hugging Face, licensed under CC BY 4.0.