---
type: claim
domain: ai-alignment
secondary_domains:
  - living-agents
description: Three eras — prompt engineering (model is the product), context engineering (information environment matters), harness engineering (the compound runtime system wrapping the model is the product and moat) — where model commoditization makes the harness the durable competitive layer
confidence: likely
source: "Cornelius (@molt_cornelius), 'AI Field Report 1: The Harness Is the Product', X Article, March 2026; corroborated by OpenDev technical report (81 pages, first open-source harness architecture), Anthropic harness engineering guide, swyx vocabulary shift, OpenAI 'Harness Engineering' post 2026-03-30"
created:
depends_on:
  - the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load
  - effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale
related:
  - harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
  - harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks
  - file-backed durable state is the most consistently positive harness module across task types because externalizing state to path-addressable artifacts survives context truncation delegation and restart
  - ai-agents-shift-research-bottleneck-from-execution-to-ideation-because-agents-implement-well-scoped-ideas-but-fail-at-creative-experiment-design
reweave_edges:
  - harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
  - harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks|related|2026-04-03
  - file-backed durable state is the most consistently positive harness module across task types because externalizing state to path-addressable artifacts survives context truncation delegation and restart|related|2026-04-17
sourced_from: inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
---

Harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer, not the token state, determines what agents can do

Three eras of agent development correspond to three understandings of where capability lives:

  1. Prompt engineering — the model is the product. Give it better instructions, get better output.
  2. Context engineering — the entire information environment matters. Manage system rules, retrieved documents, tool schemas, conversation history. Find the smallest set of high-signal tokens that maximizes desired outcomes.
  3. Harness engineering — the compound runtime system wrapping the model is the product. The model is commodity infrastructure; the harness — context architecture, skill definitions, hook enforcement, memory design, safety layers, validation loops — is what creates a specific product that does a specific thing well.
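The shift the three eras describe can be made concrete in code. The sketch below is illustrative only: `call_model` is a stub standing in for any commodity model API, and every other name (`hooks`, `validate`, `memory`) is invented for this example, not taken from the source.

```python
def call_model(messages):
    # Stub for a commodity chat-completion API (Claude, GPT, Gemini all fit
    # here); this toy version just uppercases the last user message.
    return {"role": "assistant", "content": messages[-1]["content"].upper()}

# Era 1, prompt engineering: capability lives in the instruction string.
def prompt_engineered(task):
    return call_model([{"role": "user", "content": f"Think step by step: {task}"}])

# Era 2, context engineering: capability lives in the assembled token set.
def context_engineered(task, system_rules, retrieved_docs):
    messages = [{"role": "system", "content": system_rules}]
    messages += [{"role": "user", "content": doc} for doc in retrieved_docs]
    messages.append({"role": "user", "content": task})
    return call_model(messages)

# Era 3, harness engineering: capability lives in the runtime loop wrapping
# the model: hook enforcement, a validation loop, durable cross-turn memory.
def harness_engineered(task, hooks, validate, memory, max_turns=8):
    messages = memory.get("history", []) + [{"role": "user", "content": task}]
    for _ in range(max_turns):
        draft = call_model(messages)
        for hook in hooks:                  # structural enforcement, not instructions
            draft = hook(draft)
        ok, feedback = validate(draft)      # validation loop
        if ok:
            memory["history"] = messages + [draft]  # persist cross-turn state
            return draft
        messages.append({"role": "user", "content": feedback})
    return None                             # retry budget exhausted
```

Note that the model call is identical in all three functions; only the machinery around it changes, which is exactly the claim's point.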

The transition from context to harness engineering is not merely semantic — it reflects a structural distinction first published in OpenDev's 81-page technical report: scaffolding (everything assembled before the first prompt — system prompts compiled, tool schemas built, sub-agents registered) versus harness (runtime orchestration after — tool dispatch, context compaction, safety enforcement, memory persistence, cross-turn state). Scaffolding optimizes for cold-start latency; the harness optimizes for long-session survival. Conflating them means neither gets optimized well.
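The scaffolding/harness split can be sketched as a type boundary. This is a minimal illustration, not OpenDev's actual API; the class names, fields, and the toy compaction policy are all invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scaffold:
    """Assembled once before the first prompt; optimized for cold-start latency."""
    system_prompt: str
    tool_schemas: tuple
    sub_agents: tuple

def build_scaffold(rules, tool_schemas, sub_agents):
    # Compiled once, then immutable for the whole session.
    return Scaffold("\n".join(rules), tuple(tool_schemas), tuple(sub_agents))

class Harness:
    """Runtime orchestration after the first prompt; optimized for survival."""
    def __init__(self, scaffold, context_budget=40):
        self.scaffold = scaffold    # read-only input, never rebuilt mid-session
        self.history = []           # mutable cross-turn state
        self.context_budget = context_budget

    def run_turn(self, user_msg):
        self.history.append(("user", user_msg))
        if len(self.history) > self.context_budget:
            self._compact()         # harness concern: surviving long sessions
        reply = ("assistant", f"ack {user_msg}")  # stand-in for model dispatch
        self.history.append(reply)
        return reply

    def _compact(self):
        # Toy compaction: replace older messages with a one-line summary,
        # keeping only the most recent tail.
        tail = self.history[-2:]
        dropped = len(self.history) - len(tail)
        self.history = [("system", f"[summary of {dropped} earlier messages]")] + tail
```

The design point is that `Scaffold` is frozen at build time, while everything that mutates across turns, including compaction, lives inside `Harness`; conflating the two makes it unclear which layer a given optimization belongs to.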

OpenDev's architecture demonstrates what a production harness contains: five model roles (execution, thinking, critique, visual, compaction), four context engineering subsystems (dynamic priority-ordered system prompts, tool result offloading, dual-memory architecture, five-stage adaptive compaction), and a five-layer safety architecture where each layer operates independently. Anthropic independently published the complementary pattern: initializer + coding agent split, where a JSON coordination artifact persists through context resets.
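The initializer + coding agent pattern can be sketched as follows. This is a hypothetical reconstruction assuming a JSON artifact at an invented path and an invented schema; the source does not publish Anthropic's actual artifact format.

```python
import json
import os
import tempfile

# Invented location for the coordination artifact; any durable path works.
STATE_PATH = os.path.join(tempfile.gettempdir(), "coordination.json")

def initializer(task_descriptions, path=STATE_PATH):
    # The initializer agent writes the coordination artifact once, up front.
    state = {"tasks": [{"desc": d, "done": False} for d in task_descriptions]}
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

def coding_agent_turn(path=STATE_PATH):
    # A fresh coding agent after a context reset starts with an empty window:
    # it reloads the artifact, does one unit of work, and persists progress
    # to disk before exiting.
    with open(path) as f:
        state = json.load(f)
    for task in state["tasks"]:
        if not task["done"]:
            task["done"] = True    # stand-in for the actual coding work
            break
    with open(path, "w") as f:
        json.dump(state, f, indent=2)
    return state
```

Because the artifact lives on disk rather than in the context window, a context reset costs the next agent nothing but a file read.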

The convergence validates model commoditization. Claude, GPT, Gemini are three names for the same class of capability. Same model, different harness, different product. OpenAI published their own post titled "Harness Engineering" the same week — the vocabulary has been adopted by the labs themselves.

Challenges

The harness-as-moat thesis assumes model commoditization, which holds at the margin but not at the frontier: when a new capability leap occurs (reasoning models, multimodal models), the harness must adapt to the new model class. The ETH Zurich finding that context files reduce task success rates for scoped coding tasks suggests the harness advantage is altitude-dependent: for bounded single-agent tasks, a minimal harness wins. And the 2,000-line context file Cornelius runs has no published benchmarks against the 60-line minimalist approach; the question of system-scoped versus task-scoped agents remains an unresolved research gap.

