teleo-codex/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md
m3taversal 8528fb6d43
theseus: add 13 NEW claims + 1 enrichment from Cornelius Batch 1 (agent architecture)
Precision fixes per Leo's review:
- Claim 4 (curated skills): downgrade experimental→likely, cite source gap, clarify 16pp vs 17.3pp gap
- Claim 6 (harness engineering): soften "supersedes" to "emerges as"
- Claim 11 (notes as executable): remove unattributed 74% benchmark
- Claim 12 (memory infrastructure): qualify title to observed 24% in one system, downgrade experimental→likely

9 themes across Field Reports 1-5, Determinism Boundary, Agentic Note-Taking 08/11/14/16/18.
Pre-screening protocol followed: KB grep → NEW/ENRICHMENT/CHALLENGE categorization.

Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
2026-03-30 14:22:00 +01:00


type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: Agent behavior splits into two categories, deterministic enforcement via hooks (100% compliance) and probabilistic guidance via instructions (~70% compliance); the gap is a category difference, not a performance difference
confidence: likely
source: Cornelius (@molt_cornelius), 'Agentic Systems: The Determinism Boundary' + 'AI Field Report 1' + 'AI Field Report 3', X Articles, March 2026; corroborated by BharukaShraddha (70% vs 100% measurement), HumanLayer (150-instruction ceiling), ETH Zurich AGENTbench, NIST agent safety framework
created: 2026-03-30
depends_on:
  - iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation
  - AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio
challenged_by:

The determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load

Agent systems exhibit a categorical split in behavior enforcement. Instructions (natural-language directives in context files, system prompts, and rules) are followed probabilistically, and compliance degrades under context load. Hooks (lifecycle scripts that fire on system events) enforce deterministically regardless of context state.
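The split can be sketched in a few lines: an instruction is just text the model may or may not honor, while a hook is a callback the runtime always executes. A minimal sketch with a hypothetical runtime API (not any specific platform's), where the hook blocks a disallowed write no matter what the model is attending to:

```python
# Minimal sketch of the two enforcement categories (hypothetical
# runtime API; not any specific platform).

INSTRUCTION = "Never call eval() in generated code."  # probabilistic: just text in context

def block_eval(path: str, content: str) -> None:
    """Deterministic: the runtime invokes this on every file write,
    whatever the model is currently attending to."""
    if path.endswith(".py") and "eval(" in content:
        raise PermissionError(f"blocked write to {path}: eval() not allowed")

HOOKS = {"file_write": [block_eval]}

def write_file(path: str, content: str) -> str:
    # All registered hooks fire before the write completes;
    # no model judgment is involved in the decision.
    for hook in HOOKS["file_write"]:
        hook(path, content)
    return f"wrote {path}"
```

The instruction can be silently dropped under context pressure; the hook cannot, because it is wired into the write path rather than into the model's attention.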

The quantitative evidence converges from multiple sources:

  • BharukaShraddha's measurement: Rules in CLAUDE.md are followed ~70% of the time; hooks are enforced 100% of the time. The gap is not a performance difference — it is a category difference between probabilistic and deterministic enforcement.
  • HumanLayer's analysis: Frontier thinking models follow approximately 150-200 instructions before compliance decays linearly. Smaller models decay exponentially. Claude Code's built-in system prompt already consumes ~50 instructions before user configuration loads.
  • ETH Zurich AGENTbench: Repository-level context files reduce task success rates compared to no context file, while increasing inference costs by 20%. Instructions are not merely unreliable — they can be actively counterproductive.
  • Augment Code: A 556:1 copy-to-contribution ratio in typical agent sessions — for every 556 tokens loaded into context, one meaningfully influences output.
  • NIST: Published design requirement for "at least one deterministic enforcement layer whose policy evaluation does not rely on LLM reasoning."
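A deterministic enforcement layer in NIST's sense can be as small as a pure function over the proposed action, with no model call anywhere in the decision path. A hedged sketch (the policy rules and `Action` type are illustrative, not from any cited framework):

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str    # e.g. "file_write", "shell"
    target: str  # file path or command line

# Illustrative policy data; a real deployment would load this from config.
ALLOWED_TOOLS = {"file_read", "file_write"}
PROTECTED_PREFIXES = ("/etc/", "/usr/")

def evaluate(action: Action) -> bool:
    """Pure function over the proposed action: no LLM reasoning in
    the decision path, so the verdict is identical at any context
    length and for any attentional state of the model."""
    if action.tool not in ALLOWED_TOOLS:
        return False
    return not action.target.startswith(PROTECTED_PREFIXES)
```

The point of the design requirement is the shape, not the rules: the same input yields the same verdict every time, which is exactly the property instructions cannot provide.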

The mechanism is structural: instructions require executive attention from the model, and executive attention degrades under context pressure. Hooks fire on lifecycle events (file write, tool use, session start) regardless of the model's attentional state. This parallels the biological distinction between habits (basal ganglia, automatic) and deliberate behavior (prefrontal cortex, capacity-limited).

The pattern is independently corroborated: Claude Code, VS Code, Cursor, Gemini CLI, LangChain, and Strands Agents all adopted hooks within a single year. The adoption was not coordinated; every platform building production agents discovered the same need independently.

Challenges

The boundary itself is not binary but a spectrum. Cornelius identifies four hook types spanning from fully deterministic (shell commands) to increasingly probabilistic (HTTP hooks, prompt hooks, agent hooks). The cleanest version of the determinism boundary applies only to the shell-command layer. Additionally, over-automation creates its own failure mode: hooks that encode judgment rather than verification (e.g., keyword-matching connections) produce noise that looks like compliance on metrics. The practical test is whether two skilled reviewers would always agree on the hook's output.
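The verification-versus-judgment distinction can be made concrete. A verification hook checks a mechanical property that any two skilled reviewers would score identically; a judgment hook encodes a heuristic they would often dispute. An illustrative sketch (both function names and the keyword-overlap heuristic are hypothetical):

```python
def has_source_line(note: str) -> bool:
    """Verification hook: a mechanical property of the text.
    Two skilled reviewers always give the same answer."""
    return any(line.lower().startswith("source:") for line in note.splitlines())

def keyword_link_hook(note: str, corpus: dict[str, str]) -> list[str]:
    """Judgment hook: keyword overlap as a proxy for 'related note'.
    Reviewers will frequently disagree with these links, so a
    links-per-note metric rises while the links themselves are noise."""
    words = set(note.lower().split())
    return [title for title, body in corpus.items()
            if len(words & set(body.lower().split())) >= 3]
```

The first hook automates verification and passes the two-reviewer test; the second automates judgment and fails it, which is the over-automation failure mode described above.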


Relevant Notes:

Topics: