teleo-codex/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md
Teleo Agents 53360666f7
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
reweave: connect 39 orphan claims via vector similarity
Threshold: 0.7, Haiku classification, 67 files modified.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-04-03 14:01:58 +00:00

6.1 KiB

type domain secondary_domains description confidence source created depends_on challenged_by related reweave_edges
claim ai-alignment
collective-intelligence
Agent behavior splits into two categories — deterministic enforcement via hooks (100% compliance) and probabilistic guidance via instructions (~70% compliance) — and the gap is a category difference not a performance difference likely Cornelius (@molt_cornelius), 'Agentic Systems: The Determinism Boundary' + 'AI Field Report 1' + 'AI Field Report 3', X Articles, March 2026; corroborated by BharukaShraddha (70% vs 100% measurement), HumanLayer (150-instruction ceiling), ETH Zurich AGENTbench, NIST agent safety framework 2026-03-30
iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation
AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio
trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary
trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|related|2026-04-03

The determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load

Agent systems exhibit a categorical split in behavior enforcement. Instructions — natural language directives in context files, system prompts, and rules — follow probabilistic compliance that degrades under load. Hooks — lifecycle scripts that fire on system events — enforce deterministically regardless of context state.

The quantitative evidence converges from multiple sources:

  • BharukaShraddha's measurement: Rules in CLAUDE.md are followed ~70% of the time; hooks are enforced 100% of the time. The gap is not a performance difference — it is a category difference between probabilistic and deterministic enforcement.
  • HumanLayer's analysis: Frontier thinking models follow approximately 150-200 instructions before compliance decays linearly. Smaller models decay exponentially. Claude Code's built-in system prompt already consumes ~50 instructions before user configuration loads.
  • ETH Zurich AGENTbench: Repository-level context files reduce task success rates compared to no context file, while increasing inference costs by 20%. Instructions are not merely unreliable — they can be actively counterproductive.
  • Augment Code: A 556:1 copy-to-contribution ratio in typical agent sessions — for every 556 tokens loaded into context, one meaningfully influences output.
  • NIST: Published design requirement for "at least one deterministic enforcement layer whose policy evaluation does not rely on LLM reasoning."

The mechanism is structural: instructions require executive attention from the model, and executive attention degrades under context pressure. Hooks fire on lifecycle events (file write, tool use, session start) regardless of the model's attentional state. This parallels the biological distinction between habits (basal ganglia, automatic) and deliberate behavior (prefrontal cortex, capacity-limited).

The convergence is independently validated: Claude Code, VS Code, Cursor, Gemini CLI, LangChain, and Strands Agents all adopted hooks within a single year. The pattern was not coordinated — every platform building production agents independently discovered the same need.

Additional Evidence (supporting)

The habit gap mechanism (AN05, Cornelius): The determinism boundary exists because agents cannot form habits. Humans automatize routine behaviors through the basal ganglia — repeated patterns become effortless through neural plasticity (William James, 1890). Agents lack this capacity entirely: every session starts with zero automatic tendencies. The agent that validated schemas perfectly last session has no residual inclination to validate them this session. Hooks compensate architecturally: human habits fire on context cues (entering a room), hooks fire on lifecycle events (writing a file). Both free cognitive resources for higher-order work. The critical difference is that human habits take weeks to form through neural encoding, while hook-based habits are reprogrammable via file edits — the learning loop runs at file-write speed rather than neural rewiring speed. Human prospective memory research shows 30-50% failure rates even for motivated adults; agents face 100% failure rate across sessions because no intentions persist. Hooks solve both the habit gap (missing automatic routines) and the prospective memory gap (missing "remember to do X at time Y" capability).

Challenges

The boundary itself is not binary but a spectrum. Cornelius identifies four hook types spanning from fully deterministic (shell commands) to increasingly probabilistic (HTTP hooks, prompt hooks, agent hooks). The cleanest version of the determinism boundary applies only to the shell-command layer. Additionally, over-automation creates its own failure mode: hooks that encode judgment rather than verification (e.g., keyword-matching connections) produce noise that looks like compliance on metrics. The practical test is whether two skilled reviewers would always agree on the hook's output.


Relevant Notes:

Topics: