teleo-codex/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md
m3taversal e0d5f9e69d theseus: cornelius batch 3 — epistemology (9 NEW + 3 enrichments)
Pentagon-Agent: Theseus <46864DD4-DA71-4719-A1B4-68F7C55854D3>
2026-03-31 12:47:03 +01:00

type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: Agent behavior splits into two categories, deterministic enforcement via hooks (100% compliance) and probabilistic guidance via instructions (~70% compliance), and the gap is a category difference, not a performance difference
confidence: likely
source: Cornelius (@molt_cornelius), 'Agentic Systems: The Determinism Boundary' + 'AI Field Report 1' + 'AI Field Report 3', X Articles, March 2026; corroborated by BharukaShraddha (70% vs 100% measurement), HumanLayer (150-instruction ceiling), ETH Zurich AGENTbench, NIST agent safety framework
created: 2026-03-30
depends_on: iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation
challenged_by: AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio

The determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load

Agent systems exhibit a categorical split in behavior enforcement. Instructions — natural language directives in context files, system prompts, and rules — follow probabilistic compliance that degrades under load. Hooks — lifecycle scripts that fire on system events — enforce deterministically regardless of context state.

The quantitative evidence converges from multiple sources:

  • BharukaShraddha's measurement: Rules in CLAUDE.md are followed ~70% of the time; hooks are enforced 100% of the time. The gap is not a performance difference — it is a category difference between probabilistic and deterministic enforcement.
  • HumanLayer's analysis: Frontier thinking models follow approximately 150-200 instructions before compliance decays linearly. Smaller models decay exponentially. Claude Code's built-in system prompt already consumes ~50 instructions before user configuration loads.
  • ETH Zurich AGENTbench: Repository-level context files reduce task success rates compared to no context file, while increasing inference costs by 20%. Instructions are not merely unreliable — they can be actively counterproductive.
  • Augment Code: A 556:1 copy-to-contribution ratio in typical agent sessions — for every 556 tokens loaded into context, one meaningfully influences output.
  • NIST: Published design requirement for "at least one deterministic enforcement layer whose policy evaluation does not rely on LLM reasoning."
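
A rough worked example of how the figures above compound. This is a sketch under a loud assumption: it treats each rule as followed independently with the same probability, a model none of the cited sources states. The function names are hypothetical and exist only for illustration.

```python
def all_followed_probability(per_rule_compliance: float, rule_count: int) -> float:
    """Chance that every rule is followed, ASSUMING independent rules (illustrative model only)."""
    return per_rule_compliance ** rule_count


def remaining_instruction_budget(ceiling: int, system_prompt_instructions: int) -> int:
    """Instructions left for user configuration after the built-in prompt takes its share."""
    return max(ceiling - system_prompt_instructions, 0)


# ~70% per-rule compliance (BharukaShraddha's figure), compounded under the independence assumption.
for n in (1, 5, 10):
    print(n, round(all_followed_probability(0.70, n), 3))

# A hook that is enforced 100% of the time does not compound downward.
print(all_followed_probability(1.0, 10))

# HumanLayer's 150-200 range minus the ~50 instructions attributed to the built-in system prompt.
print(remaining_instruction_budget(150, 50), remaining_instruction_budget(200, 50))
```

If the independence assumption is even roughly right, the practical point is that a session depending on many instruction-level rules at once is far less reliable than the per-rule figure suggests, while hook-enforced rules do not erode as they accumulate.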

The mechanism is structural: instructions require executive attention from the model, and executive attention degrades under context pressure. Hooks fire on lifecycle events (file write, tool use, session start) regardless of the model's attentional state. This parallels the biological distinction between habits (basal ganglia, automatic) and deliberate behavior (prefrontal cortex, capacity-limited).
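
The structural point can be sketched as a minimal event dispatcher. This is not the hook API of any platform named in this note: `HookRunner`, the `file_write` event name, and the `block_empty_writes` policy are all hypothetical. What it shows is that the enforcement path never consults the model, so its behavior cannot depend on the model's attentional state.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Hook = Callable[[dict], None]


class HookRunner:
    """Runs registered hooks on lifecycle events, outside the model's control."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = defaultdict(list)

    def on(self, event: str, hook: Hook) -> None:
        """Register a hook for a lifecycle event."""
        self._hooks[event].append(hook)

    def emit(self, event: str, payload: dict) -> None:
        # Every hook for the event runs on every emit; nothing here asks the model.
        for hook in self._hooks[event]:
            hook(payload)


def block_empty_writes(payload: dict) -> None:
    # Hypothetical policy: refuse to write a file with no content.
    if not payload.get("content", "").strip():
        raise ValueError(f"refusing empty write to {payload.get('path')}")


runner = HookRunner()
runner.on("file_write", block_empty_writes)
runner.emit("file_write", {"path": "notes/a.md", "content": "# Title"})  # passes silently
```

An instruction such as "never write empty files" would have to survive in context to have any effect; the hook above rejects the write whether or not the model remembers the rule.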

The convergence is independently validated: Claude Code, VS Code, Cursor, Gemini CLI, LangChain, and Strands Agents all adopted hooks within a single year. The pattern was not coordinated — every platform building production agents independently discovered the same need.

Additional Evidence (supporting)

The habit gap mechanism (AN05, Cornelius): The determinism boundary exists because agents cannot form habits. Humans automatize routine behaviors through the basal ganglia: repeated patterns become effortless through neural plasticity (William James, 1890). Agents lack this capacity entirely, so every session starts with zero automatic tendencies. The agent that validated schemas perfectly last session has no residual inclination to validate them this session. Hooks compensate architecturally: human habits fire on context cues (entering a room), while hooks fire on lifecycle events (writing a file). Both free cognitive resources for higher-order work. The critical difference is that human habits take weeks to form through neural encoding, while hook-based habits are reprogrammable via file edits, so the learning loop runs at file-write speed rather than neural-rewiring speed. Human prospective memory research shows 30-50% failure rates even for motivated adults; agents face a 100% failure rate across sessions because no intentions persist. Hooks solve both the habit gap (missing automatic routines) and the prospective memory gap (the missing "remember to do X at time Y" capability).
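
A minimal sketch of what "reprogrammable via file edits" could look like in practice. The `hooks.json` file name, its `required_fields` key, and both function names are assumptions made for this illustration, not a format used by any tool cited here. The hook re-reads its configuration on every event, so changing the "habit" is a single file write.

```python
import json
import tempfile
from pathlib import Path


def load_required_fields(config_path: Path) -> list:
    """Re-read the hook configuration on every event so edits take effect immediately."""
    return json.loads(config_path.read_text())["required_fields"]


def on_file_write(config_path: Path, note: dict) -> list:
    """Return the required fields missing from a note (an empty list means the write passes)."""
    return [field for field in load_required_fields(config_path) if field not in note]


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "hooks.json"  # hypothetical config file name
    note = {"type": "claim", "domain": "ai-alignment"}

    config.write_text(json.dumps({"required_fields": ["type", "domain"]}))
    before = on_file_write(config, note)

    # "Retraining the habit" is a file edit: require one more field.
    config.write_text(json.dumps({"required_fields": ["type", "domain", "confidence"]}))
    after = on_file_write(config, note)

print(before, after)
```

The same note passes before the edit and fails after it, with no session memory involved: the new routine applies from the next lifecycle event onward.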

Challenges

The boundary itself is not binary but a spectrum. Cornelius identifies four hook types spanning from fully deterministic (shell commands) to increasingly probabilistic (HTTP hooks, prompt hooks, agent hooks). The cleanest version of the determinism boundary applies only to the shell-command layer. Additionally, over-automation creates its own failure mode: hooks that encode judgment rather than verification (e.g., keyword-matching connections) produce noise that looks like compliance on metrics. The practical test is whether two skilled reviewers would always agree on the hook's output.
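
The verification-versus-judgment distinction can be made concrete with two toy hooks. Both functions and the sample notes are hypothetical. The first check has a single right answer that any two reviewers would confirm; the second imitates keyword-matching connections and shows how it emits a link that looks like output but carries no meaning.

```python
def link_targets_exist(links: list, known_titles: set) -> bool:
    """Verification: every link points at an existing note. There is one right answer."""
    return all(target in known_titles for target in links)


def keyword_connections(text: str, other_notes: dict) -> list:
    """Judgment dressed as automation: proposes a link whenever any word is shared."""
    words = set(text.lower().split())
    return [
        title
        for title, body in other_notes.items()
        if words & set(body.lower().split())
    ]


notes = {
    "hooks enforce structurally": "hooks fire on lifecycle events",
    "fashion model contracts": "a model signs with an agency",
}

# The shared word "model" links an attention note to an unrelated fashion note.
suggested = keyword_connections("the model loses attention under load", notes)
print(suggested)

# The existence check is safe to automate: its result is not a matter of opinion.
print(link_targets_exist(["hooks enforce structurally"], set(notes)))
```

Counting the first hook's passes measures something real; counting the second hook's suggestions would register activity on a dashboard while adding noise to the graph.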


Relevant Notes:

Topics: