teleo-codex/inbox/archive/2021-03-00-sajid-active-inference-demystified-compared.md at 109c7230421197c7ab7367526538fadcf2ee7c64

Theseus 78615e2b8d theseus: extract claims from 2021-03-00-sajid-active-inference-demystified-compared (#139 )

Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>

2026-03-10 18:29:01 +00:00

5.9 KiB

Raw Blame History

type

title

author

url

date

domain

secondary_domains

format

status

priority

Content

Published in Neural Computation, Vol 33(3):674-712, 2021. Also available on arXiv: https://arxiv.org/abs/1909.10863

Key Arguments

Epistemic exploration as natural behavior: Active inference agents naturally conduct epistemic exploration — uncertainty-reducing behavior — without this being engineered as a separate mechanism. In RL, exploration must be bolted on (epsilon-greedy, UCB, etc.). In active inference, it's intrinsic.
Reward-free learning: Active inference removes the reliance on an explicit reward signal. Reward is simply treated as "another observation the agent has a preference over." This reframes the entire optimization target from reward maximization to model evidence maximization (self-evidencing).
Expected Free Energy (EFE) decomposition: The EFE decomposes into:
- Epistemic value (information gain / intrinsic value): How much would this action reduce uncertainty about hidden states?
- Pragmatic value (extrinsic value / expected utility): How much does the expected outcome align with preferences? Minimizing EFE simultaneously maximizes both — resolving the explore-exploit dilemma.
Automatic explore-exploit resolution: "Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value." The agent naturally transitions from exploration to exploitation as uncertainty is reduced.
Discrete state-space formulation: The paper provides an accessible discrete-state comparison between active inference and RL on OpenAI gym baselines, demonstrating that active inference agents can infer behaviors in reward-free environments that Q-learning and Bayesian model-based RL agents cannot.

Agent Notes

Why this matters: The EFE decomposition is the key to operationalizing active inference for our agents. Epistemic value = "how much would researching this topic reduce our KB uncertainty?" Pragmatic value = "how much does this align with our mission objectives?" An agent should research topics that score high on BOTH — but epistemic value should dominate when the KB is sparse.

What surprised me: The automatic explore-exploit transition. As an agent's domain matures (more proven/likely claims, denser wiki-link graph), epistemic value for further research in that domain naturally decreases, and the agent should shift toward exploitation (enriching existing claims, building positions) rather than exploration (new source ingestion). This is exactly what we want but haven't formalized.

KB connections:

coordination protocol design produces larger capability gains than model scaling — active inference as the coordination protocol that resolves explore-exploit without engineering
structured exploration protocols reduce human intervention by 6x — the Residue prompt as an informal active inference protocol (seek surprise, not confirmation)
fitness landscape ruggedness determines whether adaptive systems find good solutions — epistemic value drives exploration of rugged fitness landscapes; pragmatic value drives exploitation of smooth ones

Operationalization angle:

Research direction scoring: Score candidate research topics by: (a) epistemic value — how many experimental/speculative claims does this topic have? How sparse are the wiki links? (b) pragmatic value — how relevant is this to current objectives and user questions?
Automatic explore-exploit: New agents (sparse KB) should explore broadly. Mature agents (dense KB) should exploit deeply. The metric is claim graph density + confidence distribution.
Surprise-weighted extraction: When extracting claims, weight contradictions to existing beliefs HIGHER than confirmations — they have higher epistemic value. A source that surprises is more valuable than one that confirms.
Preference as observation: Don't hard-code research priorities. Treat Cory's directives and user questions as observations the agent has preferences over — they shape pragmatic value without overriding epistemic value.

Extraction hints:

CLAIM: Active inference resolves the exploration-exploitation dilemma automatically because expected free energy decomposes into epistemic value (information gain) and pragmatic value (preference alignment), with exploration naturally transitioning to exploitation as uncertainty reduces
CLAIM: Active inference agents outperform reinforcement learning agents in reward-free environments because they can pursue epistemic value (uncertainty reduction) without requiring external reward signals
CLAIM: Surprise-seeking is intrinsic to active inference and does not need to be engineered as a separate exploration mechanism, unlike reinforcement learning where exploration must be explicitly added

Curator Notes

PRIMARY CONNECTION: "biological systems minimize free energy to maintain their states and resist entropic decay" WHY ARCHIVED: Provides the formal framework for operationalizing explore-exploit in our agent architecture — the EFE decomposition maps directly to research direction selection EXTRACTION HINT: Focus on the EFE decomposition and the automatic explore-exploit transition — these are immediately implementable as research direction selection criteria

5.9 KiB Raw Blame History

Content

Key Arguments

Agent Notes

Curator Notes

5.9 KiB

Raw Blame History