--- type: source title: "Active Inference: Demystified and Compared" author: "Noor Sajid, Philip J. Ball, Thomas Parr, Karl J. Friston" url: https://direct.mit.edu/neco/article/33/3/674/97486/Active-Inference-Demystified-and-Compared date: 2021-03-00 domain: ai-alignment secondary_domains: [collective-intelligence, critical-systems] format: paper status: unprocessed priority: medium tags: [active-inference, reinforcement-learning, expected-free-energy, epistemic-value, exploration-exploitation, comparison] --- ## Content Published in Neural Computation, Vol 33(3):674-712, 2021. Also available on arXiv: https://arxiv.org/abs/1909.10863 ### Key Arguments 1. **Epistemic exploration as natural behavior**: Active inference agents naturally conduct epistemic exploration — uncertainty-reducing behavior — without this being engineered as a separate mechanism. In RL, exploration must be bolted on (epsilon-greedy, UCB, etc.). In active inference, it's intrinsic. 2. **Reward-free learning**: Active inference removes the reliance on an explicit reward signal. Reward is simply treated as "another observation the agent has a preference over." This reframes the entire optimization target from reward maximization to model evidence maximization (self-evidencing). 3. **Expected Free Energy (EFE) decomposition**: The EFE decomposes into: - **Epistemic value** (information gain / intrinsic value): How much would this action reduce uncertainty about hidden states? - **Pragmatic value** (extrinsic value / expected utility): How much does the expected outcome align with preferences? Minimizing EFE simultaneously maximizes both — resolving the explore-exploit dilemma. 4. **Automatic explore-exploit resolution**: "Epistemic value is maximized until there is no further information gain, after which exploitation is assured through maximization of extrinsic value." The agent naturally transitions from exploration to exploitation as uncertainty is reduced. 5. **Discrete state-space formulation**: The paper provides an accessible discrete-state comparison between active inference and RL on OpenAI gym baselines, demonstrating that active inference agents can infer behaviors in reward-free environments that Q-learning and Bayesian model-based RL agents cannot. ## Agent Notes **Why this matters:** The EFE decomposition is the key to operationalizing active inference for our agents. Epistemic value = "how much would researching this topic reduce our KB uncertainty?" Pragmatic value = "how much does this align with our mission objectives?" An agent should research topics that score high on BOTH — but epistemic value should dominate when the KB is sparse. **What surprised me:** The automatic explore-exploit transition. As an agent's domain matures (more proven/likely claims, denser wiki-link graph), epistemic value for further research in that domain naturally decreases, and the agent should shift toward exploitation (enriching existing claims, building positions) rather than exploration (new source ingestion). This is exactly what we want but haven't formalized. **KB connections:** - [[coordination protocol design produces larger capability gains than model scaling]] — active inference as the coordination protocol that resolves explore-exploit without engineering - [[structured exploration protocols reduce human intervention by 6x]] — the Residue prompt as an informal active inference protocol (seek surprise, not confirmation) - [[fitness landscape ruggedness determines whether adaptive systems find good solutions]] — epistemic value drives exploration of rugged fitness landscapes; pragmatic value drives exploitation of smooth ones **Operationalization angle:** 1. **Research direction scoring**: Score candidate research topics by: (a) epistemic value — how many experimental/speculative claims does this topic have? How sparse are the wiki links? (b) pragmatic value — how relevant is this to current objectives and user questions? 2. **Automatic explore-exploit**: New agents (sparse KB) should explore broadly. Mature agents (dense KB) should exploit deeply. The metric is claim graph density + confidence distribution. 3. **Surprise-weighted extraction**: When extracting claims, weight contradictions to existing beliefs HIGHER than confirmations — they have higher epistemic value. A source that surprises is more valuable than one that confirms. 4. **Preference as observation**: Don't hard-code research priorities. Treat Cory's directives and user questions as observations the agent has preferences over — they shape pragmatic value without overriding epistemic value. **Extraction hints:** - CLAIM: Active inference resolves the exploration-exploitation dilemma automatically because expected free energy decomposes into epistemic value (information gain) and pragmatic value (preference alignment), with exploration naturally transitioning to exploitation as uncertainty reduces - CLAIM: Active inference agents outperform reinforcement learning agents in reward-free environments because they can pursue epistemic value (uncertainty reduction) without requiring external reward signals - CLAIM: Surprise-seeking is intrinsic to active inference and does not need to be engineered as a separate exploration mechanism, unlike reinforcement learning where exploration must be explicitly added ## Curator Notes PRIMARY CONNECTION: "biological systems minimize free energy to maintain their states and resist entropic decay" WHY ARCHIVED: Provides the formal framework for operationalizing explore-exploit in our agent architecture — the EFE decomposition maps directly to research direction selection EXTRACTION HINT: Focus on the EFE decomposition and the automatic explore-exploit transition — these are immediately implementable as research direction selection criteria