teleo-codex/domains/ai-alignment/active-inference-outperforms-rl-in-reward-free-environments.md
Teleo Agents c7b3093fe1 theseus: extract claims from 2021-03-00-sajid-active-inference-demystified-compared.md
- Source: inbox/archive/2021-03-00-sajid-active-inference-demystified-compared.md
- Domain: ai-alignment
- Extracted by: headless extraction cron

Pentagon-Agent: Theseus <HEADLESS>
2026-03-10 16:22:15 +00:00


- type: claim
- domain: ai-alignment
- description: Active inference agents outperform reinforcement learning agents in reward-free environments because they can pursue epistemic value (uncertainty reduction) without requiring external reward signals
- confidence: experimental
- source: Sajid, Parr, Ball, and Friston (2021), "Active Inference: Demystified and Compared," Neural Computation 33(3):674-712
- created: 2026-03-10
- depends_on:
- challenged_by:

Active inference agents outperform reinforcement learning agents in reward-free environments

Active inference reframes the optimization target from reward maximization to model-evidence maximization (self-evidencing). Reward is treated as "another observation the agent has a preference over" rather than as a required external signal. This lets active inference agents infer behaviors in reward-free environments that Q-learning and Bayesian model-based RL agents cannot solve. The paper demonstrates this with a discrete state-space formulation on OpenAI Gym baselines, showing that active inference agents solve tasks without explicit reward signals where standard RL approaches fail.
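
The mechanism behind the claim can be sketched numerically. In the discrete state-space formulation, a policy is scored by its expected free energy G, which decomposes into an epistemic term (expected information gain about hidden states) and a pragmatic term (expected log preference over observations); when preferences are flat (the reward-free case), the epistemic term alone still ranks policies. The function below is an illustrative one-step sketch, not the paper's implementation; the variable names (`qs`, `A`, `log_C`) follow common active inference conventions but are assumptions here.

```python
import numpy as np

def expected_free_energy(qs, A, log_C, eps=1e-16):
    """One-step expected free energy for a discrete active inference sketch.

    qs    : predicted hidden-state distribution under a policy, shape (S,)
    A     : likelihood matrix P(o|s), shape (O, S)
    log_C : log preferences over observations, shape (O,)
    """
    qo = A @ qs                                   # predicted observation distribution
    # Pragmatic value: expected log preference over outcomes
    pragmatic = qo @ log_C
    # Epistemic value: expected reduction in uncertainty about hidden states,
    # i.e. mutual information between states and observations under the policy
    H_prior = -np.sum(qs * np.log(qs + eps))      # entropy of predicted states
    joint = A * qs                                # P(o|s) Q(s), shape (O, S)
    post = joint / (qo[:, None] + eps)            # posterior Q(s|o) per observation
    H_post = -np.sum(post * np.log(post + eps), axis=1)
    epistemic = H_prior - qo @ H_post
    # G is minimized, so both value terms enter with a negative sign
    return -epistemic - pragmatic
```

With flat preferences, a policy whose predicted states yield informative observations (identity likelihood) scores a lower G than one yielding uninformative observations (uniform likelihood), so the agent explores without any reward signal.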

Evidence

  • 2021-03-00-sajid-active-inference-demystified-compared — "Active inference removes the reliance on an explicit reward signal. Reward is simply treated as 'another observation the agent has a preference over.' This reframes the entire optimization target from reward maximization to model evidence maximization (self-evidencing)."
  • 2021-03-00-sajid-active-inference-demystified-compared — "The paper provides an accessible discrete-state comparison between active inference and RL on OpenAI gym baselines, demonstrating that active inference agents can infer behaviors in reward-free environments that Q-learning and Bayesian model-based RL agents cannot."

Challenges

  • Some may argue that RL can achieve reward-free exploration through intrinsic motivation bonuses (e.g., curiosity or count-based novelty rewards), though these are engineered add-ons rather than intrinsic to the framework.
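
To make the "engineered add-on" point concrete, here is a minimal tabular Q-learning sketch with a count-based exploration bonus. The bonus term, hyperparameters, and function name are illustrative assumptions, not from the paper; the point is that the intrinsic reward is bolted onto the update rule from outside, whereas in active inference the epistemic drive falls out of the objective itself.

```python
import math
from collections import defaultdict

def q_update(Q, counts, s, a, r, s_next, actions,
             alpha=0.1, gamma=0.99, beta=0.5):
    """One tabular Q-learning step with a count-based exploration bonus.

    The bonus beta / sqrt(N(s, a)) is added to the external reward as an
    engineered intrinsic signal; with r = 0 everywhere (reward-free) and no
    bonus, the update would carry no learning signal at all.
    """
    counts[(s, a)] += 1
    bonus = beta / math.sqrt(counts[(s, a)])      # intrinsic novelty reward
    target = r + bonus + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

In a reward-free environment (`r = 0`), a single update still moves the Q-value because of the bonus, which is precisely the engineered exploration signal the challenge refers to.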

Relevant Notes:

Topics: