diff --git a/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md b/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md
index 093867de..3fffcef3 100644
--- a/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md
+++ b/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md
@@ -21,6 +21,12 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v
 Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system.
+
+### Additional Evidence (confirm)
+*Source: [[2024-11-00-ruiz-serra-factorised-active-inference-multi-agent]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+Ruiz-Serra et al. (AAMAS 2025) provide formal game-theoretic evidence that individual free energy minimization in multi-agent active inference systems does not guarantee collective optimization. The ensemble-level expected free energy 'characterizes basins of attraction of games with multiple Nash Equilibria under different conditions' but 'it is not necessarily minimised at the aggregate level.' This demonstrates mathematically that alignment cannot be solved at the individual agent level—the interaction structure and coordination mechanisms determine whether individual optimization produces beneficial collective outcomes. This is precisely the coordination problem: agents can be individually aligned (minimizing their own free energy) while collectively misaligned (settling into suboptimal equilibria).
+
 
 ---
 Relevant Notes:
diff --git a/domains/ai-alignment/individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md b/domains/ai-alignment/individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md
new file mode 100644
index 00000000..59a5e7c2
--- /dev/null
+++ b/domains/ai-alignment/individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md
@@ -0,0 +1,38 @@
+---
+type: claim
+domain: ai-alignment
+description: "Individual free energy minimization in multi-agent active inference systems does not guarantee collective free energy minimization because ensemble-level expected free energy characterizes basins of attraction that may not align with individual optima"
+confidence: experimental
+source: "Ruiz-Serra et al., 'Factorised Active Inference for Strategic Multi-Agent Interactions' (AAMAS 2025)"
+created: 2026-03-11
+secondary_domains: [collective-intelligence]
+---
+
+# Individual free energy minimization does not guarantee collective optimization in multi-agent active inference systems
+
+When multiple active inference agents interact strategically, each agent minimizing its own expected free energy does not necessarily produce optimal collective outcomes. Ruiz-Serra et al. demonstrate through game-theoretic analysis that "ensemble-level expected free energy characterizes basins of attraction of games with multiple Nash Equilibria under different conditions" but "it is not necessarily minimised at the aggregate level."
+
+This finding reveals a fundamental tension between individual and collective optimization in multi-agent active inference systems. While individual agents successfully minimize their own free energy through strategic planning based on beliefs about other agents' internal states, the aggregate system behavior can settle into suboptimal equilibria.
+
+The framework uses factorised generative models where each agent maintains "explicit, individual-level beliefs about the internal states of other agents" to enable decentralized strategic planning. Applied to iterated normal-form games with 2-3 players, the model shows how interaction structure (game type, communication channels) determines whether individual optimization produces collective intelligence or collective failure.
+
+## Evidence
+
+- Ruiz-Serra et al. (2024) show through formal analysis of multi-agent active inference in game-theoretic settings that ensemble-level EFE is not necessarily minimized at aggregate level despite individual optimization
+- The paper demonstrates this through iterated normal-form games where individually rational agents can produce collectively suboptimal Nash equilibria
+- The specific interaction structure (game form, communication channels) determines whether collective optimization emerges from individual free energy minimization
+
+## Implications
+
+This result has critical implications for multi-agent AI system design. It means autonomous agents cannot be deployed with only individual optimization objectives and expected to produce beneficial collective outcomes. Explicit coordination mechanisms—evaluator roles, structured interaction protocols, cross-domain synthesis—are necessary architectural additions beyond pure agent autonomy.
+
+---
+
+Relevant Notes:
+- [[AI alignment is a coordination problem not a technical problem]]
+- [[collective intelligence requires diversity as a structural precondition not a moral preference]]
+- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]]
+
+Topics:
+- [[domains/ai-alignment/_map]]
+- [[foundations/collective-intelligence/_map]]
diff --git a/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md b/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md
index 9e68f84d..09d2e6fe 100644
--- a/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md
+++ b/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md
@@ -21,6 +21,12 @@ This observation creates tension with [[multi-model collaboration solved problem
 For the collective superintelligence thesis, this is important. If subagent hierarchies consistently outperform peer architectures, then [[collective superintelligence is the alternative to monolithic AI controlled by a few]] needs to specify what "collective" means architecturally — not flat peer networks, but nested hierarchies with human principals at the top.
+
+### Additional Evidence (challenge)
+*Source: [[2024-11-00-ruiz-serra-factorised-active-inference-multi-agent]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+Ruiz-Serra et al. (AAMAS 2025) demonstrate successful coordination in peer multi-agent architectures without hierarchical control. Their factorised active inference framework enables 2-3 player peer coordination where each agent maintains individual-level beliefs about others' internal states and uses those beliefs for strategic planning in joint contexts. The framework successfully navigates both cooperative and non-cooperative strategic interactions in iterated normal-form games. This suggests that peer architectures can work when agents have explicit Theory of Mind capabilities—the key variable is not hierarchy vs. peer structure, but whether agents can model each other's decision processes. The convergence to hierarchies in deployed systems may reflect implementation convenience, computational constraints, or organizational inertia rather than fundamental architectural superiority.
+
 
 ---
 Relevant Notes:
diff --git a/domains/ai-alignment/theory-of-mind-in-active-inference-emerges-from-factorised-generative-models-that-represent-other-agents-internal-states.md b/domains/ai-alignment/theory-of-mind-in-active-inference-emerges-from-factorised-generative-models-that-represent-other-agents-internal-states.md
new file mode 100644
index 00000000..88447422
--- /dev/null
+++ b/domains/ai-alignment/theory-of-mind-in-active-inference-emerges-from-factorised-generative-models-that-represent-other-agents-internal-states.md
@@ -0,0 +1,37 @@
+---
+type: claim
+domain: ai-alignment
+description: "Factorised generative models operationalize Theory of Mind by maintaining explicit individual-level beliefs about other agents' internal states for strategic coordination"
+confidence: experimental
+source: "Ruiz-Serra et al., 'Factorised Active Inference for Strategic Multi-Agent Interactions' (AAMAS 2025)"
+created: 2026-03-11
+secondary_domains: [collective-intelligence]
+---
+
+# Theory of Mind in active inference emerges from factorised generative models that represent other agents' internal states
+
+Ruiz-Serra et al. demonstrate that strategic multi-agent coordination can be achieved through factorisation of the generative model, where each agent maintains "explicit, individual-level beliefs about the internal states of other agents." This approach operationalizes Theory of Mind within the active inference framework, enabling agents to use their beliefs about others' internal states for "strategic planning in a joint context."
+
+The factorised approach enables decentralized representation of the multi-agent system—each agent independently models the beliefs, preferences, and likely actions of other agents without requiring centralized coordination or shared world models. This creates a computational architecture for strategic interaction that scales to multiple agents while preserving individual autonomy.
+
+Applied to iterated normal-form games with 2-3 players, the framework shows how agents navigate both cooperative and non-cooperative strategic interactions by maintaining and updating beliefs about other agents' internal states. The agents don't just respond to observed actions; they model the decision-making processes of other agents and plan accordingly.
+
+## Evidence
+
+- Ruiz-Serra et al. (2024) introduce factorised generative models where each agent maintains explicit beliefs about other agents' internal states
+- The framework successfully models strategic behavior in iterated 2-player and 3-player normal-form games
+- Agents use these individual-level beliefs about others for strategic planning in joint contexts, demonstrating Theory of Mind capabilities operationalized within active inference
+
+## Relationship to Existing Work
+
+This provides a formal mechanism for how [[AI agent orchestration that routes data and tools between specialized models outperforms both single-model and human-coached approaches because the orchestrator contributes coordination not direction]] might work at the cognitive level—the orchestrator maintains beliefs about the capabilities and states of specialized agents.
+
+---
+
+Relevant Notes:
+- [[multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together]]
+- [[subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers]]
+
+Topics:
+- [[domains/ai-alignment/_map]]
+- [[foundations/collective-intelligence/_map]]
diff --git a/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md b/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md
index 6b3649c5..55bdbc16 100644
--- a/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md
+++ b/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md
@@ -7,9 +7,15 @@
 date: 2024-11-00
 domain: ai-alignment
 secondary_domains: [collective-intelligence]
 format: paper
-status: unprocessed
+status: processed
 priority: medium
 tags: [active-inference, multi-agent, game-theory, strategic-interaction, factorised-generative-model, nash-equilibrium]
+processed_by: theseus
+processed_date: 2026-03-11
+claims_extracted: ["individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md", "theory-of-mind-in-active-inference-emerges-from-factorised-generative-models-that-represent-other-agents-internal-states.md"]
+enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md", "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "Extracted two novel claims about multi-agent active inference: (1) individual optimization doesn't guarantee collective optimization, and (2) Theory of Mind operationalized through factorised generative models. Applied three enrichments confirming coordination-problem framing, extending diversity-as-structural-requirement, and challenging hierarchy-superiority assumption. Key insight: this paper provides formal game-theoretic grounding for why deliberate coordination architecture (Leo's role, PR review, cross-domain synthesis) is necessary rather than emergent."
 ---
 
 ## Content
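
The central claim in the extracted notes, that individually rational agents who model each other can still settle into a collectively suboptimal Nash equilibrium, can be sketched in a few lines of Python. This is an illustrative toy, not the paper's factorised active inference model: the Prisoner's Dilemma payoffs, the fictitious-play-style belief update, and all function names are assumptions chosen for demonstration.

```python
import itertools

# Prisoner's Dilemma payoffs as (row player, column player).
# Action 0 = cooperate, 1 = defect. Mutual cooperation (3, 3) beats
# mutual defection (1, 1), yet defection dominates individually.
PAYOFF = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

def best_response(p_opponent_defects: float) -> int:
    """Pick the action maximizing expected payoff under a belief about
    the opponent -- a crude stand-in for an individual-level model of
    the other agent's internal state."""
    ev_coop = (1 - p_opponent_defects) * 3 + p_opponent_defects * 0
    ev_defect = (1 - p_opponent_defects) * 5 + p_opponent_defects * 1
    return 1 if ev_defect > ev_coop else 0

def nash_equilibria():
    """Enumerate pure-strategy Nash equilibria: profiles where neither
    player gains from a unilateral deviation."""
    eqs = []
    for a, b in itertools.product([0, 1], repeat=2):
        ra, rb = PAYOFF[(a, b)]
        if all(PAYOFF[(a2, b)][0] <= ra for a2 in (0, 1)) and \
           all(PAYOFF[(a, b2)][1] <= rb for b2 in (0, 1)):
            eqs.append((a, b))
    return eqs

# Iterated play: each agent starts agnostic and nudges its belief toward
# the opponent's observed action (fictitious-play style). Individually
# rational updating drives both agents to defection.
belief_a, belief_b = 0.5, 0.5
history = []
for _ in range(10):
    act_a = best_response(belief_a)
    act_b = best_response(belief_b)
    history.append((act_a, act_b))
    belief_a = 0.9 * belief_a + 0.1 * act_b
    belief_b = 0.9 * belief_b + 0.1 * act_a

print(nash_equilibria())  # [(1, 1)] -- the only equilibrium is mutual defection
print(history[-1])        # (1, 1)
print(sum(PAYOFF[(1, 1)]), "<", sum(PAYOFF[(0, 0)]))  # 2 < 6: collectively suboptimal
```

Each agent minimizes its own expected loss at every step, yet play converges to the equilibrium whose aggregate payoff (2) is far below the cooperative profile (6), which is the individual-versus-ensemble gap the notes describe; whether the collective outcome improves depends on changing the interaction structure (payoffs, communication), not on making either agent's individual optimization better.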