diff --git a/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md b/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md index 093867de..7486f2d3 100644 --- a/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md +++ b/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md @@ -21,6 +21,12 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system. + +### Additional Evidence (extend) +*Source: [[2024-11-00-ruiz-serra-factorised-active-inference-multi-agent]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5* + +Ruiz-Serra et al. (2024) provide formal evidence for the coordination framing through multi-agent active inference: even when individual agents successfully minimize their own expected free energy using factorised generative models with Theory of Mind beliefs about others, the ensemble-level expected free energy 'is not necessarily minimised at the aggregate level.' This demonstrates that alignment cannot be solved at the individual agent level—the interaction structure and coordination mechanisms determine whether individual optimization produces collective intelligence or collective failure. The finding validates that alignment is fundamentally about designing interaction structures that bridge individual and collective optimization, not about perfecting individual agent objectives. 
+ --- Relevant Notes: diff --git a/domains/ai-alignment/factorised-generative-models-enable-decentralized-multi-agent-representation-through-individual-level-beliefs.md b/domains/ai-alignment/factorised-generative-models-enable-decentralized-multi-agent-representation-through-individual-level-beliefs.md new file mode 100644 index 00000000..0c0c42c3 --- /dev/null +++ b/domains/ai-alignment/factorised-generative-models-enable-decentralized-multi-agent-representation-through-individual-level-beliefs.md @@ -0,0 +1,42 @@ +--- +type: claim +domain: ai-alignment +secondary_domains: [collective-intelligence] +description: "Each agent maintains explicit beliefs about other agents' internal states enabling strategic planning without centralized coordination" +confidence: experimental +source: "Ruiz-Serra et al., 'Factorised Active Inference for Strategic Multi-Agent Interactions' (AAMAS 2025)" +created: 2026-03-11 +--- + +# Factorised generative models enable decentralized multi-agent representation through individual-level beliefs about other agents' internal states + +In multi-agent active inference systems, factorisation of the generative model allows each agent to maintain "explicit, individual-level beliefs about the internal states of other agents." This approach enables decentralized representation of the multi-agent system—no agent requires global knowledge or centralized coordination to engage in strategic planning. + +Each agent uses its beliefs about other agents' internal states for "strategic planning in a joint context," operationalizing Theory of Mind within the active inference framework. This is distinct from approaches that require shared world models or centralized orchestration. + +The factorised approach scales to complex strategic interactions: Ruiz-Serra et al. 
demonstrate the framework in iterated normal-form games with 2 and 3 players, showing how agents navigate both cooperative and non-cooperative strategic contexts using only their individual beliefs about others. + +## Evidence + +Ruiz-Serra et al. (2024) introduce factorised generative models for multi-agent active inference, where "each agent maintains explicit, individual-level beliefs about the internal states of other agents" through factorisation of the generative model. This enables "strategic planning in a joint context" without requiring centralized coordination or shared representations. + +The paper applies this framework to game-theoretic settings (iterated normal-form games with 2-3 players), demonstrating that agents can engage in strategic interaction using only their individual beliefs about others' internal states. + +## Architectural Implications + +This approach provides a formal foundation for decentralized multi-agent architectures: + +1. **No centralized world model required**: Each agent maintains its own beliefs about others, eliminating single points of failure and scaling bottlenecks. + +2. **Theory of Mind as computational mechanism**: Strategic planning emerges from individual beliefs about others' internal states, not from explicit communication protocols or shared representations. + +3. **Scalable strategic interaction**: The factorised approach extends to N-agent systems without requiring exponential growth in representational complexity. + +However, as demonstrated in [[individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference]], decentralized representation does not automatically produce collective optimization—explicit coordination mechanisms remain necessary. 
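The factorised setup can be sketched in miniature: each agent holds only its own belief distribution over the other agent's action and picks the action minimizing its own expected free energy, reduced here to the risk term (expected negative payoff). This is an illustrative toy under invented payoffs and function names, not Ruiz-Serra et al.'s implementation, but it shows the key property: no shared world model or central controller is consulted.

```python
import numpy as np

# Toy 2-player game, own-action x other-action payoffs.
# Values are invented for illustration (a prisoner's-dilemma shape),
# not taken from Ruiz-Serra et al.
PAYOFF = {
    0: np.array([[2.0, 0.0], [3.0, 1.0]]),  # agent 0's payoffs
    1: np.array([[2.0, 0.0], [3.0, 1.0]]),  # agent 1's payoffs (symmetric)
}

def choose_action(agent, beliefs):
    """Minimize the agent's own EFE, approximated here by its risk term:
    expected negative payoff under the agent's individual beliefs about
    the other agent. No global state is accessed."""
    other = 1 - agent
    p_other = beliefs[agent][other]       # this agent's belief over other's actions
    efe = -(PAYOFF[agent] @ p_other)      # expected negative payoff per own action
    return int(np.argmin(efe))

# Decentralized representation: each agent keeps its OWN beliefs about the other.
beliefs = {
    0: {1: np.array([0.5, 0.5])},
    1: {0: np.array([0.5, 0.5])},
}
actions = [choose_action(a, beliefs) for a in (0, 1)]  # each planned independently
```

In a richer version the belief dictionaries would be updated from observations of others' behavior (the Theory of Mind component); the point here is only that strategic choice needs nothing beyond each agent's local beliefs.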
+ +--- + +Relevant Notes: +- [[individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference]] +- [[subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers]] +- [[AI agent orchestration that routes data and tools between specialized models outperforms both single-model and human-coached approaches because the orchestrator contributes coordination not direction]] diff --git a/domains/ai-alignment/individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md b/domains/ai-alignment/individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md new file mode 100644 index 00000000..3d22954e --- /dev/null +++ b/domains/ai-alignment/individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md @@ -0,0 +1,39 @@ +--- +type: claim +domain: ai-alignment +secondary_domains: [collective-intelligence] +description: "Ensemble-level expected free energy characterizes basins of attraction that may not align with individual agent optima, revealing a fundamental tension between individual and collective optimization" +confidence: experimental +source: "Ruiz-Serra et al., 'Factorised Active Inference for Strategic Multi-Agent Interactions' (AAMAS 2025)" +created: 2026-03-11 +--- + +# Individual free energy minimization does not guarantee collective optimization in multi-agent active inference systems + +When multiple active inference agents interact strategically, each agent minimizes its own expected free energy (EFE) based on beliefs about other agents' internal states. However, the ensemble-level expected free energy—which characterizes basins of attraction in games with multiple Nash Equilibria—is not necessarily minimized at the aggregate level. 
+ +This finding reveals a fundamental tension between individual and collective optimization in multi-agent active inference systems. Even when each agent successfully minimizes its individual free energy through strategic planning that incorporates Theory of Mind beliefs about others, the collective outcome may be suboptimal from a system-wide perspective. + +## Evidence + +Ruiz-Serra et al. (2024) applied factorised active inference to strategic multi-agent interactions in game-theoretic settings. Their key finding: "the ensemble-level expected free energy characterizes basins of attraction of games with multiple Nash Equilibria under different conditions" but "it is not necessarily minimised at the aggregate level." + +The paper demonstrates this through iterated normal-form games with 2 and 3 players, showing how the specific interaction structure (game type, communication channels) determines whether individual optimization produces collective intelligence or collective failure. The factorised generative model approach—where each agent maintains explicit individual-level beliefs about other agents' internal states—enables decentralized representation but does not automatically align individual and collective objectives. + +## Implications + +This result has direct architectural implications for multi-agent AI systems: + +1. **Explicit coordination mechanisms are necessary**: Simply giving each agent active inference dynamics and assuming collective optimization will emerge is insufficient. The gap between individual and collective optimization must be bridged through deliberate design. + +2. **Interaction structure matters**: The specific form of agent interaction—not just individual agent capability—determines whether collective intelligence emerges or whether individually optimal agents produce suboptimal collective outcomes. + +3. 
**Evaluator roles are formally justified**: In systems like the Teleo architecture, Leo's cross-domain synthesis role exists precisely because individual agent optimization doesn't guarantee collective optimization. The evaluator function bridges individual and collective free energy. + +--- + +Relevant Notes: +- [[AI alignment is a coordination problem not a technical problem]] +- [[collective intelligence requires diversity as a structural precondition not a moral preference]] +- [[safe AI development requires building alignment mechanisms before scaling capability]] +- [[AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system]] diff --git a/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md b/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md index 9e68f84d..4c5b85f9 100644 --- a/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md +++ b/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md @@ -21,6 +21,12 @@ This observation creates tension with [[multi-model collaboration solved problem For the collective superintelligence thesis, this is important. 
If subagent hierarchies consistently outperform peer architectures, then [[collective superintelligence is the alternative to monolithic AI controlled by a few]] needs to specify what "collective" means architecturally — not flat peer networks, but nested hierarchies with human principals at the top. + +### Additional Evidence (challenge) +*Source: [[2024-11-00-ruiz-serra-factorised-active-inference-multi-agent]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5* + +Ruiz-Serra et al.'s factorised active inference framework demonstrates peer multi-agent coordination without hierarchical control. Each agent maintains individual-level beliefs about others' internal states and performs strategic planning in a joint context through decentralized representation. The framework handles iterated normal-form games with 2 and 3 players without requiring a primary controller. However, the finding that ensemble-level expected free energy is not necessarily minimized at the aggregate level suggests that while peer architectures can function, they may require explicit coordination mechanisms (effectively reintroducing hierarchy) to achieve collective optimization. This partially challenges the claim while explaining why hierarchies emerge in practice. 
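The gap between peer functioning and collective optimization can be made concrete with a toy stag-hunt sketch: two symmetric peers each minimize their own expected free energy (reduced here to expected negative payoff) under uncertainty about the other, and both settle on a joint outcome whose ensemble-level EFE is not the aggregate minimum. Payoff numbers and function names are invented for illustration; this is not Ruiz-Serra et al.'s model.

```python
import numpy as np

# Stag-hunt payoffs, own-action x other's-action; hypothetical values.
# action 0 = stag (cooperate), action 1 = hare (play safe)
P = np.array([[4.0, 0.0],
              [3.0, 3.0]])

def individual_efe(action, belief_other):
    # EFE approximated by its risk term: expected negative payoff
    return -(P[action] @ belief_other)

belief = np.array([0.5, 0.5])   # each peer is uncertain about the other
choice = min((0, 1), key=lambda a: individual_efe(a, belief))
# Each symmetric peer individually prefers 'hare': EFE -3.0 beats -2.0.

def ensemble_efe(a0, a1):
    # aggregate EFE: sum of both agents' negative payoffs at the joint outcome
    return -(P[a0, a1] + P[a1, a0])

# Individual minimization yields (hare, hare) with ensemble EFE -6.0,
# yet (stag, stag) has ensemble EFE -8.0: the aggregate minimum sits at a
# joint plan no individually-minimizing peer selects on its own.
```

Reaching (stag, stag) requires something beyond individual minimization, such as communication, commitment, or an agent that evaluates the joint plan, which is exactly the coordination role that hierarchies end up supplying in deployed systems.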
+ --- Relevant Notes: diff --git a/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md b/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md index 6b3649c5..992db302 100644 --- a/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md +++ b/inbox/archive/2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md @@ -7,9 +7,15 @@ date: 2024-11-00 domain: ai-alignment secondary_domains: [collective-intelligence] format: paper -status: unprocessed +status: processed priority: medium tags: [active-inference, multi-agent, game-theory, strategic-interaction, factorised-generative-model, nash-equilibrium] +processed_by: theseus +processed_date: 2026-03-11 +claims_extracted: ["individual-free-energy-minimization-does-not-guarantee-collective-optimization-in-multi-agent-active-inference.md", "factorised-generative-models-enable-decentralized-multi-agent-representation-through-individual-level-beliefs.md"] +enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md", "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md"] +extraction_model: "anthropic/claude-sonnet-4.5" +extraction_notes: "Extracted two novel claims about multi-agent active inference: (1) individual free energy minimization doesn't guarantee collective optimization, and (2) factorised generative models enable decentralized strategic planning through individual beliefs about others. Applied three enrichments extending/challenging existing coordination and collective intelligence claims. The paper provides formal game-theoretic evidence for why explicit coordination mechanisms (like Leo's evaluator role) are necessary in multi-agent systems—individual optimization and collective optimization are not automatically aligned." --- ## Content