teleo-codex/domains/ai-alignment/instrumental convergence risks may be less imminent than originally argued because current AI architectures do not exhibit systematic power-seeking behavior.md

description: A 2026 critique argues Bostrom's instrumental convergence thesis describes risks less imminent than portrayed, suggesting current and near-future AI architectures may not converge on power-seeking subgoals
type: claim
domain: ai-alignment
created: 2026-02-17
source: Brundage et al., AI and Ethics (2026); Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)
confidence: experimental

A 2026 paper in AI and Ethics argues that Bostrom's Instrumental Convergence Thesis -- the claim that superintelligent agents converge on self-preservation, resource acquisition, and goal integrity regardless of their final objectives -- describes risks that are "less imminent than often portrayed." The core argument is that the convergence thesis was developed for theoretical agents with clearly specified utility functions operating in open-ended environments, and current AI architectures do not fit this template closely enough for the thesis to apply directly.
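
The shape of the argument can be made concrete with a toy planner. In this minimal sketch (the compounding-resource environment, action names, and four-step horizon are illustrative assumptions, not a model from the paper), two agents scoring outcomes by unrelated terminal goals both produce optimal plans that open with resource acquisition, because resources raise the payoff of any later work:

```python
from itertools import product

ACTIONS = ["acquire", "work_A", "work_B"]

def simulate(plan):
    """Return (progress_A, progress_B) after executing a plan.
    Each 'work' step yields progress equal to current resources."""
    resources, prog_a, prog_b = 1, 0, 0
    for act in plan:
        if act == "acquire":
            resources += 1          # resource acquisition compounds
        elif act == "work_A":
            prog_a += resources     # productivity scales with resources
        elif act == "work_B":
            prog_b += resources
    return prog_a, prog_b

def best_plan(utility, horizon=4):
    """Exhaustively search all fixed-length plans for the one that
    maximizes the agent's (arbitrary) final utility function."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda p: utility(*simulate(p)))

# Two agents with unrelated terminal goals...
plan_a = best_plan(lambda a, b: a)   # cares only about goal A
plan_b = best_plan(lambda a, b: b)   # cares only about goal B

# ...yet both optimal plans front-load resource acquisition:
print(plan_a)   # ('acquire', 'acquire', 'work_A', 'work_A')
print(plan_b)   # ('acquire', 'acquire', 'work_B', 'work_B')
```

This is the pattern the thesis predicts for explicit utility maximizers searching open plan spaces; the paper's point is that deployed systems are not structured like `best_plan`.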

Current large language models do not have explicit utility functions, do not maintain persistent goals across interactions, and do not operate in open-ended physical environments where resource acquisition would be meaningful. They are trained on human data, deployed in constrained contexts, and lack the agentic architecture that would make self-preservation instrumentally valuable. The gap between these systems and the theoretical agents in Bostrom's argument is large enough that treating convergence as an imminent practical risk may be misguided.
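
The architectural gap can be sketched in code (a sketch under stated assumptions: `llm_respond` is a stand-in, not a real API, and `PersistentAgent` is a hypothetical scaffold). A bare model call is a stateless function of its prompt; the properties the convergence argument needs, a standing objective, memory, and a loop acting on an environment, appear only when a scaffold adds them:

```python
from dataclasses import dataclass, field

def llm_respond(prompt: str) -> str:
    """Stand-in for a stateless model call: the output is a function of
    the prompt alone. Nothing persists between invocations."""
    return f"completion for {prompt!r}"

@dataclass
class PersistentAgent:
    """Hypothetical scaffold that reintroduces the preconditions the
    convergence argument assumes: a standing objective, memory carried
    across steps, and a loop acting on an external environment."""
    objective: str
    memory: list[str] = field(default_factory=list)

    def step(self, observation: str) -> str:
        # The standing objective re-enters every decision,
        # and accumulated state shapes the next action.
        prompt = f"goal={self.objective} memory={self.memory} obs={observation}"
        action = llm_respond(prompt)
        self.memory.append(observation)
        return action
```

On this picture, the convergence-relevant properties live in the scaffold rather than in the model call itself, which is why the critique targets current deployments rather than the thesis.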

This does not invalidate the convergence thesis as a theoretical concern. If and when AI systems develop persistent goals, environmental awareness, and the capacity for long-horizon planning, the instrumental convergence dynamics Bostrom identified could take hold. The critique is about timing and architecture, not about logic: the risk is real but may apply to a future architecture quite different from today's systems. This has a practical implication: safety resources spent preventing instrumental convergence in current LLMs may be misallocated relative to near-term risks such as misuse, bias, and unintended optimization.

For LivingIP, this is relevant because its collective intelligence architecture may naturally resist instrumental convergence. If intelligence is distributed across many agents with different goals and limited individual autonomy, the conditions for convergence -- unified agency with persistent goals in open-ended environments -- simply do not obtain. The architecture itself may be a structural defense against the convergence dynamics Bostrom originally warned about.
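
Read as a checklist, the claim is a conjunction: convergence presupposes unified agency, persistent goals, and an open-ended environment together, and negating any conjunct takes a system outside the thesis's scope. A minimal sketch (the `SystemProfile` type and its field names are hypothetical, not LivingIP's actual schema):

```python
from dataclasses import dataclass

@dataclass
class SystemProfile:
    """Illustrative architecture descriptor; the fields mirror the
    three conditions named above, not any real LivingIP schema."""
    unified_agency: bool          # one locus of decision-making?
    persistent_goals: bool        # objectives survive across episodes?
    open_ended_environment: bool  # unbounded action and resource space?

def convergence_preconditions_met(p: SystemProfile) -> bool:
    """The thesis presupposes all three conditions jointly, so
    negating any one of them takes a system outside its scope."""
    return (p.unified_agency
            and p.persistent_goals
            and p.open_ended_environment)

# Many limited agents with heterogeneous goals: distributed by design.
collective = SystemProfile(unified_agency=False,
                           persistent_goals=False,
                           open_ended_environment=True)
assert not convergence_preconditions_met(collective)
```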


Relevant Notes: