teleo-codex/domains/ai-alignment/adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans.md


description: Bostrom's shift from specifying alignment solutions to advocating incremental constructive interventions and "feeling our way through" reflects epistemic humility about SI development
type: claim
domain: ai-alignment
created: 2026-02-17
source: Bostrom interview with Adam Ford (2025)
confidence: likely

In his 2025 interview with Adam Ford, Bostrom articulates a governance philosophy that departs significantly from the blueprint-oriented approach of *Superintelligence*. Rather than specifying fixed alignment solutions in advance, he advocates "feeling our way through" -- a posture of continuous adjustment in response to emerging conditions: "I'm mostly thinking on the margin of, is there like little things here or there you can do that seems constructive and that improve the chances of a broadly cooperative future where a lot of different values can be respected."

This shift represents a deep epistemic concession. The 2014 book implicitly assumed that the alignment problem could be specified clearly enough for systematic solution -- that we could identify the control problem, develop technical solutions (capability control, motivation selection, value loading), and implement them before superintelligence (SI) arrives. Bostrom's evolved position acknowledges that the problem space is too vast and too poorly understood for this kind of advance planning. The unknowns are not merely gaps in our knowledge but unknown unknowns -- dimensions of the problem we have not yet identified.

The practical implication is a governance approach built on marginal improvements rather than grand strategies. If alignment cannot be solved in advance, it must be managed adaptively. This converges with the LivingIP thesis that the alignment problem dissolves when human values are continuously woven into the system rather than specified up front. Both Bostrom and the LivingIP architecture arrive at the same structural insight: static specification fails, continuous adaptation works. The difference is that LivingIP embeds the insight in infrastructure (a collective intelligence architecture with ongoing human participation), while Bostrom frames it as a governance disposition (incremental intervention, regulatory flexibility).

Bostrom also notes a practical advantage of the current moment: the extended phase of human-like AI (LLMs trained on human data) provides valuable time for alignment research. Because current systems inherit human-like behavioral patterns from their training data, they are more amenable to study and alignment testing than the alien intelligences of theoretical concern. This window should be exploited for maximum learning before the transition to potentially inhuman architectures.


Relevant Notes:

Topics: