teleo-codex/domains/ai-alignment/the optimal SI development strategy is swift to harbor slow to berth moving fast to capability then pausing before full deployment.md


description: Bostrom's optimal timing framework finds that for most parameter settings the best strategy accelerates to AGI capability, then introduces a brief pause before deployment.
type: claim
domain: ai-alignment
created: 2026-02-17
source: Bostrom, Optimal Timing for Superintelligence (2025 working paper)
confidence: experimental

Bostrom's "swift to harbor, slow to berth" metaphor captures a nuanced optimal timing strategy that resists both the "full speed ahead" and "pause everything" camps. For many parameter settings in his mathematical models, the optimal approach involves moving quickly toward AGI capability -- reaching the harbor -- then introducing a deliberate pause before full deployment and integration -- berthing slowly. The paper examines this strategy from a person-affecting ethical stance, weighing expected life-years gained and lost.

The logic is that the capability phase and the deployment phase have different risk profiles. During capability development, the primary risk is competitive dynamics -- racing creates pressure to cut safety corners. But the cost of delay during this phase is massive ongoing mortality. Once capability is achieved (the harbor is reached), the calculus shifts. The system exists but has not been fully deployed. At this point, the marginal cost of delay drops dramatically (the immediate mortality continues but the end is in sight), while the marginal benefit of additional safety work increases (alignment verification becomes possible against an actual system rather than theoretical models). A brief pause for verification and alignment refinement has high expected value.
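This asymmetry can be sketched as a toy expected-value comparison in units of life-years, consistent with the person-affecting stance the paper takes. All parameter values and functional forms below are illustrative assumptions for exposition, not numbers from Bostrom's model:

```python
# Toy comparison of pausing for a year before vs. after capability is reached.
# Units are expected life-years; every constant here is an illustrative assumption.

MORTALITY_PER_YEAR = 60e6 * 40   # assumed: ~60M deaths/year, ~40 life-years lost each
STAKE = 8e9 * 40                 # assumed person-affecting stake: current population's
                                 # remaining life-years at risk from misaligned deployment

def pause_value(risk_reduction_per_year, baseline_risk, years):
    """Expected life-years gained by pausing:
    safety benefit (risk reduced on the stake, capped at the baseline risk)
    minus the ongoing mortality cost accrued during the pause."""
    benefit = min(baseline_risk, risk_reduction_per_year * years) * STAKE
    cost = MORTALITY_PER_YEAR * years
    return benefit - cost

# Before the harbor, safety work is mostly theoretical, so assume a low
# risk reduction per year; after the harbor, verification against the
# actual system is possible, so assume a much higher rate.
pre_harbor = pause_value(risk_reduction_per_year=0.001, baseline_risk=0.1, years=1)
post_harbor = pause_value(risk_reduction_per_year=0.02, baseline_risk=0.1, years=1)

print(pre_harbor)    # negative: a pre-capability pause costs more than it buys
print(post_harbor)   # positive: the same pause after capability has high value
```

Under these assumed numbers the one-year pause flips from net-negative before the harbor to net-positive after it, which is the sign pattern the paper's "swift to harbor, slow to berth" result describes.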

This framework has direct implications for the LivingIP architecture. If safe AI development requires building alignment mechanisms before scaling capability, Bostrom's timing model suggests a refinement: build alignment mechanisms in parallel with capability development, then verify them against the actual system during the harbor-to-berth pause. The collective intelligence approach -- where the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance -- is naturally compatible with this strategy because continuous value weaving can operate during both phases, accelerating during the pause.

The framework also implicitly acknowledges that perfect alignment before any capability development is both impossible and unnecessary. What matters is having sufficient alignment infrastructure ready for intensive deployment during the pause window. This is pragmatism, not recklessness.


Relevant Notes:

Topics: