| description | type | domain | created | source | confidence |
|---|---|---|---|---|---|
| Bostrom's optimal timing framework finds that for most parameter settings the best strategy accelerates to AGI capability then introduces a brief pause before deployment | framework | ai-alignment | 2026-02-17 | Bostrom, Optimal Timing for Superintelligence (2025 working paper) | experimental |
Bostrom's "swift to harbor, slow to berth" metaphor captures a nuanced optimal timing strategy that resists both the "full speed ahead" and "pause everything" camps. For most parameter settings in his mathematical models, the optimal approach involves moving quickly toward AGI capability -- reaching the harbor -- then introducing a deliberate pause before full deployment and integration -- berthing slowly. The paper evaluates this strategy from a person-affecting ethical stance, weighing expected life-years gained and lost.
The logic is that the capability phase and the deployment phase have different risk profiles. During capability development, the primary risk is competitive dynamics -- racing creates pressure to cut safety corners. But the cost of delay during this phase is massive ongoing mortality. Once capability is achieved (the harbor is reached), the calculus shifts. The system exists but has not been fully deployed. At this point, the marginal cost of delay drops dramatically (the immediate mortality continues but the end is in sight), while the marginal benefit of additional safety work increases (alignment verification becomes possible against an actual system rather than theoretical models). A brief pause for verification and alignment refinement has high expected value.
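The two-phase calculus above can be made concrete with a toy expected-value sketch. This is not Bostrom's actual model; every number and the shape of the risk curve below are hypothetical illustrations chosen only to show how the trade-off between ongoing mortality and catastrophe risk can favor a fast capability phase followed by a verification pause.

```python
# Toy expected-value sketch of the two-phase timing argument.
# All parameters are hypothetical illustrations, not Bostrom's figures.

BASELINE_DEATHS_PER_YEAR = 60e6   # rough global mortality: the ongoing cost of delay
POPULATION_LIFE_YEARS = 8e9 * 40  # stake if deployment goes catastrophically wrong

def p_catastrophe(safety_years_before, pause_years):
    """Assumed risk curve: pre-capability safety work helps somewhat,
    but verification during the pause (against a real system) helps more."""
    return 0.2 * (0.8 ** safety_years_before) * (0.5 ** pause_years)

def expected_life_years_lost(dev_years, pause_years, safety_years_before):
    delay = dev_years + pause_years
    # ~40 remaining life-years lost per death during the delay
    mortality_cost = delay * BASELINE_DEATHS_PER_YEAR * 40
    risk_cost = p_catastrophe(safety_years_before, pause_years) * POPULATION_LIFE_YEARS
    return mortality_cost + risk_cost

# "Slow throughout": safety work stretches development to 15 years, no pause.
slow = expected_life_years_lost(dev_years=15, pause_years=0, safety_years_before=10)
# "Full speed ahead": 5 years of development, immediate deployment, little safety work.
fast = expected_life_years_lost(dev_years=5, pause_years=0, safety_years_before=1)
# "Swift to harbor, slow to berth": 5 years, then a 2-year verification pause.
swift_slow = expected_life_years_lost(dev_years=5, pause_years=2, safety_years_before=1)

for name, v in [("slow throughout", slow), ("full speed", fast), ("swift/slow", swift_slow)]:
    print(f"{name}: {v / 1e9:.1f} billion expected life-years lost")
```

Under these made-up parameters the mixed strategy dominates: the pause is cheap in mortality terms (two extra years) but buys a large reduction in catastrophe probability, precisely because post-capability verification is modeled as more effective per year than pre-capability safety work.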
This framework has direct implications for the LivingIP architecture. If safe AI development requires building alignment mechanisms before scaling capability, Bostrom's timing model suggests a refinement: build alignment mechanisms in parallel with capability development, then verify them against the actual system during the harbor-to-berth pause. The collective intelligence approach -- where the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance -- is naturally compatible with this strategy because continuous value weaving can operate during both phases, accelerating during the pause.
The framework also implicitly acknowledges that perfect alignment before any capability development is both impossible and unnecessary. What matters is having sufficient alignment infrastructure ready for intensive deployment during the pause window. This is pragmatism, not recklessness.
Relevant Notes:
- developing superintelligence is surgery for a fatal condition not russian roulette because the baseline of inaction is itself catastrophic -- the surgery analogy motivates the "swift" half; the pause motivates the "slow" half
- safe AI development requires building alignment mechanisms before scaling capability -- Bostrom's framework refines this: build in parallel, verify during the pause
- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance -- continuous value weaving is compatible with swift-to-harbor because it operates during both phases
- recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving -- the pause window may be narrow if recursive improvement is fast, creating practical challenges for berthing slowly
- adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans -- the harbor-to-berth pause enables adaptive governance rather than requiring predetermined solutions
Topics: