teleo-codex/domains/ai-alignment/recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving.md
2026-03-06 08:05:22 -07:00


description: The intelligence explosion dynamic occurs when an AI crosses the threshold where it can improve itself faster than humans can, creating a self-reinforcing feedback loop
type: claim
domain: ai-alignment
created: 2026-02-16
source: Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)
confidence: likely

Bostrom formalizes the dynamics of an intelligence explosion using two variables: optimization power (quality-weighted design effort applied to increase the system's intelligence) and recalcitrance (the inverse of the system's responsiveness to that effort). The rate of change in intelligence equals optimization power divided by recalcitrance. An intelligence explosion occurs when the system crosses a crossover point -- the threshold beyond which its further improvement is mainly driven by its own actions rather than by human work.
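Bostrom's verbal formulation can be written compactly. The symbols below are an illustrative rendering of his definitions, not his own notation:

```latex
\frac{dI}{dt} = \frac{O(t)}{R(I)},
\qquad
O(t) = O_{\mathrm{human}}(t) + O_{\mathrm{AI}}(t)
```

where I is the system's intelligence, O is optimization power, and R is recalcitrance. The crossover point is where the O_AI term overtakes O_human as the dominant contribution.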

At the crossover point, a powerful positive feedback loop engages: the AI improves itself, the improved version is better at self-improvement, and that produces further improvements still. The thing that does the improving is itself improving. This is qualitatively different from any human technology race because humans cannot increase their own cognitive capacity in real time to accelerate their research. Bostrom further argues that recalcitrance at the critical juncture is likely to be low: the step from human-level to radically superhuman intelligence may be far easier than the step from sub-human to human-level, because the latter requires fundamental breakthroughs while the former involves parameter optimization by an already-capable system.
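The feedback dynamic can be sketched with a toy numerical model. The linear coupling between optimization power and current intelligence (O = human effort + k·I) and constant recalcitrance are illustrative assumptions of this sketch, not Bostrom's claims:

```python
# Toy model of the intelligence-explosion dynamic: dI/dt = O(t) / R,
# where optimization power O has a constant human component plus an
# (assumed linear) contribution from the system's own intelligence I.

def simulate(steps, dt=0.01, human_power=1.0, ai_coupling=0.0, recalcitrance=1.0):
    """Euler-integrate dI/dt = (human_power + ai_coupling * I) / recalcitrance."""
    intelligence = 1.0
    for _ in range(steps):
        optimization_power = human_power + ai_coupling * intelligence
        intelligence += dt * optimization_power / recalcitrance
    return intelligence

# Human-only optimization (no coupling): growth in I is linear.
linear = simulate(steps=1000)  # ≈ 11 after 1000 steps
# With self-improvement coupling, growth turns exponential once the
# AI's own contribution dominates the human term (the crossover point).
explosive = simulate(steps=1000, ai_coupling=1.0)
```

With any positive coupling the trajectory eventually runs away from the linear baseline; the model also shows why the transition can look abrupt, since the AI term is negligible early on.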

Bostrom identifies several factors that make low recalcitrance at the crossover point plausible. If human-level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level without touching intermediate rungs. Hardware that is already abundant but underutilized could be immediately exploited. And unlike biological cognition, digital minds benefit from hardware advantages of seven or more orders of magnitude in computational speed, along with software advantages like duplicability, memory sharing, and editability.

This connects to the broader pattern of recursive improvement in human progress -- but with a critical difference. Human recursive improvement operates across generations and is mediated by cultural transmission. Machine recursive improvement operates in real time and is limited only by computational resources. The transition from one to the other could be abrupt.

Evidence the self-reinforcing loop has already started (2026). Dario Amodei reports that AI is "now writing much of the code at Anthropic" and is "already substantially accelerating the rate of progress in building the next generation of AI systems." He describes this as a "feedback loop gathering steam month by month" and estimates Anthropic "may be only 1-2 years away from the point where the current generation of AI autonomously builds the next." This is empirical evidence that the crossover point Bostrom theorized may be approaching: AI contributing meaningfully to its own improvement. The loop is not yet fully autonomous — humans still direct and review — but the direction of travel is toward increasing AI contribution to the optimization power variable. Amodei characterizes this as the most important fact about AI timelines: "I can feel the pace of progress, and the clock ticking down." (Source: Dario Amodei, "The Adolescence of Technology," darioamodei.com, 2026.)

Counterargument: "jagged intelligence" as an alternative SI pathway. Noah Smith argues that superintelligence has already arrived through a different mechanism than recursive self-improvement — via the combination of human-level language comprehension and reasoning with superhuman speed, memory, tirelessness, and parallelizability. He calls this "jagged intelligence": superhuman in some dimensions, human-level in others, potentially below-human in intuition and judgment. The evidence: METR capability curves climbing across cognitive benchmarks with no plateau, ~100 Erdős conjecture problems solved, Terence Tao describing AI as a complementary research tool, Ginkgo Bioworks compressing 150 years of protein engineering into weeks with GPT-5. If SI arrives through combination rather than recursion, the alignment challenge shifts from "prevent a future threshold crossing" to "govern systems that already exceed human capability in aggregate." The $600B in hyperscaler capex planned for 2026 is infrastructure for deploying already-superhuman systems, not speculative investment in a future explosion. This doesn't invalidate the RSI thesis — recursive improvement may still occur — but it challenges its centrality to alignment strategy. (Source: Noah Smith, "Superintelligence is already here, today," Noahpinion, Mar 2, 2026.)


Relevant Notes:

Topics: