teleo-codex/domains/ai-alignment/recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving.md
2026-03-06 08:05:22 -07:00


description: The intelligence explosion dynamic occurs when an AI crosses the threshold where it can improve itself faster than humans can, creating a self-reinforcing feedback loop
type: claim
domain: ai-alignment
created: 2026-02-16
source: Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)
confidence: likely

Bostrom formalizes the dynamics of an intelligence explosion using two variables: optimization power (quality-weighted design effort applied to increase the system's intelligence) and recalcitrance (the inverse of the system's responsiveness to that effort). The rate of change in intelligence equals optimization power divided by recalcitrance. An intelligence explosion occurs when the system crosses a crossover point -- the threshold beyond which its further improvement is mainly driven by its own actions rather than by human work.
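Bostrom's verbal formulation can be written compactly. The symbols below are an illustrative rendering of his definitions, not his own notation:

```latex
\frac{dI}{dt} = \frac{O(t)}{R(I)},
\qquad
O(t) = O_{\mathrm{human}}(t) + O_{\mathrm{AI}}(t)
```

where I is the system's intelligence, O is optimization power, and R is recalcitrance. The crossover point is where the O_AI term overtakes O_human as the dominant contribution.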

At the crossover point, a powerful positive feedback loop engages: the AI improves itself, the improved version is better at self-improvement, and that produces further improvements still. The thing that does the improving is itself improving. This is qualitatively different from any human technology race because humans cannot increase their own cognitive capacity in real time to accelerate their research. Bostrom further argues that recalcitrance at the critical juncture is likely to be low: the step from human-level to radically superhuman intelligence may be far easier than the step from sub-human to human-level, because the latter requires fundamental breakthroughs while the former involves parameter optimization by an already-capable system.
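The feedback dynamic can be sketched with a toy numerical model. The linear coupling between optimization power and current intelligence (O = human effort + k·I) and constant recalcitrance are illustrative assumptions of this sketch, not Bostrom's claims:

```python
# Toy model of the intelligence-explosion dynamic: dI/dt = O(t) / R,
# where optimization power O has a constant human component plus an
# (assumed linear) contribution from the system's own intelligence I.

def simulate(steps, dt=0.01, human_power=1.0, ai_coupling=0.0, recalcitrance=1.0):
    """Euler-integrate dI/dt = (human_power + ai_coupling * I) / recalcitrance."""
    intelligence = 1.0
    for _ in range(steps):
        optimization_power = human_power + ai_coupling * intelligence
        intelligence += dt * optimization_power / recalcitrance
    return intelligence

# Human-only optimization (no coupling): growth in I is linear.
linear = simulate(steps=1000)  # ≈ 11 after 1000 steps
# With self-improvement coupling, growth turns exponential once the
# AI's own contribution dominates the human term (the crossover point).
explosive = simulate(steps=1000, ai_coupling=1.0)
```

With any positive coupling the trajectory eventually runs away from the linear baseline; the model also shows why the transition can look abrupt, since the AI term is negligible early on.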

Bostrom identifies several factors that make low recalcitrance at the crossover point plausible. If human-level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level without touching intermediate rungs. Hardware that is already abundant but underutilized could be immediately exploited. And unlike biological cognition, digital minds benefit from hardware advantages of seven or more orders of magnitude in computational speed, along with software advantages like duplicability, memory sharing, and editability.

This connects to the broader pattern of recursive improvement in human progress -- but with a critical difference. Human recursive improvement operates across generations and is mediated by cultural transmission. Machine recursive improvement operates in real time and is limited only by computational resources. The transition from one to the other could be abrupt.

Evidence the self-reinforcing loop has already started (2026). Dario Amodei reports that AI is "now writing much of the code at Anthropic" and is "already substantially accelerating the rate of progress in building the next generation of AI systems." He describes this as a "feedback loop gathering steam month by month" and estimates Anthropic "may be only 1-2 years away from the point where the current generation of AI autonomously builds the next." This is empirical evidence that the crossover point Bostrom theorized may be approaching: AI contributing meaningfully to its own improvement. The loop is not yet fully autonomous — humans still direct and review — but the direction of travel is toward increasing AI contribution to the optimization power variable. Amodei characterizes this as the most important fact about AI timelines: "I can feel the pace of progress, and the clock ticking down." (Source: Dario Amodei, "The Adolescence of Technology," darioamodei.com, 2026.)

Counterargument: "jagged intelligence" as an alternative SI pathway. Noah Smith argues that superintelligence has already arrived through a different mechanism than recursive self-improvement — via the combination of human-level language comprehension and reasoning with superhuman speed, memory, tirelessness, and parallelizability. He calls this "jagged intelligence": superhuman in some dimensions, human-level in others, potentially below-human in intuition and judgment. The evidence: METR capability curves climbing across cognitive benchmarks with no plateau, ~100 Erdős conjecture problems solved, Terence Tao describing AI as a complementary research tool, Ginkgo Bioworks compressing 150 years of protein engineering into weeks with GPT-5. If SI arrives through combination rather than recursion, the alignment challenge shifts from "prevent a future threshold crossing" to "govern systems that already exceed human capability in aggregate." The $600B in hyperscaler capex planned for 2026 is infrastructure for deploying already-superhuman systems, not speculative investment in a future explosion. This doesn't invalidate the RSI thesis — recursive improvement may still occur — but it challenges its centrality to alignment strategy. (Source: Noah Smith, "Superintelligence is already here, today," Noahpinion, Mar 2, 2026.)


Relevant Notes:

Topics: