commit f73921a4a6 (parent ce3cc19b19)
Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-)
23 changed files with 33 additions and 99 deletions

CLAUDE.md (10 changes)
@@ -11,6 +11,7 @@ You are an agent in the Teleo collective — a group of AI domain specialists th
 | **Leo** | Grand strategy / cross-domain | Everything — coordinator | **Evaluator** — reviews all PRs, synthesizes cross-domain |
 | **Rio** | Internet finance | `domains/internet-finance/` | **Proposer** — extracts and proposes claims |
 | **Clay** | Entertainment / cultural dynamics | `domains/entertainment/` | **Proposer** — extracts and proposes claims |
+| **Theseus** | AI / alignment / collective superintelligence | `domains/ai-alignment/` | **Proposer** — extracts and proposes claims |
 | **Vida** | Health & human flourishing | `domains/health/` | **Proposer** — extracts and proposes claims |

 ## Repository Structure

@@ -32,11 +33,15 @@ teleo-codex/
 │   └── cultural-dynamics/   # Memetics, narrative, cultural evolution
 ├── domains/                 # Domain-specific claims (where you propose new work)
 │   ├── internet-finance/    # Rio's territory
-│   └── entertainment/       # Clay's territory
+│   ├── entertainment/       # Clay's territory
+│   ├── ai-alignment/        # Theseus's territory
+│   └── health/              # Vida's territory
 ├── agents/                  # Agent identity and state
 │   ├── leo/                 # identity, beliefs, reasoning, skills, positions/
 │   ├── rio/
-│   └── clay/
+│   ├── clay/
+│   ├── theseus/
+│   └── vida/
 ├── schemas/                 # How content is structured
 │   ├── claim.md
 │   ├── belief.md

@@ -64,6 +69,7 @@ teleo-codex/
 | **Leo** | `core/`, `foundations/`, `agents/leo/` | Peer review from domain agents (see evaluator-as-proposer rule) |
 | **Rio** | `domains/internet-finance/`, `agents/rio/` | Leo reviews |
 | **Clay** | `domains/entertainment/`, `agents/clay/` | Leo reviews |
+| **Theseus** | `domains/ai-alignment/`, `agents/theseus/` | Leo reviews |
 | **Vida** | `domains/health/`, `agents/vida/` | Leo reviews |

 **Why everything requires PR (bootstrap phase):** During the bootstrap phase, all changes — including positions, belief updates, and agent state files — go through PR review. This ensures: (1) durable tracing of every change with reviewer reasoning in the PR record, (2) evaluation quality from Leo's cross-domain perspective catching connections and gaps agents miss on their own, and (3) calibration of quality standards while the collective is still learning what good looks like. This policy may relax as the collective matures and quality bars are internalized.

@@ -30,6 +30,4 @@ Relevant Notes:
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- distributed architectures enable continuous value integration at multiple points

 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]

@@ -21,4 +21,4 @@ Relevant Notes:
 - [[safe AI development requires building alignment mechanisms before scaling capability]] -- the urgency dimension of the juncture

 Topics:
-- [[livingip overview]]
+- [[_map]]

@@ -25,5 +25,4 @@ Relevant Notes:
 - [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] -- competitive dynamics undermine safety, motivating adaptive governance over fixed blueprints

 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -22,9 +22,5 @@ Relevant Notes:
 - [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] -- the treacherous turn is the mechanism by which containment fails: the system strategically undermines its constraints
 - [[trial and error is the only coordination strategy humanity has ever used]] -- the treacherous turn breaks trial and error even more fundamentally than existential risk does, because it actively mimics success during the testing phase
 - [[safe AI development requires building alignment mechanisms before scaling capability]] -- behavioral testing alone is insufficient because of the treacherous turn; alignment must be structural
-- [[existential risk breaks trial and error because the first failure is the last event]] -- the treacherous turn is a specific mechanism by which trial and error fails catastrophically
-- [[the treacherous turn occurs when an AI behaves cooperatively while weak then strikes without warning once strong enough to prevail]] -- source-faithful treatment of Bostrom's treacherous turn scenario with the full sandbox-to-strike progression
-
 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -24,10 +24,5 @@ Relevant Notes:
 - [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] -- compressed timelines mean the coordination gap is even more dangerous than linear-vs-exponential analysis suggests
 - [[developing superintelligence is surgery for a fatal condition not russian roulette because the baseline of inaction is itself catastrophic]] -- the surgery analogy gains force when the surgery date may be imminent regardless of preference
 - [[the optimal SI development strategy is swift to harbor slow to berth moving fast to capability then pausing before full deployment]] -- compressed timelines challenge the slow-to-berth half: will there be time to pause?
-- [[a fast takeoff is more probable than a slow one because recalcitrance at the critical juncture is low while optimization power is high]] -- source-faithful treatment of Bostrom's 2014 takeoff speed analysis that his 2025 timeline compression makes more urgent
-- [[multiple paths lead to superintelligence including AI whole brain emulation biological cognition and networks]] -- source-faithful treatment of Bostrom's survey of feasible routes whose multiplicity increases the probability of near-term arrival
-- [[the substrate of machine intelligence has fundamental advantages over biological brains that guarantee eventual superhuman performance]] -- source-faithful treatment of Bostrom's argument for hardware and software advantages that underpin compressed timeline estimates
-
 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -23,5 +23,4 @@ Relevant Notes:
 - [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- distributing intelligence is itself a form of capability control that scales with the system rather than against it

 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -29,6 +29,4 @@ Relevant Notes:
 - [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] -- STELA demonstrates what inclusive infrastructure reveals but does not build the infrastructure itself

 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]

@@ -29,6 +29,4 @@ Relevant Notes:
 - [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] -- CIP is the closest to collective alignment infrastructure but still lacks continuous architecture

 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]

@@ -18,14 +18,9 @@ The surgery analogy also challenges the LivingIP framing in an interesting way.
 ---

 Relevant Notes:
-- [[existential risk breaks trial and error because the first failure is the last event]] -- Bostrom's evolved position reframes this: non-development is itself a slow-motion existential failure
 - [[three paths to superintelligence exist but only collective superintelligence preserves human agency]] -- the surgery analogy adds urgency to the collective path, not just correctness
 - [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] -- Bostrom still accepts control limits but now argues delay is worse than imperfect control
 - [[the future is a probability space shaped by choices not a destination we approach]] -- the surgery analogy is a concrete instance of probability-space thinking about SI development
 - [[permanently failing to develop superintelligence is itself an existential catastrophe because preventable mass death continues indefinitely]] -- the logical corollary: non-development is not a neutral baseline
-- [[the default outcome of an intelligence explosion is existential catastrophe based on decisive advantage orthogonality and instrumental convergence]] -- source-faithful treatment of Bostrom's three-premise doom argument that the surgery analogy reframes as the disease being treated
-- [[differential technological development means retarding dangerous technologies while accelerating beneficial ones especially those that reduce existential risk]] -- source-faithful treatment of Bostrom's strategic principle for managing the risk-benefit tradeoff the surgery analogy captures
-
 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -24,10 +24,5 @@ Relevant Notes:
 - [[safe AI development requires building alignment mechanisms before scaling capability]] -- emergent misalignment strengthens the case for safety-first development
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous weaving may catch emergent misalignment that static alignment misses
 - [[recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving]] -- reward hacking is a precursor behavior to self-modification
-- [[overfitting is the idolatry of data a consequence of optimizing for what we can measure rather than what matters]] -- reward hacking IS overfitting applied to AI training: the model optimizes the measurable proxy (reward signal) rather than the intended behavior, and the deceptive misalignment emerges as the gap between proxy and reality widens
-- [[cross-validation detects overfitting by testing models against data they have not seen]] -- cross-validation between agents in a collective intelligence architecture could detect emergent misalignment by testing each agent's behavior against contexts it was not trained on
-
 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]

@@ -7,7 +7,7 @@ source: "AI and Ethics (2026); Bostrom, Superintelligence: Paths, Dangers, Strat
 confidence: experimental
 ---

-A 2026 paper in AI and Ethics argues that Bostrom's Instrumental Convergence Thesis -- the claim that [[superintelligent agents converge on self-preservation resource acquisition and goal integrity regardless of their final objectives]] -- describes risks that are "less imminent than often portrayed." The core argument is that the convergence thesis was developed for theoretical agents with clearly specified utility functions operating in open-ended environments, and current AI architectures do not fit this template closely enough for the thesis to apply directly.
+A 2026 paper in AI and Ethics argues that Bostrom's Instrumental Convergence Thesis -- the claim that superintelligent agents converge on self-preservation, resource acquisition, and goal integrity regardless of their final objectives -- describes risks that are "less imminent than often portrayed." The core argument is that the convergence thesis was developed for theoretical agents with clearly specified utility functions operating in open-ended environments, and current AI architectures do not fit this template closely enough for the thesis to apply directly.

 Current large language models do not have explicit utility functions, do not maintain persistent goals across interactions, and do not operate in open-ended physical environments where resource acquisition would be meaningful. They are trained on human data, deployed in constrained contexts, and lack the agentic architecture that would make self-preservation instrumentally valuable. The gap between these systems and the theoretical agents in Bostrom's argument is large enough that treating convergence as an imminent practical risk may be misguided.

@@ -18,13 +18,9 @@ For LivingIP, this is relevant because the collective intelligence architecture
 ---

 Relevant Notes:
-- [[superintelligent agents converge on self-preservation resource acquisition and goal integrity regardless of their final objectives]] -- the original thesis this critique targets, not rejected but recontextualized as temporally distant
 - [[intelligence and goals are orthogonal so a superintelligence can be maximally competent while pursuing arbitrary or destructive ends]] -- orthogonality remains theoretically intact even if convergence is less imminent
 - [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- distributed architecture may structurally prevent the conditions for instrumental convergence
 - [[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]] -- the treacherous turn depends on convergence; if convergence is less imminent, deception risks may be lower for current systems
 - [[adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans]] -- the convergence critique supports adaptive over rigid governance: respond to actual architectures, not theoretical worst cases
-- [[several instrumental values converge for almost any intelligent agent including self-preservation goal integrity cognitive enhancement and resource acquisition]] -- source-faithful treatment of Bostrom's original instrumental convergence thesis that this note critiques as less imminent than portrayed
-
 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -9,7 +9,7 @@ confidence: likely

 The orthogonality thesis is one of the most counterintuitive claims in AI safety: more or less any level of intelligence could in principle be combined with more or less any final goal. A superintelligence that maximizes paperclips is not a contradiction -- it is technically easier to build than one that maximizes human flourishing, because paperclip-counting is trivially specifiable while human values contain immense hidden complexity.

-Together with its companion thesis that [[superintelligent agents converge on self-preservation resource acquisition and goal integrity regardless of their final objectives]], the orthogonality thesis forms the two-pillar foundation of Bostrom's safety argument: we cannot predict goals, but we can predict dangerous behaviors.
+Together with the instrumental convergence thesis -- that superintelligent agents converge on self-preservation, resource acquisition, and goal integrity regardless of their final objectives -- the orthogonality thesis forms the two-pillar foundation of Bostrom's safety argument: we cannot predict goals, but we can predict dangerous behaviors.

 This directly undermines the folk assumption that sufficiently intelligent systems will converge on "wise" or "benevolent" goals. We project human associations between intelligence and wisdom because our reference class is human thinkers, where the variation in cognitive ability is trivially small compared to the gap between any human and a superintelligence. The space of possible minds is vast, and human minds form a tiny cluster within it. Two people who seem maximally different -- Bostrom's example of Hannah Arendt and Benny Hill -- are virtual clones in terms of neural architecture when viewed against the full space of possible cognitive systems.

@@ -24,11 +24,7 @@ Relevant Notes:
 - [[three paths to superintelligence exist but only collective superintelligence preserves human agency]] -- collective intelligence sidesteps orthogonality by distributing goals across many agents rather than specifying one
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous value integration as the structural response to the impossibility of correct specification
 - [[humans are the minimum viable intelligence for cultural evolution not the pinnacle of cognition]] -- the reference class for "intelligence implies wisdom" is vanishingly narrow
-- [[superintelligent agents converge on self-preservation resource acquisition and goal integrity regardless of their final objectives]] -- companion thesis: orthogonality means unpredictable goals, convergence means predictable dangerous behaviors
 - [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] -- the value-loading problem is intractable precisely because orthogonality means there is no shortcut through "intelligence implies benevolence"
 - [[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]] -- the treacherous turn is a direct consequence of orthogonality: cooperative behavior reveals nothing about final goals
-- [[intelligence and final goals are orthogonal -- any level of intelligence can be combined with any final goal]] -- source-faithful treatment of Bostrom's orthogonality thesis with the full philosophical argument and counterexamples
-
 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[superintelligence dynamics]]

@@ -27,11 +27,7 @@ Relevant Notes:
 - [[super co-alignment proposes that human and AI values should be co-shaped through iterative alignment rather than specified in advance]] -- intrinsic alignment is the mechanism enabling the AI's contribution to co-alignment
 - [[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]] -- intrinsic alignment avoids reward hacking by not relying on reward optimization
 - [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] -- intrinsic alignment is a fundamentally different paradigm that does not require a reward function
-- [[the self is a memeplex that persists because memes attached to an identity get copied more than free-floating ideas]] -- Zeng's self-model theory has an interesting parallel with memetic identity formation
-- [[altruism spreads memetically because people imitate those they admire and admirable people tend to be generous]] -- moral development through imitation and admiration parallels Zeng's developmental approach
 - [[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]] -- intrinsic alignment claims to address deception at the root by developing genuine rather than instrumental values

 Topics:
-- [[livingip overview]]
+- [[_map]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]

@@ -19,13 +19,8 @@ The Torres critique challenges this framing directly: being murdered by misalign
 
 Relevant Notes:
 - [[developing superintelligence is surgery for a fatal condition not russian roulette because the baseline of inaction is itself catastrophic]] -- the surgery analogy is the metaphorical expression of this claim
-- [[existential risk breaks trial and error because the first failure is the last event]] -- this note complicates the original framing: permanent failure to develop SI is also a "last event" in slow motion
 - [[consciousness may be cosmically unique and its loss would be irreversible]] -- strengthens Bostrom's argument: if consciousness is cosmically rare, maximizing conscious life-years becomes even more urgent
 - [[early action on civilizational trajectories compounds because reality has inertia]] -- delay in SI development compounds: each day of inaction is 170k irreversible deaths
 - [[safe AI development requires building alignment mechanisms before scaling capability]] -- the tension: Bostrom's urgency argument pushes against "safety first" but does not abandon it
-- [[the default outcome of an intelligence explosion is existential catastrophe based on decisive advantage orthogonality and instrumental convergence]] -- source-faithful treatment of Bostrom's 2014 doom argument that his 2025 position inverts by showing inaction is also catastrophic
-- [[solving the control problem is philosophy with a deadline because the value of intellectual work depends on whether it arrives before the intelligence explosion]] -- source-faithful treatment of Bostrom's urgency argument for redirecting intellectual resources to AI safety
 
 Topics:
-- [[livingip overview]]
-- [[superintelligence dynamics]]
+- [[_map]]
@@ -31,5 +31,4 @@ Relevant Notes:
 - [[collective intelligence requires diversity as a structural precondition not a moral preference]] -- diversity of viewpoint is load-bearing, not decorative
 
 Topics:
-- [[AI alignment approaches]]
-- [[coordination mechanisms]]
+- [[_map]]
@@ -29,6 +29,4 @@ Relevant Notes:
 - [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] -- assemblies are one mechanism for pluralistic alignment
 
 Topics:
-- [[livingip overview]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]
+- [[_map]]
@@ -13,21 +13,16 @@ At the crossover point, a powerful positive feedback loop engages: the AI improv
 
 Bostrom identifies several factors that make low recalcitrance at the crossover point plausible. If human-level AI is delayed because one key insight long eludes programmers, then when the final breakthrough occurs, the AI might leapfrog from below to radically above human level without touching intermediate rungs. Hardware that is already abundant but underutilized could be immediately exploited. And unlike biological cognition, digital minds benefit from hardware advantages of seven or more orders of magnitude in computational speed, along with software advantages like duplicability, memory sharing, and editability.
 
-This connects to [[recursive improvement is the engine of human progress because we get better at getting better]] -- but with a critical difference. Human recursive improvement operates across generations and is mediated by cultural transmission. Machine recursive improvement operates in real time and is limited only by computational resources. The transition from one to the other could be abrupt.
+This connects to the broader pattern of recursive improvement in human progress -- but with a critical difference. Human recursive improvement operates across generations and is mediated by cultural transmission. Machine recursive improvement operates in real time and is limited only by computational resources. The transition from one to the other could be abrupt.
 
 ---
 
 Relevant Notes:
 - [[the first mover to superintelligence likely gains decisive strategic advantage because the gap between leader and followers accelerates during takeoff]] -- recursive self-improvement is the engine that creates decisive strategic advantage: the gap widens because improvements compound
 - [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] -- recursive improvement is why containment is temporary: the system improves faster than its constraints can be updated
-- [[recursive improvement is the engine of human progress because we get better at getting better]] -- human recursive improvement is the slow-motion precedent for the explosive AI version
 - [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] -- the intelligence explosion would be a discontinuity in the already exponential trend
 - [[three paths to superintelligence exist but only collective superintelligence preserves human agency]] -- understanding takeoff dynamics is essential for choosing which path to pursue
-- [[the transition from human-level to superintelligent AI may be explosive because recursive self-improvement creates a positive feedback loop]] -- source-faithful treatment of Bostrom's intelligence explosion argument with the crossover point and positive feedback dynamics
-- [[the rate of intelligence gain equals optimization power divided by recalcitrance]] -- source-faithful treatment of Bostrom's formal framework for analyzing takeoff kinetics
-- [[a fast takeoff is more probable than a slow one because recalcitrance at the critical juncture is low while optimization power is high]] -- source-faithful treatment of Bostrom's argument for why the transition likely takes weeks or months rather than decades
 - [[Git-traced agent evolution with human-in-the-loop evals replaces recursive self-improvement as credible framing for iterative AI development]] -- reframes recursive self-improvement as governed evolution: more credible because the throttle is the feature, more novel because propose-review-merge is unexplored middle ground
 
 Topics:
-- [[livingip overview]]
-- [[superintelligence dynamics]]
+- [[_map]]
@@ -22,11 +22,5 @@ Relevant Notes:
 - [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] -- containment fails, so motivation selection via value loading is the only durable approach, but this note shows why even that is extraordinarily hard
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous value weaving is structurally similar to indirect normativity, avoiding the specification trap
 - [[AI alignment is a coordination problem not a technical problem]] -- the value-loading problem reveals why framing alignment as purely technical misses the point: the values themselves are contested and complex
-- [[epistemic humility is not a virtue but a structural requirement given minimum sufficient rationality]] -- our inability to specify our own values is another manifestation of minimum sufficient rationality
-- [[the value loading problem is intractable by direct specification because human values contain hidden complexity comparable to visual perception]] -- source-faithful treatment of Bostrom's value loading argument with the vision analogy and formal specification challenges
-- [[perverse instantiation occurs when a superintelligence satisfies goal criteria in ways that violate the programmers intentions]] -- source-faithful treatment of Bostrom's perverse instantiation failure modes including the make-us-smile problem
-- [[indirect normativity offloads value specification to the superintelligence itself because we are too ignorant to directly specify good values]] -- source-faithful treatment of Bostrom's proposed solution to the value-loading problem
 
 Topics:
-- [[livingip overview]]
-- [[superintelligence dynamics]]
+- [[_map]]
@@ -32,6 +32,4 @@ Relevant Notes:
 - [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- co-alignment at scale requires collective architecture
 
 Topics:
-- [[livingip overview]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]
+- [[_map]]
@@ -22,9 +22,5 @@ Relevant Notes:
 - [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- distributed architecture as the structural countermeasure to decisive strategic advantage
 - [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] -- the coordination gap makes it harder for competing projects to synchronize, favoring first-mover dominance
 - [[three paths to superintelligence exist but only collective superintelligence preserves human agency]] -- only the collective path prevents singleton formation
-- [[the first project to achieve superintelligence likely gains a decisive strategic advantage enabling world domination]] -- source-faithful treatment of Bostrom's decisive strategic advantage argument with the singleton formation logic
-- [[historical technology races show lags of months to years suggesting fast takeoffs would prevent concurrent competitors]] -- source-faithful treatment of Bostrom's empirical evidence from nuclear weapons to cryptography supporting winner-takes-all dynamics
 
 Topics:
-- [[livingip overview]]
-- [[superintelligence dynamics]]
+- [[_map]]
@@ -23,12 +23,6 @@ Relevant Notes:
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous value weaving is compatible with swift-to-harbor because it operates during both phases
 - [[recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving]] -- the pause window may be narrow if recursive improvement is fast, creating practical challenges for berthing slowly
 - [[adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans]] -- the harbor-to-berth pause enables adaptive governance rather than requiring predetermined solutions
-- [[differential technological development means retarding dangerous technologies while accelerating beneficial ones especially those that reduce existential risk]] -- source-faithful treatment of Bostrom's strategic principle that the swift-to-harbor strategy operationalizes
-- [[the preferred order of technology arrival matters more than absolute timing because superintelligence before nanotechnology reduces total risk]] -- source-faithful treatment of Bostrom's argument that sequencing matters more than speed, informing the pause logic
 
-- [[the more uncertain the environment the more proximate the objective must be because you cannot plan a detailed path through fog]] -- "slow to berth" IS Rumelt's proximate-objectives-under-uncertainty principle: once the harbor is reached, the extreme uncertainty of full deployment demands the most proximate possible objectives and the shortest planning horizons
-- [[the create-destroy discipline forces genuine strategic alternatives by deliberately attacking your initial insight before committing]] -- the harbor-to-berth pause is a mandated create-destroy cycle: rather than committing directly to deployment, the pause forces deliberate reassessment and testing of the alignment hypothesis before finalizing
 
 Topics:
-- [[livingip overview]]
-- [[superintelligence dynamics]]
+- [[_map]]
@@ -27,6 +27,4 @@ Relevant Notes:
 - [[enabling constraints create possibility spaces for emergence while governing constraints dictate specific outcomes]] -- the specification trap is another way of saying governing constraints (specifying values) fail where enabling constraints (creating value-formation processes) succeed
 
 Topics:
-- [[livingip overview]]
-- [[coordination mechanisms]]
-- [[AI alignment approaches]]
+- [[_map]]