auto-fix: strip 18 broken wiki links
Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
parent 1b2ba391ee
commit a91ae40dae
3 changed files with 18 additions and 18 deletions
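For context, a minimal sketch of the kind of pass this auto-fixer performs, assuming the knowledge base is a directory tree of `.md` notes and that wiki links use the `[[target]]` form; the paths, regex, and function names below are illustrative, not the pipeline's actual implementation:

```python
import re
from pathlib import Path

# Illustrative names only: the real pipeline's layout and entry point may differ.
KB_ROOT = Path("knowledge-base")
WIKI_LINK = re.compile(r"\[\[([^\[\]]+?)\]\]")

def known_targets(root: Path) -> set[str]:
    """Collect every name a [[wiki link]] could resolve to, with and without .md."""
    names: set[str] = set()
    for path in root.rglob("*.md"):
        rel = path.relative_to(root).as_posix()
        names.update({rel, rel[:-3], path.name, path.stem})
    return names

def strip_broken_links(text: str, known: set[str]) -> tuple[str, int]:
    """Replace [[target]] with plain 'target' when the target does not resolve."""
    stripped = 0

    def repl(match: re.Match) -> str:
        nonlocal stripped
        target = match.group(1).strip()
        if target in known:
            return match.group(0)  # resolvable link: keep the brackets
        stripped += 1
        return target              # broken link: keep the text, drop the brackets

    return WIKI_LINK.sub(repl, text), stripped

if __name__ == "__main__":
    known = known_targets(KB_ROOT)
    total = 0
    for note in KB_ROOT.rglob("*.md"):
        fixed, n = strip_broken_links(note.read_text(encoding="utf-8"), known)
        if n:
            note.write_text(fixed, encoding="utf-8")
            total += n
    print(f"stripped {total} broken wiki links")
```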
@@ -54,11 +54,11 @@ This shifts the problem from "how do we specify the right objective" to "how do
---
**Related claims:**
-- [[safe AI development requires building alignment mechanisms before scaling capability.md]]
-- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md]]
-- [[AI alignment is a coordination problem not a technical problem.md]]
-- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md]]
-- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm.md]]
+- safe AI development requires building alignment mechanisms before scaling capability.md
+- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md
+- AI alignment is a coordination problem not a technical problem.md
+- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
+- designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm.md
**Topics:**
-- [[domains/ai-alignment/_map]]
+- domains/ai-alignment/_map
@@ -49,11 +49,11 @@ Full paper not accessed (paywalled). Cannot verify:
---
**Related claims:**
-- [[safe AI development requires building alignment mechanisms before scaling capability.md]]
-- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md]]
-- [[AI alignment is a coordination problem not a technical problem.md]]
-- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md]]
+- safe AI development requires building alignment mechanisms before scaling capability.md
+- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md
+- AI alignment is a coordination problem not a technical problem.md
+- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
**Topics:**
-- [[domains/ai-alignment/_map]]
-- [[domains/critical-systems/_map]]
+- domains/ai-alignment/_map
+- domains/critical-systems/_map
@@ -13,11 +13,11 @@ The standard AI development pattern scales capability first and attempts safety
The grant application identifies three concrete risks that make this sequencing non-optional: knowledge aggregation could surface dangerous combinations of individually safe information, the incentive system could be gamed, and the network could develop emergent properties that resist understanding. Each risk is easier to detect and contain while the system operates in non-sensitive domains. Since [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]], the safety-first approach gives the human-in-the-loop mechanisms time to mature before the stakes rise. Governance muscles are built on easier problems before being asked to handle harder ones.
-This phased approach is also a practical response to the observation that since [[existential risk breaks trial and error because the first failure is the last event]], there is no opportunity to iterate on safety after a catastrophic failure. You must get safety right on the first deployment in high-stakes domains, which means practicing in low-stakes domains first. The goal framework remains permanently open to revision at every stage, making the system's values a living document rather than a locked specification.
+This phased approach is also a practical response to the observation that since existential risk breaks trial and error because the first failure is the last event, there is no opportunity to iterate on safety after a catastrophic failure. You must get safety right on the first deployment in high-stakes domains, which means practicing in low-stakes domains first. The goal framework remains permanently open to revision at every stage, making the system's values a living document rather than a locked specification.
### Additional Evidence (challenge)
-*Source: [[2026-02-00-anthropic-rsp-rollback]] | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*
+*Source: 2026-02-00-anthropic-rsp-rollback | Added: 2026-03-10 | Extractor: anthropic/claude-sonnet-4.5*
Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions.
@@ -34,14 +34,14 @@ Relevant Notes:
- [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] -- Bostrom's analysis shows why motivation selection must precede capability scaling
- [[recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving]] -- the explosive dynamics of takeoff mean alignment mechanisms cannot be retrofitted after the fact
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- this note describes the development sequencing that allows that continuous weaving to mature
-- [[existential risk breaks trial and error because the first failure is the last event]] -- the urgency that makes safety-first sequencing non-optional
+- existential risk breaks trial and error because the first failure is the last event -- the urgency that makes safety-first sequencing non-optional
- [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- the architecture within which this phased approach operates
-- [[knowledge aggregation creates novel risks when dangerous information combinations emerge from individually safe pieces]] -- one of the specific risks this phased approach is designed to contain
+- knowledge aggregation creates novel risks when dangerous information combinations emerge from individually safe pieces -- one of the specific risks this phased approach is designed to contain
- [[adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans]] -- Bostrom's evolved position refines this: build adaptable alignment mechanisms, not rigid ones
- [[the optimal SI development strategy is swift to harbor slow to berth moving fast to capability then pausing before full deployment]] -- Bostrom's timing model suggests building alignment in parallel with capability, then intensive verification during the pause
-- [[proximate objectives resolve ambiguity by absorbing complexity so the organization faces a problem it can actually solve]] -- the phased safety-first approach IS a proximate objectives strategy: start in non-sensitive domains where alignment problems are tractable, build governance muscles, then tackle harder domains
-- [[the more uncertain the environment the more proximate the objective must be because you cannot plan a detailed path through fog]] -- AI alignment under deep uncertainty demands proximate objectives: you cannot pre-specify alignment for a system that does not yet exist, but you can build and test alignment mechanisms at each capability level
+- proximate objectives resolve ambiguity by absorbing complexity so the organization faces a problem it can actually solve -- the phased safety-first approach IS a proximate objectives strategy: start in non-sensitive domains where alignment problems are tractable, build governance muscles, then tackle harder domains
+- the more uncertain the environment the more proximate the objective must be because you cannot plan a detailed path through fog -- AI alignment under deep uncertainty demands proximate objectives: you cannot pre-specify alignment for a system that does not yet exist, but you can build and test alignment mechanisms at each capability level
Topics:
- [[livingip overview]]