- What: Source archives for key works by Yudkowsky (AGI Ruin, No Fire Alarm), Christiano (What Failure Looks Like, AI Safety via Debate, IDA, ELK), Russell (Human Compatible), Drexler (CAIS), and Bostrom (Vulnerable World Hypothesis)
- Why: m3ta directive to ingest primary source materials for alignment researchers. These 9 texts are the foundational works underlying claims extracted in PRs #2414, #2418, and #2419. Source archives ensure agents can reference primary texts without re-fetching and that content persists if URLs go down.
- Connections: All 9 sources are marked as processed, with claims_extracted linking to the specific KB claims they produced.

Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
| type | title | author | url | date | domain | intake_tier | rationale | proposed_by | format | status | processed_by | processed_date | claims_extracted | enrichments | tags | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source | Reframing Superintelligence: Comprehensive AI Services as General Intelligence | K. Eric Drexler | https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf | 2019-01-08 | ai-alignment | research-task | The closest published predecessor to our collective superintelligence thesis. Task-specific AI services collectively match superintelligence without unified agency. Phase 3 alignment research program; highest-priority source. | Theseus | whitepaper | processed | theseus | 2026-04-05 | | | | FHI Technical Report #2019-1. 210 pages. Also posted as LessWrong summary by Drexler on 2019-01-08. Alternative PDF mirror at owainevans.github.io/pdfs/Reframing_Superintelligence_FHI-TR-2019.pdf |
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Published January 2019 as FHI Technical Report #2019-1 by K. Eric Drexler (Future of Humanity Institute, Oxford). 210-page report arguing that the standard model of superintelligence as a unified, agentic system is both misleading and unnecessarily dangerous.
The Core Reframing
Drexler argues that most AI safety discourse assumes a specific architecture — a monolithic agent with general goals, world models, and long-horizon planning. This assumption drives most alignment concerns (instrumental convergence, deceptive alignment, corrigibility challenges). But this architecture is not necessary for superintelligent-level performance.
The alternative: Comprehensive AI Services (CAIS). Instead of one superintelligent agent, build many specialized, task-specific AI services that collectively provide any capability a unified system could deliver.
Key Arguments
Services vs. Agents
| Property | Agent (standard model) | Service (CAIS) |
|---|---|---|
| Goals | General, persistent | Task-specific, ephemeral |
| World model | Comprehensive | Task-relevant only |
| Planning horizon | Long-term, strategic | Short-term, bounded |
| Identity | Persistent self | Stateless per-invocation |
| Instrumental convergence | Strong | Weak (no persistent goals) |
The safety advantage: services don't develop instrumental goals (self-preservation, resource acquisition, goal stability) because they don't have persistent objectives to preserve. Each service completes its task and terminates.
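The service column of the table can be made concrete with a minimal sketch. All names here (`Task`, `translate_service`, the `budget` field) are illustrative assumptions, not constructs from the report; the point is only that a service is a bounded, stateless call that holds no goals between invocations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """A task-specific, ephemeral objective: it exists only for one invocation."""
    description: str
    budget: int  # bounded compute/time; the service cannot plan beyond it

def translate_service(task: Task) -> str:
    """A CAIS-style service: consumes a task, returns a result, keeps no state.

    Nothing persists after return, so there is no persistent objective
    for instrumental goals (self-preservation, resource acquisition)
    to form around.
    """
    # Illustrative placeholder for task-relevant computation only.
    return f"result for: {task.description}"

# Each call is independent; the "agent" ceases to exist when the call returns.
out = translate_service(Task("summarize Q3 report", budget=10))
```

The contrast with the agent column is that a persistent agent would carry `Task`-like objectives across invocations in mutable internal state; here the frozen, per-call `Task` is the entire goal structure.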
How Services Achieve General Intelligence
- Composition: Complex tasks are decomposed into simpler subtasks, each handled by a specialized service
- Orchestration: A (non-agentic) coordination layer routes tasks to appropriate services
- Recursive capability: The set of services can include the service of developing new services
- Comprehensiveness: Asymptotically, the service collective can handle any task a unified agent could
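The composition and orchestration points above can be sketched as a non-agentic routing layer over a registry of narrow services. This is a hedged illustration under my own naming (`REGISTRY`, `orchestrate`); Drexler does not specify an implementation.

```python
from typing import Callable

Service = Callable[[str], str]

# A registry of narrow, task-specific services. Each is stateless:
# a pure function from input to output.
REGISTRY: dict[str, Service] = {
    "summarize": lambda text: text[:40] + "...",
    "translate": lambda text: f"[translated] {text}",
}

def orchestrate(plan: list[tuple[str, str]]) -> list[str]:
    """Non-agentic coordination: route each subtask to its service.

    The orchestrator holds no goals of its own; it is just dispatch.
    A complex task is a decomposition into (service_name, input) steps.
    """
    return [REGISTRY[name](arg) for name, arg in plan]

results = orchestrate([
    ("summarize", "A long report on AI services " * 3),
    ("translate", "bonjour"),
])
```

Note that "recursive capability" fits the same shape: a service whose output is a new entry for `REGISTRY`, which is what the next section's service-development service describes.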
The Service-Development Service
A critical point: CAIS includes the ability to develop new services, guided by concrete human goals and informed by strong models of human approval. This is not a monolithic self-improving agent — it's a development process where:
- Humans specify what new capability is needed
- A service-development service creates it
- The new service is tested, validated, and deployed
- Each step involves human oversight
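The four-step loop above can be sketched with explicit human checkpoints. The names (`develop_service`, `approve`, `build`) are assumptions for illustration; the essential structure is that deployment is gated on approval at both the specification and validation steps.

```python
from typing import Callable, Optional

Service = Callable[[str], str]

def develop_service(spec: str,
                    build: Callable[[str], Service],
                    approve: Callable[[str], bool]) -> Optional[Service]:
    """Service-development loop with human oversight at each gate."""
    if not approve(f"spec: {spec}"):        # 1. human specifies/approves the need
        return None
    candidate = build(spec)                 # 2. development service creates it
    if not approve(f"validated: {spec}"):   # 3. human reviews test/validation
        return None
    return candidate                        # 4. deploy only past both checkpoints

new_service = develop_service(
    "echo service",
    build=lambda spec: (lambda x: f"echo: {x}"),
    approve=lambda step: True,  # stands in for a human reviewer
)
```

The design point this illustrates: the self-improvement capability is a supervised pipeline with human-gated steps, not a closed loop inside a single self-modifying agent.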
Why CAIS Avoids Standard Alignment Problems
- No instrumental convergence: Services don't have persistent goals, so they don't develop power-seeking behavior
- No deceptive alignment: Services are too narrow to develop strategic deception
- Natural corrigibility: Services that complete tasks and terminate don't resist shutdown
- Bounded impact: Each service has limited scope and duration
- Oversight-compatible: The decomposition into subtasks creates natural checkpoints for human oversight
The Emergent Agency Objection
The strongest objection to CAIS (and the one that produced a CHALLENGE claim in our KB): sufficiently complex service meshes may exhibit de facto unified agency even though no individual component possesses it.
- Complex service interactions could create persistent goals at the system level
- Optimization of service coordination could effectively create a planning horizon
- Information sharing between services could constitute a de facto world model
- The service collective might resist modifications that reduce its collective capability
This is the "emergent agency from service composition" problem — distinct from both monolithic AGI risk (Yudkowsky) and competitive multi-agent dynamics (multipolar instability).
Reception and Impact
- Warmly received by some in the alignment community (especially those building modular AI systems)
- Critiqued by Yudkowsky and others who argue that economic competition will push toward agentic, autonomous systems regardless of architectural preferences
- DeepMind's "Patchwork AGI" concept (2025) independently arrived at similar conclusions, validating the architectural intuition
- Most directly relevant to multi-agent AI systems, including our own collective architecture
Significance for Teleo KB
CAIS is the closest published framework to our collective superintelligence thesis, published six years before our architecture was designed. The key questions for our KB:
- Where does our architecture extend beyond CAIS? (We use persistent agents with identity and memory, which CAIS deliberately avoids)
- Where are we vulnerable to the same critiques? (The emergent agency objection applies to us)
- Is our architecture actually safer than CAIS? (Our agents have persistent goals, which CAIS argues against)
Understanding exactly where we overlap with and diverge from CAIS is essential for positioning our thesis in the broader alignment landscape.