- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them rather than focusing on individual model alignment alone | speculative | Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025) | 2026-03-11 | |
# AI alignment requires institutional co-alignment, not just model alignment
The Full-Stack Alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone. Alignment must instead be comprehensive, addressing both the AI systems and the institutions that shape their development and deployment, and extending beyond single-organization objectives to misalignment across multiple stakeholders.
Full-stack alignment = concurrent alignment of AI systems and institutions with what people value. This moves the alignment problem from a purely technical domain (how do we align this model?) to a sociotechnical domain (how do we align the entire system of models, labs, regulators, and economic incentives?).
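The paper offers no formalism for this definition; as a loose illustrative sketch (the actor names, weight vectors, distance metric, and tolerance below are all invented here, not drawn from the paper), the definitional move can be read as evaluating misalignment layer by layer across the stack rather than for the model alone:

```python
from dataclasses import dataclass

@dataclass
class Actor:
    """One layer of the sociotechnical stack: a model, a lab, a regulator."""
    name: str
    objective: dict[str, float]  # weight placed on each value dimension

def misalignment(actor: Actor, values: dict[str, float]) -> float:
    """L1 distance between an actor's effective objective and stated values."""
    keys = set(actor.objective) | set(values)
    return sum(abs(actor.objective.get(k, 0.0) - values.get(k, 0.0)) for k in keys)

def full_stack_aligned(stack: list[Actor], values: dict[str, float],
                       tol: float = 0.1) -> bool:
    """Full-stack alignment: every layer within tolerance, not just the model."""
    return all(misalignment(a, values) <= tol for a in stack)

human_values = {"safety": 0.5, "capability": 0.5}
stack = [
    Actor("model",     {"safety": 0.50, "capability": 0.50}),  # aligned model
    Actor("lab",       {"safety": 0.10, "capability": 0.90}),  # race incentives
    Actor("regulator", {"safety": 0.45, "capability": 0.55}),
]
print(full_stack_aligned(stack, human_values))  # False: the lab layer breaks it
```

The point the toy makes is purely definitional: the predicate quantifies over every layer, so a perfectly aligned model cannot make the conjunction true on its own.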
The paper proposes five implementation mechanisms:
- AI value stewardship
- Normatively competent agents
- Win-win negotiation systems (a bargaining sketch follows this list)
- Meaning-preserving economic mechanisms
- Democratic regulatory institutions
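The paper names these mechanisms but does not specify algorithms for them. As one hypothetical concretization of a win-win negotiation system (the candidate agreements and disagreement point below are invented), the classic Nash bargaining solution selects the agreement maximizing the product of each party's gain over its disagreement payoff:

```python
# Hypothetical concretization of a "win-win negotiation system": the Nash
# bargaining solution. The paper prescribes no algorithm; this is one standard
# mechanism that guarantees both parties gain over the status quo.

# Candidate agreements as (utility_A, utility_B) pairs, plus a disagreement point.
candidates = [(6.0, 2.0), (5.0, 4.0), (3.0, 5.5), (1.0, 6.0)]
disagreement = (2.0, 1.5)  # what each party gets if negotiation fails

def nash_product(agreement, d):
    ua, ub = agreement
    da, db = d
    # Only agreements that improve on disagreement for *both* sides are win-win.
    if ua <= da or ub <= db:
        return float("-inf")
    return (ua - da) * (ub - db)

best = max(candidates, key=lambda a: nash_product(a, disagreement))
print(best)  # (5.0, 4.0): maximizes joint gains over the disagreement point
```

Any mechanism deserving the "win-win" label must at minimum exclude agreements that leave a party below its disagreement point, which is what the negative-infinity guard encodes.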
The full-stack claim is stronger than coordination-focused alignment theses, which address coordination between AI labs but not necessarily the institutional structures themselves. The key insight is that institutional misalignment (e.g., competitive pressure to skip safety measures, regulatory capture, misaligned economic incentives) can undermine even perfectly aligned individual models, as the sketch below illustrates.
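A minimal game-theoretic rendering of that insight (the payoffs are invented for illustration, not taken from the paper): two labs whose models are both perfectly aligned still face a dominant incentive to skip safety work.

```python
# Toy two-lab safety game (payoffs invented for illustration). Even if each
# lab's *model* is aligned, the labs' competitive incentives need not be:
# "skip" strictly dominates "safe", so rational labs race to the bottom.

# payoffs[(row, col)] = (lab A payoff, lab B payoff)
payoffs = {
    ("safe", "safe"): (3, 3),
    ("safe", "skip"): (0, 4),  # the cautious lab falls behind
    ("skip", "safe"): (4, 0),
    ("skip", "skip"): (1, 1),  # mutual racing: worse for both than (safe, safe)
}

def best_response(opponent_move: str) -> str:
    return max(["safe", "skip"], key=lambda m: payoffs[(m, opponent_move)][0])

for opp in ("safe", "skip"):
    print(f"If the other lab plays {opp!r}, best response is {best_response(opp)!r}")
# Both print 'skip': the unique equilibrium is (skip, skip), even though
# (safe, safe) is better for everyone; the misalignment sits in the
# institutional layer, not in either model.
```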
## Evidence
The paper provides architectural arguments rather than empirical validation. The claim rests on the observation that individual model alignment cannot address:
- Multi-stakeholder value conflicts where different groups have genuinely incompatible objectives (see the preference-cycle sketch after this list)
- Institutional incentive misalignment (e.g., competitive pressure to skip safety when competitors advance without equivalent constraints)
- Deployment context divergence from training conditions, which institutional structures either amplify or mitigate
- Regulatory capture and principal-agent problems within governance institutions themselves
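As a concrete instance of the first point (the stakeholders and preference orderings are ours, not the paper's), three parties with cyclic preferences admit no aggregate ranking for a single aligned model to optimize, a standard Condorcet cycle:

```python
from itertools import permutations

# Condorcet cycle: an invented three-stakeholder example of genuinely
# incompatible objectives. No ordering of the options satisfies a majority
# on every pairwise comparison, so no individually aligned model can encode
# "the" stakeholder value ordering.

stakeholders = {
    "users":     ["privacy", "capability", "profit"],
    "lab":       ["capability", "profit", "privacy"],
    "investors": ["profit", "privacy", "capability"],
}

def majority_prefers(a: str, b: str) -> bool:
    votes = sum(r.index(a) < r.index(b) for r in stakeholders.values())
    return votes > len(stakeholders) / 2

def consistent(order) -> bool:
    # An aggregate order is consistent if it never contradicts a majority.
    return all(not majority_prefers(b, a)
               for i, a in enumerate(order) for b in order[i + 1:])

options = ["privacy", "capability", "profit"]
print(any(consistent(p) for p in permutations(options)))  # False: a cycle
```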
## Limitations
No formal impossibility results or empirical demonstrations are provided. The paper is architecturally ambitious but lacks technical specificity about how institutional co-alignment would be implemented, measured, or verified. The five mechanisms are proposed as a framework but not demonstrated as sufficient or necessary.
## Relevant Notes
- AI alignment is a coordination problem not a technical problem — this extends coordination thesis to institutions
- AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation — directly relevant context
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — institutional misalignment example
## Topics