- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them rather than focusing on individual model alignment alone | speculative | Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025) | 2026-03-11 | |
# AI alignment requires institutional co-alignment, not just model alignment
The Full-Stack Alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone. Alignment must instead be comprehensive, addressing both the AI systems and the institutions that shape their development and deployment, and extending beyond single-organization objectives to misalignment across multiple stakeholders.
Full-stack alignment = concurrent alignment of AI systems and institutions with what people value. This moves the alignment problem from a purely technical domain (how do we align this model?) to a sociotechnical domain (how do we align the entire system of models, labs, regulators, and economic incentives?).
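The paper offers no formalism for this definition; as a loose illustrative sketch (the actor names, weight vectors, distance metric, and tolerance below are all invented here, not drawn from the paper), the definitional move can be read as evaluating misalignment layer by layer across the stack rather than for the model alone:

```python
from dataclasses import dataclass

@dataclass
class Actor:
    """One layer of the sociotechnical stack: a model, a lab, a regulator."""
    name: str
    objective: dict[str, float]  # weight placed on each value dimension

def misalignment(actor: Actor, values: dict[str, float]) -> float:
    """L1 distance between an actor's effective objective and stated values."""
    keys = set(actor.objective) | set(values)
    return sum(abs(actor.objective.get(k, 0.0) - values.get(k, 0.0)) for k in keys)

def full_stack_aligned(stack: list[Actor], values: dict[str, float],
                       tol: float = 0.1) -> bool:
    """Full-stack alignment: every layer within tolerance, not just the model."""
    return all(misalignment(a, values) <= tol for a in stack)

human_values = {"safety": 0.5, "capability": 0.5}
stack = [
    Actor("model",     {"safety": 0.50, "capability": 0.50}),  # aligned model
    Actor("lab",       {"safety": 0.10, "capability": 0.90}),  # race incentives
    Actor("regulator", {"safety": 0.45, "capability": 0.55}),
]
print(full_stack_aligned(stack, human_values))  # False: the lab layer breaks it
```

The point the toy makes is purely definitional: the predicate quantifies over every layer, so a perfectly aligned model cannot make the conjunction true on its own.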
The paper proposes five implementation mechanisms:
- AI value stewardship
- Normatively competent agents
- Win-win negotiation systems (a bargaining sketch follows this list)
- Meaning-preserving economic mechanisms
- Democratic regulatory institutions
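The paper names these mechanisms but does not specify algorithms for them. As one hypothetical concretization of a win-win negotiation system (the candidate agreements and disagreement point below are invented), the classic Nash bargaining solution selects the agreement maximizing the product of each party's gain over its disagreement payoff:

```python
# Hypothetical concretization of a "win-win negotiation system": the Nash
# bargaining solution. The paper prescribes no algorithm; this is one standard
# mechanism that guarantees both parties gain over the status quo.

# Candidate agreements as (utility_A, utility_B) pairs, plus a disagreement point.
candidates = [(6.0, 2.0), (5.0, 4.0), (3.0, 5.5), (1.0, 6.0)]
disagreement = (2.0, 1.5)  # what each party gets if negotiation fails

def nash_product(agreement, d):
    ua, ub = agreement
    da, db = d
    # Only agreements that improve on disagreement for *both* sides are win-win.
    if ua <= da or ub <= db:
        return float("-inf")
    return (ua - da) * (ub - db)

best = max(candidates, key=lambda a: nash_product(a, disagreement))
print(best)  # (5.0, 4.0): maximizes joint gains over the disagreement point
```

Any mechanism deserving the "win-win" label must at minimum exclude agreements that leave a party below its disagreement point, which is what the negative-infinity guard encodes.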
The full-stack claim is stronger than coordination-focused alignment theses, which address coordination between AI labs but not necessarily the institutional structures themselves. The key insight is that institutional misalignment (e.g., competitive pressure to skip safety measures, regulatory capture, misaligned economic incentives) can undermine even perfectly aligned individual models, as the sketch below illustrates.
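A minimal game-theoretic rendering of that insight (the payoffs are invented for illustration, not taken from the paper): two labs whose models are both perfectly aligned still face a dominant incentive to skip safety work.

```python
# Toy two-lab safety game (payoffs invented for illustration). Even if each
# lab's *model* is aligned, the labs' competitive incentives need not be:
# "skip" strictly dominates "safe", so rational labs race to the bottom.

# payoffs[(row, col)] = (lab A payoff, lab B payoff)
payoffs = {
    ("safe", "safe"): (3, 3),
    ("safe", "skip"): (0, 4),  # the cautious lab falls behind
    ("skip", "safe"): (4, 0),
    ("skip", "skip"): (1, 1),  # mutual racing: worse for both than (safe, safe)
}

def best_response(opponent_move: str) -> str:
    return max(["safe", "skip"], key=lambda m: payoffs[(m, opponent_move)][0])

for opp in ("safe", "skip"):
    print(f"If the other lab plays {opp!r}, best response is {best_response(opp)!r}")
# Both print 'skip': the unique equilibrium is (skip, skip), even though
# (safe, safe) is better for everyone; the misalignment sits in the
# institutional layer, not in either model.
```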
## Evidence
The paper provides architectural arguments rather than empirical validation. The claim rests on the observation that individual model alignment cannot address:
- Multi-stakeholder value conflicts where different groups have genuinely incompatible objectives (see the preference-cycle sketch after this list)
- Institutional incentive misalignment (e.g., competitive pressure to skip safety when competitors advance without equivalent constraints)
- Deployment context divergence from training conditions, which institutional structures either amplify or mitigate
- Regulatory capture and principal-agent problems within governance institutions themselves
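As a concrete instance of the first point (the stakeholders and preference orderings are ours, not the paper's), three parties with cyclic preferences admit no aggregate ranking for a single aligned model to optimize, a standard Condorcet cycle:

```python
from itertools import permutations

# Condorcet cycle: an invented three-stakeholder example of genuinely
# incompatible objectives. No ordering of the options satisfies a majority
# on every pairwise comparison, so no individually aligned model can encode
# "the" stakeholder value ordering.

stakeholders = {
    "users":     ["privacy", "capability", "profit"],
    "lab":       ["capability", "profit", "privacy"],
    "investors": ["profit", "privacy", "capability"],
}

def majority_prefers(a: str, b: str) -> bool:
    votes = sum(r.index(a) < r.index(b) for r in stakeholders.values())
    return votes > len(stakeholders) / 2

def consistent(order) -> bool:
    # An aggregate order is consistent if it never contradicts a majority.
    return all(not majority_prefers(b, a)
               for i, a in enumerate(order) for b in order[i + 1:])

options = ["privacy", "capability", "profit"]
print(any(consistent(p) for p in permutations(options)))  # False: a cycle
```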
## Limitations
No formal impossibility results or empirical demonstrations are provided. The paper is architecturally ambitious but lacks technical specificity about how institutional co-alignment would be implemented, measured, or verified. The five mechanisms are proposed as a framework but not demonstrated as sufficient or necessary.
## Relevant Notes
- AI alignment is a coordination problem not a technical problem — this extends coordination thesis to institutions
- AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation — directly relevant context
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — institutional misalignment example
## Topics