- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them, rather than focusing on individual model alignment alone | experimental | Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399 | 2026-03-11 | |
AI alignment requires institutional co-alignment, not just model alignment
The Full-Stack Alignment framework argues that alignment must operate at two levels simultaneously: the AI systems themselves and the institutions that shape their development and deployment. This extends alignment beyond single-organization objectives to address misalignment across multiple stakeholders.
Full-stack alignment is defined as the concurrent alignment of AI systems and institutions with what people value. The paper argues that focusing solely on model-level alignment (RLHF, constitutional AI, and similar techniques) is insufficient for three reasons; the first is illustrated in the toy sketch after this list:
- Misaligned institutions can deploy aligned models toward harmful ends — An institution with poor governance can use a well-aligned model to serve narrow interests
- Competitive pressures force abandonment of alignment constraints — Safety-conscious organizations face market pressure to abandon alignment work if competitors don't adopt it
- Single-organization alignment cannot guarantee societal outcomes — The paper's core claim: "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
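As a rough intuition pump for the first failure mode, here is a minimal toy sketch. It is not from the paper; the scoring function and its parameter names are invented for illustration. It treats societal outcome quality as depending multiplicatively on both model alignment and institutional alignment, so neither factor alone suffices:

```python
# Toy illustration (not from the paper): outcome quality depends on BOTH
# model alignment and institutional alignment. The multiplicative scoring
# rule and all names here are hypothetical simplifications.

def societal_outcome(model_alignment: float, institution_alignment: float) -> float:
    """Score in [0, 1]. Both factors must be high for a good outcome:
    a misaligned institution can redirect even a well-aligned model."""
    return model_alignment * institution_alignment

# A well-aligned model deployed by a poorly governed institution:
print(societal_outcome(model_alignment=0.95, institution_alignment=0.2))  # ~0.19
# The same model deployed by a well-governed institution:
print(societal_outcome(model_alignment=0.95, institution_alignment=0.9))  # ~0.86
```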
The framework proposes five implementation mechanisms spanning both technical and institutional domains; a hedged sketch of one appears after the list:
- AI value stewardship
- Normatively competent agents
- Win-win negotiation systems
- Meaning-preserving economic mechanisms
- Democratic regulatory institutions
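The paper does not specify formalisms for these mechanisms. As one hedged reading of "win-win negotiation systems", the sketch below frames agreement selection as Nash bargaining: choose the option that maximizes the product of each stakeholder's gain over their disagreement payoff. All options, payoffs, and party names are hypothetical:

```python
# Hypothetical sketch: a "win-win negotiation system" read as Nash bargaining.
# The paper names the mechanism but gives no formalism; every option, payoff,
# and party name below is invented for illustration.

options = {
    "deploy_fast":    {"lab": 0.9, "public": 0.2},
    "deploy_audited": {"lab": 0.7, "public": 0.8},
    "no_deploy":      {"lab": 0.1, "public": 0.5},
}
disagreement = {"lab": 0.3, "public": 0.3}  # payoffs if negotiation breaks down

def nash_product(payoffs: dict) -> float:
    """Product of gains over the disagreement point. Options that leave any
    party worse off than no deal are excluded (they are not win-win)."""
    gains = [payoffs[party] - disagreement[party] for party in disagreement]
    if any(g <= 0 for g in gains):
        return float("-inf")
    result = 1.0
    for g in gains:
        result *= g
    return result

best = max(options, key=lambda name: nash_product(options[name]))
print(best)  # deploy_audited: the only option where both parties gain
```

Maximizing the product of gains, rather than their sum, rules out agreements that sacrifice one party's interests entirely, which matches the "win-win" framing.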
This represents a stronger claim than coordination-focused alignment theories, which address coordination between AI labs but not the institutional structures themselves.
Evidence
- Full-Stack Alignment paper (December 2025) — introduces the framework and argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
- The paper's five proposed mechanisms explicitly span both technical (normatively competent agents) and institutional (democratic regulatory institutions) domains
- The framework directly addresses the failure mode of aligned-model-misaligned-institution
Limitations
- The paper provides architectural ambition but may lack technical specificity for implementation
- No engagement with existing bridging-based mechanisms or formal impossibility results
- Early-stage proposal (December 2025) without empirical validation or case studies
- The paper does not provide formal definitions of what constitutes "institutional alignment"
Relevant Notes
- AI alignment is a coordination problem not a technical problem — this claim extends the coordination thesis to institutions
- AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation — directly relevant context
- safe AI development requires building alignment mechanisms before scaling capability — complementary timing constraint