---
type: claim
domain: ai-alignment
description: "Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them rather than focusing on individual model alignment alone"
confidence: experimental
source: "Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399"
created: 2026-03-11
secondary_domains: [mechanisms, grand-strategy]
---

# AI alignment requires institutional co-alignment not just model alignment

The Full-Stack Alignment framework argues that alignment must operate at two levels simultaneously: AI systems AND the institutions that shape their development and deployment. This extends beyond single-organization objectives to address misalignment across multiple stakeholders.

**Full-stack alignment** is defined as the concurrent alignment of AI systems and institutions with what people value. The paper argues that focusing solely on model-level alignment (RLHF, constitutional AI, etc.) is insufficient because:

1. **Misaligned institutions can deploy aligned models toward harmful ends.** An institution with poor governance can use a well-aligned model to serve narrow interests.
2. **Competitive pressures force abandonment of alignment constraints.** Safety-conscious organizations face market pressure to abandon alignment work if competitors don't adopt it.
3. **Single-organization alignment cannot guarantee societal outcomes.** The paper's core claim: "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone.

The framework proposes five implementation mechanisms spanning both technical and institutional domains:

1. AI value stewardship
2. Normatively competent agents
3. Win-win negotiation systems
4. Meaning-preserving economic mechanisms
5. Democratic regulatory institutions

This represents a stronger claim than coordination-focused alignment theories, which address coordination between AI labs but not the institutional structures themselves.
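Point 2 above is a standard coordination-failure argument. As a minimal sketch (the payoff numbers are hypothetical illustrations, not from the paper), the incentive structure can be modeled as a two-lab prisoner's dilemma in which cutting alignment work is the dominant strategy for each lab individually, even though mutual investment is better for both:

```python
# Illustrative model (not from the paper): safety investment as a
# two-lab prisoner's dilemma. Payoff numbers are hypothetical.
# Each lab chooses to "invest" in alignment work or "cut" it.

PAYOFFS = {
    # (lab_a_choice, lab_b_choice): (lab_a_payoff, lab_b_payoff)
    ("invest", "invest"): (3, 3),  # both stay safe, share the market
    ("invest", "cut"):    (0, 5),  # the safety-conscious lab loses the race
    ("cut",    "invest"): (5, 0),
    ("cut",    "cut"):    (1, 1),  # race to the bottom
}

def best_response(opponent_choice: str) -> str:
    """Return the payoff-maximizing choice against a fixed opponent choice."""
    return max(("invest", "cut"),
               key=lambda mine: PAYOFFS[(mine, opponent_choice)][0])

# "cut" dominates regardless of the competitor's choice, so unilateral
# commitment to alignment is unstable without an institutional mechanism
# (regulation, binding agreements) that changes the payoffs.
assert best_response("invest") == "cut"
assert best_response("cut") == "cut"
```

This is why the framework's institutional mechanisms matter: only a structure that alters the payoff matrix itself, rather than appealing to any single lab's intentions, removes the dominant incentive to defect.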
## Evidence

- Full-Stack Alignment paper (December 2025): introduces the framework and argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
- The paper's five proposed mechanisms explicitly span both technical (normatively competent agents) and institutional (democratic regulatory institutions) domains
- The framework directly addresses the failure mode of aligned-model-misaligned-institution

## Limitations

- The paper provides architectural ambition but may lack technical specificity for implementation
- No engagement with existing bridging-based mechanisms or formal impossibility results
- Early-stage proposal (December 2025) without empirical validation or case studies
- The paper does not provide formal definitions of what constitutes "institutional alignment"

---

Relevant Notes:

- [[AI alignment is a coordination problem not a technical problem]] — this claim extends the coordination thesis to institutions
- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — directly relevant context
- [[safe AI development requires building alignment mechanisms before scaling capability]] — complementary timing constraint