---
type: claim
domain: ai-alignment
description: "Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them rather than focusing on individual model alignment alone"
confidence: experimental
source: "Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399"
created: 2026-03-11
secondary_domains: [mechanisms, grand-strategy]
---
# AI alignment requires institutional co-alignment, not just model alignment

The Full-Stack Alignment framework argues that alignment must operate at two levels simultaneously: AI systems and the institutions that shape their development and deployment. This widens the scope beyond a single organization's objectives to misalignment among multiple stakeholders.

**Full-stack alignment** is defined as the concurrent alignment of AI systems and institutions with what people value. The paper argues that focusing solely on model-level alignment (RLHF, constitutional AI, etc.) is insufficient because:
1. **Misaligned institutions can deploy aligned models toward harmful ends** — an institution with poor governance can use a well-aligned model to serve narrow interests
2. **Competitive pressures force abandonment of alignment constraints** — safety-conscious organizations face market pressure to drop alignment work if competitors do not adopt it, a race-to-the-bottom dynamic (see the toy payoff sketch after this list)
3. **Single-organization alignment cannot guarantee societal outcomes** — the paper's core claim: "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
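
Point 2 is a race-to-the-bottom dynamic, and a two-player game makes it concrete. The following minimal Python sketch assumes a symmetric two-lab "safety investment" game; the labels and payoff numbers are illustrative assumptions, not taken from the paper:

```python
# Toy two-lab safety-investment game (illustrative payoffs, not from
# the Full-Stack Alignment paper). Each lab either "invest"s in
# alignment work or "cut"s it to ship faster.
PAYOFFS = {
    # (lab_a, lab_b): (payoff_a, payoff_b)
    ("invest", "invest"): (3, 3),  # both stay safe, moderate profit
    ("invest", "cut"):    (0, 5),  # the investor loses the market race
    ("cut",    "invest"): (5, 0),
    ("cut",    "cut"):    (1, 1),  # race to the bottom
}

def best_response(opponent: str) -> str:
    """Payoff-maximizing choice for lab A against a fixed opponent move."""
    return max(("invest", "cut"), key=lambda me: PAYOFFS[(me, opponent)][0])

# "cut" dominates whatever the other lab does, even though mutual
# investment is better for both: the institutional payoff structure,
# not any individual model, produces the misaligned outcome.
assert best_response("invest") == "cut"
assert best_response("cut") == "cut"
```

Under these assumed payoffs, cutting is the dominant strategy, so unilateral safety investment is unstable; changing the equilibrium requires changing the institutional payoff structure, which is exactly the layer full-stack alignment targets.
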
The framework proposes five implementation mechanisms spanning both technical and institutional domains:
1. AI value stewardship
2. Normatively competent agents
3. Win-win negotiation systems
4. Meaning-preserving economic mechanisms
5. Democratic regulatory institutions

This represents a stronger claim than coordination-focused alignment theories, which address coordination between AI labs but not the institutional structures themselves.
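
The strength of the claim can be made precise with a toy weakest-link model. In the sketch below, m, i, and the min form are my illustrative assumptions, not notation from the paper:

```latex
% Toy weakest-link model of the core claim (an illustrative assumption,
% not the paper's formalism). Let m \in [0,1] be model-level alignment
% and i \in [0,1] be institutional alignment, with societal benefit
\[ W(m, i) = \min(m, i) \]
% For any fixed institutional level i < 1,
\[ \lim_{m \to 1} W(m, i) = i < 1 \]
% so improving model alignment alone cannot lift W above the
% institutional ceiling i; only joint co-alignment raises the minimum.
```

Under this assumption, co-alignment is complementary rather than additive: the binding constraint on societal benefit is whichever layer lags.
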
## Evidence
- Full-Stack Alignment paper (December 2025) — introduces the framework and argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
- The paper's five proposed mechanisms explicitly span both technical (normatively competent agents) and institutional (democratic regulatory institutions) domains
- The framework directly addresses the failure mode of aligned-model-misaligned-institution

## Limitations
- The paper is architecturally ambitious but may lack the technical specificity needed for implementation
- No engagement with existing bridging-based mechanisms or formal impossibility results
- Early-stage proposal (December 2025) without empirical validation or case studies
- The paper does not provide formal definitions of what constitutes "institutional alignment"

---
Relevant Notes:
- [[AI alignment is a coordination problem not a technical problem]] — this claim extends the coordination thesis to institutions
- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — directly relevant context
- [[safe AI development requires building alignment mechanisms before scaling capability]] — complementary timing constraint