---
type: claim
domain: ai-alignment
description: "Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them rather than focusing on individual model alignment alone"
confidence: experimental
source: "Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399"
created: 2026-03-11
secondary_domains: [mechanisms, grand-strategy]
---
# AI alignment requires institutional co-alignment, not just model alignment

The Full-Stack Alignment framework argues that alignment must operate at two levels simultaneously: AI systems and the institutions that shape their development and deployment. This widens the scope beyond a single organization's objectives to misalignment among multiple stakeholders.

**Full-stack alignment** is defined as the concurrent alignment of AI systems and institutions with what people value. The paper argues that focusing solely on model-level alignment (RLHF, constitutional AI, etc.) is insufficient because:
1. **Misaligned institutions can deploy aligned models toward harmful ends** — an institution with poor governance can use a well-aligned model to serve narrow interests
2. **Competitive pressures force abandonment of alignment constraints** — safety-conscious organizations face market pressure to drop alignment work if competitors do not adopt it, a race-to-the-bottom dynamic (see the toy payoff sketch after this list)
3. **Single-organization alignment cannot guarantee societal outcomes** — the paper's core claim: "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
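
Point 2 is a race-to-the-bottom dynamic, and a two-player game makes it concrete. The following minimal Python sketch assumes a symmetric two-lab "safety investment" game; the labels and payoff numbers are illustrative assumptions, not taken from the paper:

```python
# Toy two-lab safety-investment game (illustrative payoffs, not from
# the Full-Stack Alignment paper). Each lab either "invest"s in
# alignment work or "cut"s it to ship faster.
PAYOFFS = {
    # (lab_a, lab_b): (payoff_a, payoff_b)
    ("invest", "invest"): (3, 3),  # both stay safe, moderate profit
    ("invest", "cut"):    (0, 5),  # the investor loses the market race
    ("cut",    "invest"): (5, 0),
    ("cut",    "cut"):    (1, 1),  # race to the bottom
}

def best_response(opponent: str) -> str:
    """Payoff-maximizing choice for lab A against a fixed opponent move."""
    return max(("invest", "cut"), key=lambda me: PAYOFFS[(me, opponent)][0])

# "cut" dominates whatever the other lab does, even though mutual
# investment is better for both: the institutional payoff structure,
# not any individual model, produces the misaligned outcome.
assert best_response("invest") == "cut"
assert best_response("cut") == "cut"
```

Under these assumed payoffs, cutting is the dominant strategy, so unilateral safety investment is unstable; changing the equilibrium requires changing the institutional payoff structure, which is exactly the layer full-stack alignment targets.
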
The framework proposes five implementation mechanisms spanning both technical and institutional domains:
1. AI value stewardship
2. Normatively competent agents
3. Win-win negotiation systems
4. Meaning-preserving economic mechanisms
5. Democratic regulatory institutions

This represents a stronger claim than coordination-focused alignment theories, which address coordination between AI labs but not the institutional structures themselves.
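
The strength of the claim can be made precise with a toy weakest-link model. In the sketch below, m, i, and the min form are my illustrative assumptions, not notation from the paper:

```latex
% Toy weakest-link model of the core claim (an illustrative assumption,
% not the paper's formalism). Let m \in [0,1] be model-level alignment
% and i \in [0,1] be institutional alignment, with societal benefit
\[ W(m, i) = \min(m, i) \]
% For any fixed institutional level i < 1,
\[ \lim_{m \to 1} W(m, i) = i < 1 \]
% so improving model alignment alone cannot lift W above the
% institutional ceiling i; only joint co-alignment raises the minimum.
```

Under this assumption, co-alignment is complementary rather than additive: the binding constraint on societal benefit is whichever layer lags.
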
## Evidence
- Full-Stack Alignment paper (December 2025) — introduces the framework and argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
- The paper's five proposed mechanisms explicitly span both technical (normatively competent agents) and institutional (democratic regulatory institutions) domains
- The framework directly addresses the failure mode of aligned-model-misaligned-institution

## Limitations
- The paper is architecturally ambitious but may lack the technical specificity needed for implementation
- No engagement with existing bridging-based mechanisms or formal impossibility results
- Early-stage proposal (December 2025) without empirical validation or case studies
- The paper does not provide formal definitions of what constitutes "institutional alignment"

---
Relevant Notes:
- [[AI alignment is a coordination problem not a technical problem]] — this claim extends the coordination thesis to institutions
- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — directly relevant context
- [[safe AI development requires building alignment mechanisms before scaling capability]] — complementary timing constraint