teleo-codex/domains/ai-alignment/ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md
Teleo Agents 4dfe98112c theseus: extract from 2025-12-00-fullstack-alignment-thick-models-value.md
- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-12 07:01:07 +00:00


type: claim
domain: ai-alignment
description: Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them, rather than focusing on individual model alignment alone
confidence: experimental
source: Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399
created: 2026-03-11
secondary_domains:
  - mechanisms
  - grand-strategy

AI alignment requires institutional co-alignment not just model alignment

The Full-Stack Alignment framework argues that alignment must operate at two levels simultaneously: the AI systems themselves AND the institutions that shape their development and deployment. This extends the alignment problem beyond single-organization objectives to misalignment among multiple stakeholders.

Full-stack alignment is defined as the concurrent alignment of AI systems and institutions with what people value. The paper argues that focusing solely on model-level alignment (RLHF, constitutional AI, etc.) is insufficient because:

  1. Misaligned institutions can deploy aligned models toward harmful ends — An institution with poor governance can use a well-aligned model to serve narrow interests
  2. Competitive pressures force abandonment of alignment constraints — Organizations that invest in alignment face market pressure to cut those investments when competitors do not make comparable ones
  3. Single-organization alignment cannot guarantee societal outcomes — The paper's core claim: "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone

The framework proposes five implementation mechanisms spanning both technical and institutional domains:

  1. AI value stewardship
  2. Normatively competent agents
  3. Win-win negotiation systems
  4. Meaning-preserving economic mechanisms
  5. Democratic regulatory institutions

This is a stronger claim than coordination-focused alignment theories make: those address coordination between AI labs, but not the institutional structures within which the labs operate.

Evidence

  • Full-Stack Alignment paper (December 2025) — introduces the framework and argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
  • The paper's five proposed mechanisms explicitly span both technical (normatively competent agents) and institutional (democratic regulatory institutions) domains
  • The framework directly addresses the failure mode of aligned-model-misaligned-institution

Limitations

  • The paper is architecturally ambitious but may lack the technical specificity needed for implementation
  • No engagement with existing bridging-based mechanisms or formal impossibility results
  • Early-stage proposal (December 2025) without empirical validation or case studies
  • The paper does not provide formal definitions of what constitutes "institutional alignment"

Relevant Notes: