teleo-codex/domains/ai-alignment/ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md
- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)



- type: claim
- domain: ai-alignment
- description: Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them, rather than focusing on individual model alignment alone
- confidence: speculative
- source: Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)
- created: 2026-03-11
- secondary_domains: mechanisms, grand-strategy

AI alignment requires institutional co-alignment not just model alignment

The Full-Stack Alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" in isolation. Alignment must instead be comprehensive, addressing both AI systems and the institutions that shape their development and deployment. The thesis extends beyond the objectives of any single organization to the misalignments that arise across multiple stakeholders.

Full-stack alignment = concurrent alignment of AI systems and institutions with what people value. This reframes alignment from a purely technical problem (how do we align this model?) into a sociotechnical one (how do we align the entire system of models, labs, regulators, and economic incentives?).

The paper proposes five implementation mechanisms:

  1. AI value stewardship
  2. Normatively competent agents
  3. Win-win negotiation systems
  4. Meaning-preserving economic mechanisms
  5. Democratic regulatory institutions

This is a stronger claim than coordination-focused alignment theses, which address coordination between AI labs but not necessarily the institutional structures themselves. The key insight is that institutional misalignment (e.g., competitive pressure to skip safety measures, regulatory capture, misaligned economic incentives) can undermine even perfectly aligned individual models.
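The competitive-pressure failure mode can be made concrete with a toy game-theoretic sketch. This example is not from the paper; the labs, actions, and payoff numbers are all hypothetical, chosen only to illustrate how race incentives between two labs with perfectly aligned models can still produce an unsafe equilibrium (a standard prisoner's dilemma):

```python
# Toy illustration (hypothetical payoffs, not from the paper): two labs
# each hold an "aligned" model, but institutional incentives reward
# deploying early, so the only Nash equilibrium skips safety testing.
from itertools import product

ACTIONS = ["safe", "rush"]  # "safe" = full safety testing, "rush" = deploy early

# Payoffs (lab_a, lab_b) indexed by (a_action, b_action).
PAYOFFS = {
    ("safe", "safe"): (3, 3),  # both test: good outcome, shared market
    ("safe", "rush"): (0, 4),  # the tester loses the market to the rusher
    ("rush", "safe"): (4, 0),
    ("rush", "rush"): (1, 1),  # both rush: accidents, eroded trust
}

def best_response(opponent_action, player):
    """Action maximizing this player's payoff against a fixed opponent action."""
    def payoff(action):
        profile = (action, opponent_action) if player == 0 else (opponent_action, action)
        return PAYOFFS[profile][player]
    return max(ACTIONS, key=payoff)

def nash_equilibria():
    """Profiles where each lab's action is a best response to the other's."""
    return [
        (a, b)
        for a, b in product(ACTIONS, repeat=2)
        if best_response(b, 0) == a and best_response(a, 1) == b
    ]

print(nash_equilibria())  # → [('rush', 'rush')]
```

Under these payoffs, "rush" strictly dominates "safe" for both labs, so (rush, rush) is the unique equilibrium even though (safe, safe) is better for everyone. In the paper's framing, changing the institutional payoff structure, not the models, is what moves the equilibrium.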

Evidence

The paper provides architectural arguments rather than empirical validation. The claim rests on the observation that individual model alignment cannot address:

  • Multi-stakeholder value conflicts where different groups have genuinely incompatible objectives
  • Institutional incentive misalignment (e.g., competitive pressure to skip safety when competitors advance without equivalent constraints)
  • Deployment context divergence from training conditions, which institutional structures either amplify or mitigate
  • Regulatory capture and principal-agent problems within governance institutions themselves

Limitations

No formal impossibility results or empirical demonstrations are provided. The paper is architecturally ambitious but lacks technical specificity about how institutional co-alignment would be implemented, measured, or verified. The five mechanisms are proposed as a framework but not demonstrated as sufficient or necessary.


Relevant Notes:

Topics: