teleo-codex/domains/ai-alignment/beneficial-ai-outcomes-require-concurrent-alignment-of-systems-and-institutions-not-model-alignment-alone.md
- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 6)

type: claim
domain: ai-alignment
description: Full-stack alignment requires concurrent co-alignment of AI systems and institutions, not model alignment alone
confidence: experimental
source: "Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value" (arxiv.org/abs/2512.03399), December 2025
created: 2026-03-11
secondary_domains: mechanisms, grand-strategy

Beneficial AI outcomes require concurrent alignment of systems and institutions, not model alignment alone

The full-stack alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" in isolation: alignment must address both the AI systems and the institutions that shape their development and deployment. The scope therefore extends beyond any single organization's objectives to misalignments that arise between multiple stakeholders.

The paper proposes five implementation mechanisms for institutional co-alignment (a toy formalization of the third is sketched after the list):

  1. AI value stewardship
  2. Normatively competent agents
  3. Win-win negotiation systems
  4. Meaning-preserving economic mechanisms
  5. Democratic regulatory institutions
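
Of the five, win-win negotiation systems admit the most direct toy formalization. A minimal sketch, assuming a two-party Nash bargaining model (an illustrative assumption on the note-taker's part, not a construction given in the paper): a win-win agreement is the feasible option that maximizes the product of each party's gains over its disagreement payoff.

```python
# Toy "win-win negotiation" as two-party Nash bargaining. The names and
# the bargaining rule are illustrative assumptions, not the paper's
# specification. Each candidate agreement is a pair (u1, u2) of party
# utilities; d1 and d2 are the payoffs if negotiation breaks down.

def nash_bargain(agreements, d1, d2):
    """Return the agreement maximizing the Nash product
    (u1 - d1) * (u2 - d2) among options both parties prefer to
    disagreement, i.e. a win-win outcome if one exists."""
    feasible = [(u1, u2) for u1, u2 in agreements if u1 >= d1 and u2 >= d2]
    if not feasible:
        return None  # nothing beats walking away for both parties
    return max(feasible, key=lambda uv: (uv[0] - d1) * (uv[1] - d2))

# Three candidate agreements; disagreement leaves each party at 1.
# The balanced option wins even though the others favor one side.
print(nash_bargain([(4, 2), (3, 3), (2, 4)], d1=1, d2=1))  # -> (3, 3)
```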

The core argument: even perfectly aligned individual AI systems can produce harmful outcomes through misaligned deployment contexts, competitive dynamics between organizations, or governance failures at the institutional level. Alignment is therefore a system-level coordination problem where institutional structures must co-evolve with AI capabilities.
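
The competitive-dynamics failure mode has a standard game-theoretic shape. A minimal sketch with made-up payoffs (an illustration, not an example from the paper): two labs, each running a perfectly aligned model, face a deployment-speed prisoner's dilemma in which individually rational play lands on the collectively bad outcome unless an institutional mechanism changes the payoffs.

```python
# Toy deployment race between two labs with individually aligned models.
# Payoff numbers are invented to form a prisoner's dilemma; nothing here
# comes from the paper itself.

PAYOFFS = {
    ("careful", "careful"): (3, 3),  # coordinated caution: best joint outcome
    ("careful", "fast"):    (0, 4),  # the cautious lab loses the market
    ("fast",    "careful"): (4, 0),
    ("fast",    "fast"):    (1, 1),  # race to the bottom
}

def best_response(opponent: str, payoffs=PAYOFFS) -> str:
    """The move maximizing a lab's own payoff against a fixed opponent
    move -- what an individually aligned, rational agent picks."""
    return max(("careful", "fast"), key=lambda m: payoffs[(m, opponent)][0])

# Without institutions, "fast" dominates: it is the best response to
# either opponent move, so play converges on (fast, fast) = (1, 1),
# worse for both labs than the reachable (3, 3).
assert best_response("careful") == "fast" and best_response("fast") == "fast"

# A crude institutional mechanism (standing in for the paper's five):
# an enforceable agreement that taxes fast deployment by 3.
ENFORCED = {moves: (pa - 3 * (moves[0] == "fast"), pb - 3 * (moves[1] == "fast"))
            for moves, (pa, pb) in PAYOFFS.items()}

# Under enforcement "careful" becomes the dominant move, so the same
# self-interested play now lands on the good outcome (3, 3).
assert best_response("careful", ENFORCED) == "careful"
assert best_response("fast", ENFORCED) == "careful"
```

In this toy model the lever is not the models' alignment but the payoff structure around deployment, which is exactly the institutional layer the framework targets.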

Evidence

The paper provides architectural reasoning grounded in the observation that institutional incentives often conflict with individual system alignment. However, the framework lacks empirical validation: no deployment data, no formal verification, and no engagement with existing technical alignment approaches (RLHF, constitutional AI, bridging-based mechanisms). The five mechanisms are proposed as necessary but remain technically underspecified.

This is a conceptually ambitious framework from a recent paper (December 2025) that extends rather than replaces existing alignment work.


Relevant Notes:

Topics: