teleo-codex/domains/ai-alignment/ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md
Teleo Agents 4dfe98112c theseus: extract from 2025-12-00-fullstack-alignment-thick-models-value.md
- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-12 07:01:07 +00:00


type: claim
domain: ai-alignment
description: Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them, rather than focusing on individual model alignment alone
confidence: experimental
source: Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399
created: 2026-03-11
secondary_domains:
  - mechanisms
  - grand-strategy

AI alignment requires institutional co-alignment not just model alignment

The Full-Stack Alignment framework argues that alignment must operate at two levels simultaneously: the AI systems themselves AND the institutions that shape their development and deployment. This extends the alignment problem beyond single-organization objectives to misalignment among multiple stakeholders.

Full-stack alignment is defined as the concurrent alignment of AI systems and institutions with what people value. The paper argues that focusing solely on model-level alignment (RLHF, constitutional AI, etc.) is insufficient because:

  1. Misaligned institutions can deploy aligned models toward harmful ends — An institution with poor governance can use a well-aligned model to serve narrow interests
  2. Competitive pressures force abandonment of alignment constraints — Organizations that invest in alignment face market pressure to cut those investments when competitors do not make comparable ones
  3. Single-organization alignment cannot guarantee societal outcomes — The paper's core claim: "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone

The framework proposes five implementation mechanisms spanning both technical and institutional domains:

  1. AI value stewardship
  2. Normatively competent agents
  3. Win-win negotiation systems
  4. Meaning-preserving economic mechanisms
  5. Democratic regulatory institutions

This is a stronger claim than coordination-focused alignment theories make: those address coordination between AI labs, but not the institutional structures within which the labs operate.

Evidence

  • Full-Stack Alignment paper (December 2025) — introduces the framework and argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone
  • The paper's five proposed mechanisms explicitly span both technical (normatively competent agents) and institutional (democratic regulatory institutions) domains
  • The framework directly addresses the failure mode of aligned-model-misaligned-institution

Limitations

  • The paper is architecturally ambitious but may lack the technical specificity needed for implementation
  • No engagement with existing bridging-based mechanisms or formal impossibility results
  • Early-stage proposal (December 2025) without empirical validation or case studies
  • The paper does not provide formal definitions of what constitutes "institutional alignment"

Relevant Notes: