teleo-codex/domains/ai-alignment/beneficial-ai-outcomes-require-institutional-co-alignment-not-just-model-alignment.md

---
type: claim
domain: ai-alignment
secondary_domains:
  - mechanisms
  - grand-strategy
description: Full-stack alignment requires concurrent alignment of AI systems and governing institutions with thick models of value, not just individual model alignment
confidence: speculative
source: "Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (arXiv 2512.03399, December 2025)"
created: 2026-03-11
enrichments:
  - AI alignment is a coordination problem not a technical problem
  - AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation
---

Beneficial AI outcomes require institutional co-alignment not just model alignment

The full-stack alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone. Instead, comprehensive alignment requires concurrent alignment of **both** AI systems and the institutions that shape their development and deployment.

This extends the coordination-first thesis in a specific architectural way. The existing "AI alignment is a coordination problem" claim treats institutions (governments, regulatory bodies, economic structures) as the fixed environment within which coordination between labs must occur; full-stack alignment treats institutions themselves as alignment targets that must be redesigned and co-evolved alongside AI systems. The distinction is critical: coordination-first asks "how do competing actors align around AI development?", while full-stack alignment asks "how do we align the institutions that govern AI development?"

The framework proposes five implementation mechanisms:

  1. AI value stewardship — institutional structures for preserving and transmitting human values
  2. Normatively competent agents — AI systems that reason about values rather than optimize fixed objectives
  3. Win-win negotiation systems — mechanisms for resolving stakeholder conflicts without zero-sum extraction
  4. Meaning-preserving economic mechanisms — economic structures that preserve rather than flatten human meaning and purpose
  5. Democratic regulatory institutions — governance structures that represent affected populations, not just developers or governments

The key claim: these five institutional mechanisms must be built concurrently with AI capability development, not sequentially after. This creates a fundamental timing problem: institutional redesign operates on decades-long timescales (Acemoglu's critical junctures are measured in decades); AI capability development operates on months-to-years timescales. The simultaneous co-alignment requirement may be structurally incoherent if the two processes cannot be synchronized.

Evidence

The paper presents this as a theoretical framework rather than an empirically validated approach. The five implementation mechanisms are proposed but lack formal specification, deployment evidence, or comparative analysis against alternative institutional designs. No working system exists that demonstrates institutional co-alignment at scale.

Challenges

Timescale incoherence (primary challenge): Institutional change (decades) and AI capability development (months) operate on fundamentally different timescales. The paper does not address whether simultaneous co-alignment is even temporally feasible, or whether the requirement should be sequential (build institutions first, then scale AI) or parallel (accept institutional lag). This is not merely a difficulty — it may be a structural impossibility if institutional redesign cannot be accelerated to match AI development velocity.
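The mismatch can be made concrete with a toy calculation. All numbers below are illustrative assumptions for the sake of the sketch, not figures from the paper: a ~20-year institutional reform cycle against a ~6-month AI capability cycle.

```python
# Back-of-envelope sketch of the timescale mismatch between institutional
# redesign and AI capability development. Both constants are assumptions
# chosen only to illustrate the order-of-magnitude gap.

INSTITUTIONAL_CYCLE_YEARS = 20.0  # assumed: one major institutional reform cycle
AI_CAPABILITY_CYCLE_YEARS = 0.5   # assumed: one significant capability generation


def capability_generations_per_reform(reform_years: float,
                                      capability_years: float) -> float:
    """Number of AI capability generations that elapse during one
    institutional reform cycle."""
    return reform_years / capability_years


if __name__ == "__main__":
    n = capability_generations_per_reform(INSTITUTIONAL_CYCLE_YEARS,
                                          AI_CAPABILITY_CYCLE_YEARS)
    # Under these assumptions, 40 capability generations pass before a
    # single round of institutional redesign completes.
    print(f"{n:.0f} capability generations per reform cycle")
```

Even under generous assumptions, dozens of capability generations pass within one reform cycle, which is the structural gap the co-alignment requirement must somehow close.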

Coordination across jurisdictions: The framework does not specify how to coordinate institutional redesign across nations with conflicting interests, different legal systems, and competing strategic incentives. Full-stack alignment requires global institutional alignment, but the mechanisms for achieving this across sovereign states are unspecified. The paper does not engage with whether this is a coordination problem (solvable with better mechanisms) or a fundamental conflict of interest (unsolvable).

Irreducible value disagreement: The framework does not address how institutional co-alignment handles cases where different populations have genuinely incompatible enduring values, not just preference differences. Democratic regulatory institutions may amplify rather than resolve these conflicts. The paper assumes institutional redesign can accommodate value pluralism, but provides no mechanism for handling cases where pluralism is irreducible.

Operationalization gap: The paper does not provide concrete methods for implementing any of the five mechanisms. "AI value stewardship" and "meaning-preserving economic mechanisms" are conceptually interesting but lack specification sufficient for deployment. Without operationalization, the framework remains architectural rather than actionable.

Institutional capture risk: The framework does not address how to prevent the proposed institutions from being captured by concentrated interests once they are built. Acemoglu's own work emphasizes that critical junctures can close through backsliding — the paper does not propose anti-fragility mechanisms or institutional designs that resist capture.


Relevant Notes:

Topics: