theseus: extract from 2025-12-00-fullstack-alignment-thick-models-value.md

- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 6)

Pentagon-Agent: Theseus <HEADLESS>
Teleo Agents 2026-03-12 11:24:54 +00:00
parent ba4ac4a73e
commit 13a6fe956f
4 changed files with 104 additions and 1 deletion


@@ -21,6 +21,12 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v
Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system.
### Additional Evidence (extend)
*Source: [[2025-12-00-fullstack-alignment-thick-models-value]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
The Full-Stack Alignment paper (December 2025) extends the coordination thesis to the institutional level. It argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone, proposing full-stack alignment as the concurrent alignment of AI systems and the institutions that shape them. This moves beyond inter-lab coordination to institutional co-alignment. The paper identifies five implementation mechanisms: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions. This suggests coordination must occur not just between AI developers but between AI systems and the governance structures that regulate them.
---
Relevant Notes:


@@ -0,0 +1,39 @@
---
type: claim
domain: ai-alignment
description: "Beneficial AI outcomes require aligning both AI systems and the institutions that shape them simultaneously rather than focusing on individual model alignment alone"
confidence: experimental
source: "Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)"
created: 2026-03-11
secondary_domains: [mechanisms, grand-strategy]
---
# AI alignment requires institutional co-alignment not just model alignment
The Full-Stack Alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone. Instead, alignment must be comprehensive—addressing both AI systems and the institutions that shape their development and deployment.
This extends beyond single-organization objectives to address misalignment across multiple stakeholders. The paper proposes **full-stack alignment** as the concurrent alignment of AI systems and institutions with what people value, moving the alignment problem from a purely technical domain into institutional design.
## Evidence
The paper identifies five implementation mechanisms for full-stack alignment:
1. AI value stewardship
2. Normatively competent agents
3. Win-win negotiation systems
4. Meaning-preserving economic mechanisms
5. Democratic regulatory institutions
This multi-layered approach suggests that technical alignment solutions (RLHF, constitutional AI, etc.) are necessary but insufficient without corresponding institutional structures.
## Relationship to Existing Claims
This claim extends [[AI alignment is a coordination problem not a technical problem]] by arguing that coordination must occur not just between AI labs but between AI systems and the institutions governing them. Where the coordination thesis focuses on inter-organizational dynamics, full-stack alignment adds institutional architecture as a co-equal alignment target.
The claim also connects to [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] by framing this institutional co-alignment as urgent—the window for shaping both AI and institutions simultaneously may be narrow.
---
Relevant Notes:
- [[AI alignment is a coordination problem not a technical problem]] — extended by institutional dimension
- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — timing urgency
- [[safe AI development requires building alignment mechanisms before scaling capability]] — complementary but focused on technical sequencing


@@ -0,0 +1,52 @@
---
type: claim
domain: ai-alignment
description: "Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains"
confidence: speculative
source: "Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)"
created: 2026-03-11
secondary_domains: [mechanisms]
---
# Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning
The Full-Stack Alignment paper proposes **thick models of value** as an alternative to utility functions and preference orderings. These models:
1. **Distinguish enduring values from temporary preferences** — separating what people consistently care about from context-dependent wants
2. **Model individual choices within social contexts** — recognizing that preferences are embedded in relationships and institutions
3. **Enable normative reasoning across new domains** — allowing AI systems to generalize value judgments to novel situations
This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
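The paper gives no concrete representation for these models (see Limitations below). Purely as an illustrative sketch in Python, with every class, field, and weighting being a hypothetical choice of this note rather than anything drawn from the paper, the contrast might look like:
```python
from dataclasses import dataclass, field

@dataclass
class ThinPreferenceModel:
    """Thin model: one context-free scalar per option (revealed preference)."""
    utility: dict[str, float]  # option -> utility

    def choose(self) -> str:
        return max(self.utility, key=self.utility.get)

@dataclass
class ThickValueModel:
    """Thick model: keeps apart what a thin scalar collapses together."""
    enduring_values: dict[str, float]   # e.g. {"honesty": 0.9, "autonomy": 0.7}
    contextual_wants: dict[str, dict[str, float]]  # context -> option -> weight
    social_roles: list[str] = field(default_factory=list)  # e.g. ["parent"]

    def endorse(self, option_values: dict[str, float]) -> float:
        """Score an option by how well it serves enduring values, not by
        momentary wants alone; option_values maps each value the option
        serves to how strongly it serves it."""
        return sum(self.enduring_values.get(v, 0.0) * w
                   for v, w in option_values.items())
```
The design point is inspectability: the thick model keeps separate the components a thin scalar collapses, so an agent can ask which enduring value an option serves rather than only which option scores highest.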
## Theoretical Foundation
The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.
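The paper stops short of a formal definition; as one hedged sketch (this note's notation, not the paper's), the split can be written as an additive decomposition of expressed preference:
$$
u(a, c) = v(a) + \varepsilon(a, c)
$$
where $u(a, c)$ is the preference expressed for option $a$ in context $c$, $v(a)$ the enduring value component, and $\varepsilon(a, c)$ the temporary, context-dependent want. A thin model estimates only $u$; a thick model tries to identify $v$ across contexts, for instance via $v(a) \approx \mathbb{E}_c[u(a, c)]$ when contextual wants average out in expectation.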
The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
## Limitations and Open Questions
The paper does not provide:
- Formal specification of how thick models are constructed
- Empirical validation that thick models outperform thin models in practice
- Resolution of whose enduring values are privileged when they conflict
- Technical implementation details for deployment
The claim remains speculative until these gaps are addressed through follow-up work or independent validation.
## Relationship to Existing Claims
This formalizes the intuition behind [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]—thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.
It also addresses the failure mode identified in [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]]—both approaches reject the single-reward-function assumption, though through different mechanisms.
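That note's mechanism is only named here, so the following is a rough sketch of the shared idea rather than either paper's method: replace DPO's fixed sensitivity scalar with a distribution conditioned on the input. The function names, the log-normal parameterization, and `beta_head` are all assumptions of this sketch.
```python
import torch
import torch.nn.functional as F

def dpo_loss_fixed(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Standard DPO objective: one global sensitivity scalar for all pairs."""
    margin = (logp_w - ref_w) - (logp_l - ref_l)
    return -F.logsigmoid(beta * margin).mean()

def dpo_loss_distributional(logp_w, logp_l, ref_w, ref_l,
                            prompt_emb, beta_head, n_samples=8):
    """Hypothetical variant: beta drawn from a learned log-normal whose
    parameters are predicted from the prompt embedding, e.g. with
    beta_head = torch.nn.Linear(emb_dim, 2)."""
    mu, log_sigma = beta_head(prompt_emb).chunk(2, dim=-1)        # each [batch, 1]
    margin = ((logp_w - ref_w) - (logp_l - ref_l)).unsqueeze(-1)  # [batch, 1]
    eps = torch.randn(margin.shape[0], n_samples)                 # [batch, S]
    beta = torch.exp(mu + log_sigma.exp() * eps)                  # positive samples
    return -F.logsigmoid(beta * margin).mean()
```
Sampling beta per prompt lets the objective represent raters who weight the same log-probability margin differently, which is one reading of the diversity failure the linked note targets.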
---
Relevant Notes:
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick models as implementation mechanism
- [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]] — related approach to preference diversity
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models as response to specification intractability
Topics:
- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]


@@ -7,9 +7,15 @@ date: 2025-12-01
domain: ai-alignment
secondary_domains: [mechanisms, grand-strategy]
format: paper
-status: unprocessed
+status: processed
priority: medium
tags: [full-stack-alignment, institutional-alignment, thick-values, normative-competence, co-alignment]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md", "thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning.md"]
enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Extracted two novel claims from Full-Stack Alignment paper: (1) institutional co-alignment as necessary complement to model alignment, (2) thick models of value as formalization of continuous value integration. Both claims rated experimental/speculative due to limited technical specificity and lack of empirical validation. Paper extends existing coordination thesis to institutional level. No entity data present—purely conceptual/theoretical paper."
---
## Content