- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md - Domain: ai-alignment
| type | domain | secondary_domains | description | confidence | source | created | enrichments |
|---|---|---|---|---|---|---|---|
| claim | ai-alignment | | Thick value models distinguish stable, enduring values from context-dependent temporary preferences and model social embedding to enable normative reasoning | speculative | Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (December 2025) | 2026-03-11 | |
Thick models of value distinguish enduring values from temporary preferences, enabling normative competence
The full-stack alignment framework proposes "thick models of value" as an alternative to utility functions and preference orderings for AI alignment. The framework distinguishes three dimensions:
- Enduring vs. temporary: Stable values (what people consistently care about across contexts) vs. temporary preferences (what people want in specific moments)
- Social embedding: Individual choices modeled within social contexts rather than as atomized preferences
- Normative reasoning: AI systems that reason about values across new domains rather than simply optimizing pre-specified objectives
The goal is to develop "normatively competent agents" that engage with human values in their full complexity rather than reducing them to scalar reward signals.
This concept formalizes the distinction between what people say they want (stated preferences) and what actually produces good outcomes (enduring values). It proposes continuous value integration rather than specifying objectives in advance.
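The three dimensions above can be sketched as a data structure. This is an illustrative sketch only: the paper specifies no implementation, and every name here (ThickValueModel, SocialContext, the lookup in is_enduring) is hypothetical, standing in for the unspecified normative-reasoning machinery.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the source paper defines no data structures
# or algorithms; these names and fields are illustrative assumptions.

@dataclass
class SocialContext:
    """Social embedding: relationships and norms that shape choices."""
    relationships: list = field(default_factory=list)
    shared_norms: list = field(default_factory=list)

@dataclass
class ThickValueModel:
    """Separates enduring values from temporary preferences and
    embeds both in a social context, per the framework's three
    dimensions."""
    enduring_values: set        # stable across contexts
    temporary_preferences: set  # tied to a specific moment
    context: SocialContext

    def is_enduring(self, value: str) -> bool:
        # A normatively competent agent would reason about this
        # distinction; the paper gives no method, so this sketch
        # can only look it up in a pre-labeled set.
        return value in self.enduring_values

model = ThickValueModel(
    enduring_values={"honesty", "autonomy"},
    temporary_preferences={"watch another episode"},
    context=SocialContext(relationships=["family"],
                          shared_norms=["reciprocity"]),
)
```

Note how the operationalization gap discussed under Challenges shows up immediately: nothing in the structure says how `enduring_values` gets populated from observed behavior, which is precisely what the framework leaves unspecified.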
Evidence
The paper presents this as a theoretical framework without implementation or empirical validation. No working system exists, and the computational requirements for modeling social context and distinguishing enduring from temporary values at scale are unspecified.
The framework does not engage with existing work on the preference-diversity limitations of RLHF/DPO, nor does it explain how thick models would handle irreducible value disagreements between individuals or groups.
Challenges
Stability assumption: How do you operationalize "enduring values" when human values themselves evolve over time? The framework assumes values are more stable than preferences, but this may not hold across developmental stages, cultural shifts, or technological change.
Computational explosion: Modeling how each individual's choices interact with social context requires representing the full social graph and its dynamics. This creates a scalability problem that the paper does not address.
Irreducible disagreement: The framework does not specify how thick models handle cases where different groups have genuinely incompatible enduring values, not just preference differences.
Operationalization gap: The paper does not provide concrete methods for extracting or representing thick models from human behavior or reasoning.
Relevant Notes:
- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance — thick values formalize continuous integration
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — thick models acknowledge this complexity
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state — thick models must handle value pluralism
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — thick models attempt to address this
Topics: