- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains | speculative | Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025) | 2026-03-11 | |
Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning
The Full-Stack Alignment paper proposes thick models of value as an alternative to utility functions and preference orderings. These models are claimed to:
- Distinguish enduring values from temporary preferences — separating what people consistently care about from context-dependent wants
- Model individual choices within social contexts — recognizing that preferences are embedded in relationships and institutions
- Enable normative reasoning across new domains — allowing AI systems to generalize value judgments to novel situations
This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
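The paper gives no formal construction (see Limitations below), but the structural contrast can be sketched in code. Everything here is hypothetical illustration: the class names, fields, and the `applicable` rule are assumptions chosen to make the thick/thin distinction concrete, not the paper's method.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EnduringValue:
    """Something a person consistently cares about across contexts."""
    name: str  # e.g. "honesty", "autonomy"

@dataclass(frozen=True)
class TemporaryPreference:
    """A context-dependent want, valid only where it was formed."""
    want: str
    context: str

@dataclass
class ThickValueModel:
    """Hypothetical sketch: values and preferences are kept as distinct
    types, and choices are situated in a social context, instead of being
    flattened into one context-free preference ordering (a thin model)."""
    values: list[EnduringValue]
    preferences: list[TemporaryPreference]
    social_context: dict[str, str] = field(default_factory=dict)  # e.g. role -> institution

    def applicable(self, context: str) -> list[str]:
        # Enduring values carry over to any context, including novel ones;
        # temporary preferences apply only in their originating context.
        carried = [v.name for v in self.values]
        local = [p.want for p in self.preferences if p.context == context]
        return carried + local

model = ThickValueModel(
    values=[EnduringValue("honesty")],
    preferences=[TemporaryPreference("coffee over tea", context="morning")],
)
print(model.applicable("novel_domain"))  # only the enduring value generalizes
print(model.applicable("morning"))      # both apply in the original context
```

A thin model would collapse both lists into a single ranking, losing the information needed to decide which commitments should transfer to a new domain.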
Theoretical Foundation
The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.
The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
Limitations and Open Questions
The paper does not provide:
- Formal specification of how thick models are constructed
- Empirical validation that thick models outperform thin models in practice
- Resolution of whose enduring values are privileged when they conflict
- Technical implementation details for deployment
The claim remains speculative until these gaps are addressed through follow-up work or independent validation.
Relationship to Existing Claims
This formalizes the intuition behind "the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance": thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.
It also connects to "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling": both approaches reject the single-reward-function assumption, though through different mechanisms.
Relevant Notes:
- "the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance" — thick models as implementation mechanism
- "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling" — related approach to preference diversity
- "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception" — thick models as response to specification intractability
Topics: