---
type: claim
domain: ai-alignment
description: "Thick value models distinguish enduring values from temporary preferences and embed individual choices in social contexts, enabling normative reasoning that utility functions cannot capture"
confidence: experimental
source: "Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399"
created: 2026-03-11
secondary_domains: [mechanisms]
---

# Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning across new domains

The Full-Stack Alignment paper proposes "thick models of value" as an alternative to utility functions and preference orderings. These models address a fundamental problem in AI alignment: the specification trap, in which values specified in advance fail to capture what people actually care about in novel contexts.

**What thick models do:**

1. **Distinguish enduring values from temporary preferences** — Separates what people say they want in the moment (preferences, often context-dependent and volatile) from what actually produces good outcomes (values, which are more stable and generalizable)
2. **Model individual choices within social contexts** — Recognizes that choices are not isolated but embedded in social structures, relationships, and institutional contexts
3. **Enable normative reasoning across new domains** — Allows systems to reason about values in contexts not explicitly covered by training data, rather than failing when encountering novel situations

**Why this matters for alignment:**

This contrasts with "thin" models (utility functions, preference orderings), which treat all preferences as equivalent and context-independent. Thin models fail because:

- They cannot distinguish signal (enduring values) from noise (temporary preferences)
- They assume preferences are stable across contexts when they are in fact highly context-dependent
- They cannot generalize to novel domains because they have no principled way to reason about values beyond the training data

Thick models formalize why specification-in-advance fails: human values have structure, hierarchy, and context-dependence that simple preference aggregation cannot capture.
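The paper gives no formal specification of a thick model (see Limitations below), so the following Python sketch is purely illustrative: the `Value` and `Preference` types, the `stability` field, and the `endorsed` rule are hypothetical constructions, not anything from the paper. It only makes the structural contrast concrete: a thin model collapses everything into one context-free scalar, while a thick model keeps enduring values and context-indexed preferences as separate objects, so it can ask whether a given preference is endorsed by a stable value.

```python
from dataclasses import dataclass, field

# Thin model: one context-free scalar per outcome. Once folded into a
# number, temporary preferences and enduring values are indistinguishable.
def thin_utility(outcome: str) -> float:
    scores = {"late_night_snack": 0.9, "full_night_sleep": 0.7}
    return scores.get(outcome, 0.0)

# Thick model (hypothetical structure): values and preferences are
# distinct types, and preferences carry the context that produced them.
@dataclass
class Value:
    name: str                                        # e.g. "health"
    stability: float                                 # 0..1: how enduring across contexts
    supports: set[str] = field(default_factory=set)  # outcomes this value endorses

@dataclass
class Preference:
    outcome: str
    strength: float
    context: str  # e.g. "tired, 2am" vs "reflective morning"

def endorsed(values: list[Value], pref: Preference, threshold: float = 0.5) -> bool:
    """Treat a preference as value-endorsed only if some sufficiently
    stable value supports its outcome; otherwise it is a temporary
    signal, however strong it feels in the moment."""
    return any(v.stability >= threshold and pref.outcome in v.supports
               for v in values)

values = [Value("health", stability=0.9, supports={"full_night_sleep"})]
craving = Preference("late_night_snack", strength=0.9, context="tired, 2am")
rest = Preference("full_night_sleep", strength=0.4, context="tired, 2am")

# The thin model ranks the craving above sleep; the thick model notes
# that only sleep is endorsed by an enduring value.
assert thin_utility(craving.outcome) > thin_utility(rest.outcome)
assert endorsed(values, rest) and not endorsed(values, craving)
```

Under this toy rule, the strong 2am craving outranks sleep for the thin model, while the thick model flags it as unendorsed by any stable value. How to operationalize `stability` and `supports` in a real system is exactly the open question noted below.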
## Evidence

- Full-Stack Alignment paper (December 2025) — introduces thick vs thin value models as a core component of the alignment framework
- The distinction between preferences (what people say they want) and values (what produces good outcomes) directly addresses the specification trap identified in existing alignment research
- The paper argues that thick models enable "normative reasoning across new domains" — a capability thin models lack

## Limitations and Open Questions

- No formal specification of what constitutes a "thick model" or how to implement one in practice
- Unclear how to operationalize the distinction between enduring values and temporary preferences in real systems
- Risk of paternalism: who decides which preferences are "temporary" vs which values are "enduring"? This could embed designer bias
- No empirical validation that thick models actually outperform thin models on alignment tasks
- The paper does not address how thick models handle genuinely conflicting values across populations

---

Relevant Notes:

- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick values formalize continuous value integration
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models acknowledge this complexity
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] — thin models fail at diversity
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] — relevant to the paternalism concern