- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)


---
type: claim
domain: ai-alignment
description: "Thick value models distinguish enduring values from temporary preferences and embed individual choices in social contexts, enabling normative reasoning that utility functions cannot capture"
confidence: experimental
source: "Full-Stack Alignment paper (December 2025), arxiv.org/abs/2512.03399"
created: 2026-03-11
secondary_domains: [mechanisms]
---
# Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning across new domains
The Full-Stack Alignment paper proposes "thick models of value" as an alternative to utility functions and preference orderings. These models address a fundamental problem in AI alignment: the specification trap, the failure mode in which values specified fully in advance cannot capture what people actually care about.

**What thick models do:**
1. **Distinguish enduring values from temporary preferences** — Separates what people say they want (preferences, often context-dependent and volatile) from what actually produces good outcomes (values, more stable and generalizable)
2. **Model individual choices within social contexts** — Recognizes that choices are not isolated but embedded in social structures, relationships, and institutional contexts
3. **Enable normative reasoning across new domains** — Allows systems to reason about values in contexts not explicitly covered by training data, rather than failing when they encounter novel situations
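The paper gives no formal specification of a thick model (see the limitations below), so the following Python sketch is purely illustrative: every class, field, and method name is an assumption chosen to make the three capabilities above concrete. The key structural point is that values and preferences are distinct types, preferences are bound to a context, and only values carry over to novel domains.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the Full-Stack Alignment paper does not
# define these structures; the names here are assumptions.

@dataclass(frozen=True)
class Preference:
    """A context-dependent, possibly volatile expressed want."""
    statement: str
    context: str  # a preference only holds within its context

@dataclass(frozen=True)
class Value:
    """An enduring value: stable across contexts, generalizable."""
    name: str
    rationale: str  # why acting on it tends to produce good outcomes

@dataclass
class ThickValueModel:
    values: list[Value] = field(default_factory=list)
    preferences: list[Preference] = field(default_factory=list)
    social_context: str = ""  # choices are embedded in social structures

    def applies_in(self, novel_context: str) -> list[Value]:
        # Values transfer to contexts never seen before;
        # preferences do not, because they are bound to their context.
        return list(self.values)

model = ThickValueModel(
    values=[Value("honesty", "trust enables cooperation")],
    preferences=[Preference("order pizza", context="friday night")],
    social_context="household",
)
# Only the enduring value transfers to a new domain:
transferable = model.applies_in("workplace negotiation")
```

A thin model, by contrast, would collapse both lists into a single preference ordering with no type distinction and no context field, which is exactly the failure mode discussed next.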

**Why this matters for alignment:**

This contrasts with "thin" models (utility functions, preference orderings) that treat all preferences as equivalent and context-independent. Thin models fail because:
- They cannot distinguish signal (enduring values) from noise (temporary preferences)
- They assume preferences are stable across contexts when they are actually highly context-dependent
- They cannot generalize to novel domains because they have no principled way to reason about values beyond training data

Thick models formalize why specification-in-advance fails: human values have structure, hierarchy, and context-dependence that simple preference aggregation cannot capture.
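The signal-versus-noise failure can be made concrete with a toy example (my own illustration, not from the paper): a thin model that aggregates observed choices into one context-free score discards exactly the context information it would need to generalize.

```python
# Toy illustration of the thin-model failure mode: a single
# context-free utility score forced to treat all preferences alike.

# The same person ranks options differently in different contexts --
# context-dependence that a flat preference count cannot represent.
observations = [
    {"context": "weekday", "chose": "salad", "over": "cake"},
    {"context": "birthday", "chose": "cake", "over": "salad"},
]

def thin_utility(observations: list[dict]) -> dict[str, int]:
    """Collapse choices into one score per option, dropping context."""
    scores: dict[str, int] = {}
    for obs in observations:
        scores[obs["chose"]] = scores.get(obs["chose"], 0) + 1
        scores.setdefault(obs["over"], 0)  # losing option still appears
    return scores

u = thin_utility(observations)
# u["salad"] == u["cake"] == 1: the thin model sees a tie and has no
# principled way to decide in a novel context, whereas a thick model
# could appeal to the enduring value behind each context-bound choice.
```

The "tie" is not a bug in the aggregation; it is the information loss the thick/thin distinction is pointing at.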
## Evidence
- Full-Stack Alignment paper (December 2025) — introduces thick vs thin value models as a core component of the alignment framework
- The distinction between preferences (what people say they want) and values (what produces good outcomes) directly addresses the specification trap identified in existing alignment research
- The paper argues that thick models enable "normative reasoning across new domains" — a capability thin models lack
## Limitations and Open Questions
- No formal specification of what constitutes a "thick model" or how to implement one in practice
- Unclear how to operationalize the distinction between enduring values and temporary preferences in real systems
- Risk of paternalism: who decides which preferences are "temporary" vs which values are "enduring"? This could embed designer bias
- No empirical validation that thick models actually outperform thin models on alignment tasks
- The paper does not address how thick models handle genuinely conflicting values across populations
---
Relevant Notes:
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick values formalize continuous value integration
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models acknowledge this complexity
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] — thin models fail at diversity
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] — relevant to the paternalism concern