---
type: claim
domain: ai-alignment
description: "Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains"
confidence: speculative
source: "Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)"
created: 2026-03-11
secondary_domains: [mechanisms]
---

# Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning

The Full-Stack Alignment paper proposes **thick models of value** as an alternative to utility functions and preference orderings. These models:

1. **Distinguish enduring values from temporary preferences** — separating what people consistently care about from context-dependent wants
2. **Model individual choices within social contexts** — recognizing that preferences are embedded in relationships and institutions
3. **Enable normative reasoning across new domains** — allowing AI systems to generalize value judgments to novel situations

This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
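
Since the paper gives no formal specification, the contrast can only be sketched. The following Python toy is an illustration of the distinction, not the paper's formalism: the class names, the `(context, option)` representation of temporary wants, and the 0.1 down-weighting factor are all assumptions invented here.

```python
from dataclasses import dataclass, field

@dataclass
class ThinModel:
    """Thin model: one context-free scalar utility per option."""
    utility: dict[str, float]  # option -> stated-preference strength

    def choose(self, options: list[str]) -> str:
        # Every stated preference counts equally, regardless of context.
        return max(options, key=lambda o: self.utility.get(o, 0.0))

@dataclass
class ThickModel:
    """Thick model: enduring values kept separate from temporary, contextual wants."""
    enduring_values: dict[str, float]  # stable, cross-context weights
    contextual_wants: dict[tuple[str, str], float] = field(default_factory=dict)
    # contextual_wants maps (context, option) -> strength of a temporary pull

    def choose(self, options: list[str], context: str) -> str:
        # Enduring values dominate; temporary wants are down-weighted
        # (0.1 is an arbitrary illustrative factor, not from the paper).
        def score(option: str) -> float:
            return (self.enduring_values.get(option, 0.0)
                    + 0.1 * self.contextual_wants.get((context, option), 0.0))
        return max(options, key=score)

# A strong late-night craving: the thin model follows the stated preference,
# while the thick model treats it as a temporary want that conflicts with
# an enduring value.
thin = ThinModel(utility={"exercise": 0.2, "doomscroll": 5.0})
thick = ThickModel(
    enduring_values={"exercise": 1.0, "doomscroll": 0.2},
    contextual_wants={("late_night", "doomscroll"): 5.0},
)
```

Here `thin.choose(...)` picks `"doomscroll"` while `thick.choose(..., "late_night")` still picks `"exercise"`: the same stated preference is weighed differently once it is classified as temporary rather than enduring.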

## Theoretical Foundation

The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.

The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
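
One operational reading of that claim: if enduring values are expressed over domain-general dimensions rather than over concrete options, they can score options in domains the system has never seen. This is a hedged sketch of that idea only; the dimension names, weights, and the linear scoring rule below are invented for illustration and do not appear in the paper.

```python
# Enduring values as weights over domain-general dimensions
# (illustrative names and numbers; the paper specifies no such representation).
ENDURING_VALUES = {"harm_avoidance": 1.0, "honesty": 0.9, "efficiency": 0.3}

def evaluate(option_features: dict[str, float]) -> float:
    """Score an option from any domain by projecting its features
    onto the enduring value dimensions."""
    return sum(ENDURING_VALUES.get(dim, 0.0) * weight
               for dim, weight in option_features.items())

# A novel domain (say, a triage policy) the system was never tuned on:
policy_a = {"harm_avoidance": 0.8, "efficiency": 0.9}  # safer but slower
policy_b = {"harm_avoidance": 0.2, "efficiency": 1.0}  # faster but riskier
policies = {"a": policy_a, "b": policy_b}
best = max(policies, key=lambda name: evaluate(policies[name]))
```

Because the weights attach to value dimensions rather than to specific options, no per-domain preference data is needed to rank the new options — which is one way the "generalize to novel situations" property could cash out.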

## Limitations and Open Questions

The paper does not provide:

- Formal specification of how thick models are constructed
- Empirical validation that thick models outperform thin models in practice
- Resolution of whose enduring values are privileged when they conflict
- Technical implementation details for deployment

The claim remains speculative until these gaps are addressed through follow-up work or independent validation.

## Relationship to Existing Claims

This formalizes the intuition behind [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]—thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.

It also addresses the failure mode identified in [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]]—both approaches reject the single-reward-function assumption, though through different mechanisms.

---

Relevant Notes:

- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick models as implementation mechanism
- [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]] — related approach to preference diversity
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models as response to specification intractability

Topics:

- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]