---
type: claim
domain: ai-alignment
description: "Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains"
confidence: speculative
source: "Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)"
created: 2026-03-11
secondary_domains: [mechanisms]
---
# Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning
The Full-Stack Alignment paper proposes **thick models of value** as an alternative to utility functions and preference orderings. These models:
1. **Distinguish enduring values from temporary preferences** — separating what people consistently care about from context-dependent wants
2. **Model individual choices within social contexts** — recognizing that preferences are embedded in relationships and institutions
3. **Enable normative reasoning across new domains** — allowing AI systems to generalize value judgments to novel situations

This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
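
The paper stops short of a formal construction (see Limitations below), but the contrast can be made concrete. The Python sketch below is purely illustrative: the class name, fields, and endorsement rule are assumptions of this note, not anything the paper specifies. A thin model collapses stated preferences into one context-free scalar; a thick model keeps enduring values and context-indexed preferences separate, so a novel action can be judged against the former.

```python
from dataclasses import dataclass, field


# Thin model: every stated preference collapses into a single
# context-free scalar, so "candy" and "health" weigh the same.
def thin_utility(stated_preferences: dict[str, float]) -> float:
    return sum(stated_preferences.values())


@dataclass
class ThickValueModel:
    """Illustrative sketch only; the paper gives no formal construction.

    Enduring values persist across contexts; temporary preferences are
    indexed by the social context in which they were expressed.
    """
    enduring_values: set[str]  # e.g. {"honesty", "health"}
    contextual_preferences: dict[str, set[str]] = field(default_factory=dict)

    def observe(self, context: str, preference: str) -> None:
        # Record a stated preference without promoting it to a value.
        self.contextual_preferences.setdefault(context, set()).add(preference)

    def endorse(self, action_supports: set[str]) -> bool:
        # Normative reasoning in a new domain: judge the action against
        # enduring values, not against any one context's stated wants.
        return bool(self.enduring_values & action_supports)


print(thin_utility({"candy": 1.0, "health": 1.0}))  # 2.0: both wants count equally

model = ThickValueModel(enduring_values={"health", "honesty"})
model.observe(context="late-night snack run", preference="candy")

# A novel situation for which no preference was ever stated:
print(model.endorse({"health"}))  # True  -> serves an enduring value
print(model.endorse({"candy"}))   # False -> a context-bound want only
```

The separation of stores is what does the work here: `endorse` never consults `contextual_preferences`, which is what lets the judgment generalize to contexts where no preference was ever stated.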
## Theoretical Foundation
The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.
The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
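
One way to operationalize the enduring-versus-temporary test, again an assumption of this note rather than anything the paper provides, is stability across social contexts: a preference earns value status only if it keeps showing up wherever the person acts. A hypothetical promotion rule:

```python
def promote_enduring(
    contextual_preferences: dict[str, set[str]],
    stability_threshold: float = 0.8,
) -> set[str]:
    """Promote a preference to an enduring value only if it recurs in at
    least `stability_threshold` of observed contexts.
    Hypothetical criterion; not from the paper."""
    n_contexts = len(contextual_preferences)
    if n_contexts == 0:
        return set()
    counts: dict[str, int] = {}
    for prefs in contextual_preferences.values():
        for pref in prefs:
            counts[pref] = counts.get(pref, 0) + 1
    return {p for p, c in counts.items() if c / n_contexts >= stability_threshold}


# "honesty" recurs in every context; "candy" is bound to one.
observed = {
    "work": {"honesty"},
    "family dinner": {"honesty", "candy"},
    "negotiation": {"honesty"},
}
print(promote_enduring(observed))  # {'honesty'}
```

Any such threshold immediately raises the unresolved question flagged below: whose contexts are sampled, and who sets the cutoff.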
## Limitations and Open Questions
The paper does not provide:
- Formal specification of how thick models are constructed
- Empirical validation that thick models outperform thin models in practice
- Resolution of whose enduring values are privileged when they conflict
- Technical implementation details for deployment

The claim remains speculative until these gaps are addressed through follow-up work or independent validation.
## Relationship to Existing Claims
This formalizes the intuition behind [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]—thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.
It also addresses the failure mode identified in [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]]—both approaches reject the single-reward-function assumption, though through different mechanisms.

---
Relevant Notes:
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick models as implementation mechanism
- [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]] — related approach to preference diversity
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models as response to specification intractability

Topics:
- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]