---
type: claim
domain: ai-alignment
description: "Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains"
confidence: speculative
source: "Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)"
created: 2026-03-11
secondary_domains: [mechanisms]
---

# Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning

The Full-Stack Alignment paper proposes **thick models of value** as an alternative to utility functions and preference orderings. These models:

1. **Distinguish enduring values from temporary preferences** — separating what people consistently care about from context-dependent wants
2. **Model individual choices within social contexts** — recognizing that preferences are embedded in relationships and institutions
3. **Enable normative reasoning across new domains** — allowing AI systems to generalize value judgments to novel situations

This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
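
Since the paper gives no formal specification, the contrast can only be sketched. The following Python toy is an illustration of the distinction, not the paper's formalism: the class names, the `(context, option)` representation of temporary wants, and the 0.1 down-weighting factor are all assumptions invented here.

```python
from dataclasses import dataclass, field

@dataclass
class ThinModel:
    """Thin model: one context-free scalar utility per option."""
    utility: dict[str, float]  # option -> stated-preference strength

    def choose(self, options: list[str]) -> str:
        # Every stated preference counts equally, regardless of context.
        return max(options, key=lambda o: self.utility.get(o, 0.0))

@dataclass
class ThickModel:
    """Thick model: enduring values kept separate from temporary, contextual wants."""
    enduring_values: dict[str, float]  # stable, cross-context weights
    contextual_wants: dict[tuple[str, str], float] = field(default_factory=dict)
    # contextual_wants maps (context, option) -> strength of a temporary pull

    def choose(self, options: list[str], context: str) -> str:
        # Enduring values dominate; temporary wants are down-weighted
        # (0.1 is an arbitrary illustrative factor, not from the paper).
        def score(option: str) -> float:
            return (self.enduring_values.get(option, 0.0)
                    + 0.1 * self.contextual_wants.get((context, option), 0.0))
        return max(options, key=score)

# A strong late-night craving: the thin model follows the stated preference,
# while the thick model treats it as a temporary want that conflicts with
# an enduring value.
thin = ThinModel(utility={"exercise": 0.2, "doomscroll": 5.0})
thick = ThickModel(
    enduring_values={"exercise": 1.0, "doomscroll": 0.2},
    contextual_wants={("late_night", "doomscroll"): 5.0},
)
```

Here `thin.choose(...)` picks `"doomscroll"` while `thick.choose(..., "late_night")` still picks `"exercise"`: the same stated preference is weighed differently once it is classified as temporary rather than enduring.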

## Theoretical Foundation

The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.

The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
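
One operational reading of that claim: if enduring values are expressed over domain-general dimensions rather than over concrete options, they can score options in domains the system has never seen. This is a hedged sketch of that idea only; the dimension names, weights, and the linear scoring rule below are invented for illustration and do not appear in the paper.

```python
# Enduring values as weights over domain-general dimensions
# (illustrative names and numbers; the paper specifies no such representation).
ENDURING_VALUES = {"harm_avoidance": 1.0, "honesty": 0.9, "efficiency": 0.3}

def evaluate(option_features: dict[str, float]) -> float:
    """Score an option from any domain by projecting its features
    onto the enduring value dimensions."""
    return sum(ENDURING_VALUES.get(dim, 0.0) * weight
               for dim, weight in option_features.items())

# A novel domain (say, a triage policy) the system was never tuned on:
policy_a = {"harm_avoidance": 0.8, "efficiency": 0.9}  # safer but slower
policy_b = {"harm_avoidance": 0.2, "efficiency": 1.0}  # faster but riskier
policies = {"a": policy_a, "b": policy_b}
best = max(policies, key=lambda name: evaluate(policies[name]))
```

Because the weights attach to value dimensions rather than to specific options, no per-domain preference data is needed to rank the new options — which is one way the "generalize to novel situations" property could cash out.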

## Limitations and Open Questions

The paper does not provide:

- Formal specification of how thick models are constructed
- Empirical validation that thick models outperform thin models in practice
- Resolution of whose enduring values are privileged when they conflict
- Technical implementation details for deployment

The claim remains speculative until these gaps are addressed through follow-up work or independent validation.

## Relationship to Existing Claims

This formalizes the intuition behind [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]—thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.

It also addresses the failure mode identified in [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]]—both approaches reject the single-reward-function assumption, though through different mechanisms.

---

Relevant Notes:

- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick models as implementation mechanism
- [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]] — related approach to preference diversity
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models as response to specification intractability

Topics:

- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]