- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md - Domain: ai-alignment
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Thick value models that distinguish enduring values from temporary preferences enable AI systems to reason normatively across new domains by embedding choices in social context | speculative | Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025) | 2026-03-11 | |
Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning
The Full-Stack Alignment paper proposes "thick models of value" as an alternative to utility functions and preference orderings. Thick value models are designed to:
- Distinguish enduring values from temporary preferences — What people consistently care about across time and contexts vs. what they want in a specific moment
- Model how individual choices embed within social contexts — Decisions are not isolated preference expressions but socially situated actions that derive meaning from institutional and cultural context
- Enable normative reasoning across new domains — The model can generalize to novel situations by understanding underlying values rather than memorizing preference rankings from training data
This contrasts with thin models (utility functions, preference orderings) that treat all stated preferences as equally valid expressions of value and ignore social context. The distinction maps to the gap between what people say they want (surface preferences) and what actually produces good outcomes for them (deeper values).
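The paper gives no formal schema, but the thin-vs-thick contrast can be made concrete with a minimal data-structure sketch. Everything below (the `Preference` and `ThickValueModel` names, the fields, the example entries) is illustrative assumption, not the paper's design: a thin model is a bare ranking, while a thick model annotates each choice with its social context, the underlying value it expresses, and a judgment of whether that value endures.

```python
from dataclasses import dataclass, field

# Thin model: a bare preference ordering over outcomes.
# All stated preferences are treated as equally valid expressions of value.
thin_model = ["outcome_a", "outcome_b", "outcome_c"]  # ranked best to worst

# Thick model (hypothetical sketch): each observed choice carries the
# social context it occurred in, the value it is taken to express, and
# whether that value appears consistently across time and contexts.
@dataclass
class Preference:
    choice: str
    context: str          # e.g. "workplace", "restaurant"
    expresses_value: str  # underlying value the choice is taken to serve
    enduring: bool        # consistent across time and contexts?

@dataclass
class ThickValueModel:
    preferences: list[Preference] = field(default_factory=list)

    def enduring_values(self) -> set[str]:
        """Values backed by preferences judged to endure."""
        return {p.expresses_value for p in self.preferences if p.enduring}

model = ThickValueModel([
    Preference("declines overtime", "workplace", "family time", True),
    Preference("orders dessert", "restaurant", "momentary craving", False),
])
print(model.enduring_values())  # {'family time'}
```

The point of the sketch is only structural: the thin model can answer "which outcome is preferred?", while the thick model can additionally answer "which of these preferences reflect something the person consistently cares about?", which is what normative generalization to new domains would require.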
Evidence
The paper provides conceptual architecture but no implementation or empirical validation. The claim is theoretical—thick value models are proposed as a design target for alignment systems, not demonstrated as achievable or effective in practice.
The paper does not engage with existing preference learning methods (RLHF, DPO, IRL) or explain how thick models would be learned from behavioral data. It does not provide formal definitions or computational procedures for distinguishing enduring values from temporary preferences.
Challenges and Open Questions
- Empirical fuzziness: The distinction between "enduring values" and "temporary preferences" may be empirically fuzzy in practice. What appears to be a temporary preference might reflect a genuine value in a specific context, or vice versa.
- Learning problem: No mechanism is proposed for how an AI system would learn thick value models from data. Standard preference learning assumes all revealed preferences are valid; thick models require a way to filter or weight preferences by endurance and context-appropriateness.
- Social context specification: The paper does not specify how to formally represent or extract "social context" from data, or how to verify that an AI system has correctly modeled it.
- Comparison to existing work: No engagement with related approaches like value learning, inverse reinforcement learning, or constitutional AI that also attempt to move beyond simple preference orderings.
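Since the paper proposes no learning mechanism, the "filter or weight preferences by endurance" step in the learning problem above can only be illustrated by assumption. The toy heuristic below (entirely my construction, not from the paper) estimates endurance as the fraction of distinct contexts in which an expressed value recurs, so cross-context consistency outweighs one-off preferences:

```python
from collections import defaultdict

def endurance_weights(observations: list[tuple[str, str]]) -> dict[str, float]:
    """Toy heuristic: given (expressed_value, context) observations,
    weight each value by the fraction of distinct observed contexts
    in which it appears. A value seen everywhere gets weight 1.0;
    a one-context preference gets a small weight."""
    contexts: set[str] = set()
    by_value: defaultdict[str, set[str]] = defaultdict(set)
    for value, context in observations:
        contexts.add(context)
        by_value[value].add(context)
    n = len(contexts)
    return {v: len(cs) / n for v, cs in by_value.items()}

obs = [
    ("honesty", "workplace"), ("honesty", "family"), ("honesty", "online"),
    ("novelty", "online"),
]
weights = endurance_weights(obs)
print({v: round(w, 2) for v, w in weights.items()})  # {'honesty': 1.0, 'novelty': 0.33}
```

This is exactly the kind of procedure the paper leaves unspecified; a real mechanism would also need to handle the empirical-fuzziness problem (a context-specific preference may still reflect a genuine value), which simple frequency counting cannot.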
Relevant Notes:
- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance — thick values formalize continuous value integration
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — motivates thick models as alternative to explicit specification
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — thick models attempt to address this by embedding context
Topics: