- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 6)
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Thick value models distinguish stable enduring values from context-dependent preferences, enabling normative reasoning in novel domains | experimental | Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (arxiv.org/abs/2512.03399), December 2025 | 2026-03-11 | |
|
Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning across contexts
Thick models of value provide an alternative to utility functions and preference orderings by:
- Distinguishing enduring values (stable commitments) from temporary preferences (context-dependent wants)
- Modeling how individual choices embed within social contexts rather than treating preferences as atomic
- Enabling normative reasoning—determining what should happen rather than merely what humans say they want—by grounding decisions in stable values
This framework addresses a core limitation of preference-based alignment: preferences are unstable and context-dependent ("I prefer coffee today"), while values represent deeper commitments that persist across situations ("I value autonomy"). A thick model can distinguish these and reason about which should guide AI behavior in novel contexts where humans haven't specified preferences.
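The paper gives no implementation, so the following is only a toy sketch of the value/preference split under stated assumptions: the names (`ThickValueModel`, `score`, `choose`) are hypothetical, and options are scored linearly over named features. In a known context, both enduring values and that context's preferences score an option; in a novel context, stable values alone ground the choice.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the paper provides no implementation,
# training procedure, or API. All names here are illustrative.

@dataclass
class ThickValueModel:
    # Enduring values: stable commitments that apply in every context.
    values: dict[str, float]
    # Temporary preferences: context-dependent wants, keyed by context.
    preferences: dict[str, dict[str, float]] = field(default_factory=dict)

    def score(self, option: dict[str, float], context: str) -> float:
        # Enduring values always contribute to the score.
        base = sum(self.values.get(k, 0.0) * v for k, v in option.items())
        prefs = self.preferences.get(context)
        if prefs is None:
            # Novel context with no stated preferences: stable values
            # alone ground the decision (normative reasoning).
            return base
        return base + sum(prefs.get(k, 0.0) * v for k, v in option.items())

    def choose(self, options: dict[str, dict[str, float]], context: str) -> str:
        # Pick the highest-scoring option for this context.
        return max(options, key=lambda name: self.score(options[name], context))

model = ThickValueModel(
    values={"autonomy": 1.0},                    # "I value autonomy"
    preferences={"morning": {"caffeine": 1.0}},  # "I prefer coffee today"
)
options = {
    "coffee": {"caffeine": 1.0, "autonomy": 0.2},
    "walk": {"autonomy": 1.0},
}
print(model.choose(options, "morning"))  # coffee: the preference tips the choice
print(model.choose(options, "unseen"))   # walk: only enduring values apply
```

The point of the sketch is the fallback branch: when a deployment context was never specified, the model still has a basis for decision in the stable values, rather than an undefined preference lookup.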
Evidence:
The paper introduces thick models conceptually but provides no implementation details, training procedures, or empirical validation. The value/preference distinction is philosophically motivated but not operationalized: there is no comparison with existing alignment approaches (RLHF, constitutional AI, DPO), and no experiments demonstrate that thick models actually improve alignment outcomes.
This is a conceptual contribution that formalizes an intuition about value stability but remains technically unvalidated.
Relevant Notes:
- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance — thick models formalize continuous value integration by distinguishing stable values from momentary preferences
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — thick models propose addressing this by modeling value stability explicitly
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — thick models acknowledge this complexity by refusing to specify values completely in advance
Topics: