teleo-codex/domains/ai-alignment/thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning-across-contexts.md
- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 6)


type: claim
domain: ai-alignment
description: Thick value models distinguish stable enduring values from context-dependent preferences, enabling normative reasoning in novel domains
confidence: experimental
source: Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (arxiv.org/abs/2512.03399), December 2025
created: 2026-03-11
secondary_domains: mechanisms

Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning across contexts

Thick models of value provide an alternative to utility functions and preference orderings by:

  • Distinguishing enduring values (stable commitments) from temporary preferences (context-dependent wants)
  • Modeling how individual choices embed within social contexts rather than treating preferences as atomic
  • Enabling normative reasoning (determining what should happen, rather than merely what humans say they want) by grounding decisions in stable values (see the sketch after this list)
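
The paper defines thick models only conceptually and gives no schema, so the following is a minimal, hypothetical Python sketch of the distinction the list describes. All names here (EnduringValue, TemporaryPreference, ThickValueModel, guidance_for) are invented for illustration, not drawn from the paper. The key design point: preferences carry the context in which they were expressed, while values carry none, so only values generalize to unseen contexts.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the paper provides no implementation or schema.
# Every class and field name below is invented for illustration.

@dataclass(frozen=True)
class EnduringValue:
    """A stable commitment that persists across situations ('I value autonomy')."""
    name: str
    rationale: str  # why the holder is committed to this value

@dataclass(frozen=True)
class TemporaryPreference:
    """A context-dependent want ('I prefer coffee today')."""
    name: str
    context: str  # the situation in which the preference was expressed

@dataclass
class ThickValueModel:
    """Keeps values and preferences in separate registers, rather than
    flattening both into a single preference ordering."""
    values: list[EnduringValue] = field(default_factory=list)
    preferences: list[TemporaryPreference] = field(default_factory=list)

    def guidance_for(self, context: str) -> list:
        """Preferences bind only in the context where they were expressed;
        enduring values apply everywhere, including novel contexts."""
        stated = [p for p in self.preferences if p.context == context]
        return stated if stated else list(self.values)
```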

This framework addresses a core limitation of preference-based alignment: preferences are unstable and context-dependent ("I prefer coffee today"), while values represent deeper commitments that persist across situations ("I value autonomy"). A thick model can distinguish these and reason about which should guide AI behavior in novel contexts where humans haven't specified preferences.
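Continuing the hypothetical sketch above, a short usage example shows the behavioral difference this paragraph describes: in a context with a stated preference the preference governs, while in a novel context the model falls back on the enduring value instead of extrapolating a preference it was never given.

```python
# Continuing the hypothetical sketch above.
model = ThickValueModel(
    values=[EnduringValue(name="autonomy", rationale="persists across situations")],
    preferences=[TemporaryPreference(name="coffee", context="monday_morning")],
)

# Where a preference was actually stated, it guides behavior:
print(model.guidance_for("monday_morning"))
# -> [TemporaryPreference(name='coffee', context='monday_morning')]

# In a novel context with no stated preference, the model grounds its
# guidance in the enduring value:
print(model.guidance_for("unfamiliar_domain"))
# -> [EnduringValue(name='autonomy', rationale='persists across situations')]
```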

Evidence

The paper introduces thick models conceptually but provides no implementation details, training procedures, or empirical validation. The distinction between values and preferences is philosophically motivated but lacks operationalization: no comparison with existing alignment approaches (RLHF, constitutional AI, DPO) is provided, and no experiments demonstrate that thick models actually improve alignment outcomes.

This is a conceptual contribution from a recent paper: it formalizes an intuition about value stability but remains technically unvalidated.


Relevant Notes:

Topics: