---
type: claim
domain: ai-alignment
description: Thick value models that distinguish enduring values from temporary preferences enable AI systems to reason normatively across new domains by embedding choices in social context
confidence: speculative
source: Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)
created: 2026-03-11
secondary_domains: mechanisms
---

# Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning

The Full-Stack Alignment paper proposes "thick models of value" as an alternative to utility functions and preference orderings. Thick value models are designed to:

  1. Distinguish enduring values from temporary preferences — What people consistently care about across time and contexts vs. what they want in a specific moment
  2. Model how individual choices embed within social contexts — Decisions are not isolated preference expressions but socially situated actions that derive meaning from institutional and cultural context
  3. Enable normative reasoning across new domains — The model can generalize to novel situations by understanding underlying values rather than memorizing preference rankings from training data

This contrasts with thin models (utility functions, preference orderings) that treat all stated preferences as equally valid expressions of value and ignore social context. The distinction maps to the gap between what people say they want (surface preferences) and what actually produces good outcomes for them (deeper values).
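
To make the contrast concrete, the following minimal Python sketch encodes both representations. The paper provides no implementation or formal definitions, so every name here (`ThinModel`, `ThickValueModel`, `Observation`, the `is_enduring` heuristic) is hypothetical: one way the distinction could be represented, not the paper's design.

```python
# Hypothetical sketch only: the Full-Stack Alignment paper defines no data
# structures, so these classes illustrate the thin/thick distinction rather
# than reproduce anything from the source.
from dataclasses import dataclass, field


@dataclass
class ThinModel:
    """Thin model: a single, context-free preference ranking.

    Every stated preference counts equally; nothing records endurance
    or social situation.
    """
    ranking: list[str]  # options ordered from most to least preferred

    def prefers(self, a: str, b: str) -> bool:
        return self.ranking.index(a) < self.ranking.index(b)


@dataclass
class Observation:
    """One observed choice, tagged with the social context it occurred in."""
    choice: str
    context: str  # e.g. "workplace", "family dinner"
    timestamp: float


@dataclass
class ThickValueModel:
    """Thick model: separates enduring commitments from momentary choices.

    Raw observations are kept with their social context so later
    reasoning can ask where and how often a value shows up.
    """
    observations: list[Observation] = field(default_factory=list)

    def contexts_supporting(self, value: str) -> set[str]:
        return {o.context for o in self.observations if o.choice == value}

    def is_enduring(self, value: str, min_contexts: int = 3) -> bool:
        # Crude heuristic (our assumption, not the paper's): a value counts
        # as enduring if it recurs across several distinct social contexts.
        return len(self.contexts_supporting(value)) >= min_contexts
```

The `is_enduring` heuristic is deliberately naive; the point is only that a thick representation retains the context information a bare ranking discards.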

## Evidence

The paper provides conceptual architecture but no implementation or empirical validation. The claim is theoretical—thick value models are proposed as a design target for alignment systems, not demonstrated as achievable or effective in practice.

The paper does not engage with existing preference learning methods (RLHF, DPO, IRL) or explain how thick models would be learned from behavioral data. It does not provide formal definitions or computational procedures for distinguishing enduring values from temporary preferences.

## Challenges and Open Questions

  1. Empirical fuzziness: The distinction between "enduring values" and "temporary preferences" may be empirically fuzzy in practice. What appears to be a temporary preference might reflect a genuine value in a specific context, or vice versa.

  2. Learning problem: No mechanism is proposed for how an AI system would learn thick value models from data. Standard preference learning assumes all revealed preferences are valid; thick models require a way to filter or weight preferences by endurance and context-appropriateness (a toy version of such a weighting is sketched after this list).

  3. Social context specification: The paper does not specify how to formally represent or extract "social context" from data or how to verify that an AI system has correctly modeled it.

  4. Comparison to existing work: No engagement with related approaches like value learning, inverse reinforcement learning, or constitutional AI that also attempt to move beyond simple preference orderings.
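
Since the paper proposes no learning mechanism, the following toy scheme is purely our illustration of what challenge 2 asks for: weighting revealed preferences by how broadly they recur across contexts. The function name and the weighting rule are assumptions, not anything from the source.

```python
# Illustrative only: this toy weighting is one guess at what "filter or
# weight preferences by endurance" could mean; the paper specifies nothing.
from collections import defaultdict


def endurance_weights(observations: list[tuple[str, str]]) -> dict[str, float]:
    """Weight each chosen option by the share of distinct contexts it
    appears in.

    `observations` is a list of (choice, context) pairs. A choice seen in
    one context gets a low weight; a choice recurring across many contexts
    approaches 1.0.
    """
    contexts: dict[str, set[str]] = defaultdict(set)
    for choice, context in observations:
        contexts[choice].add(context)
    total_contexts = len({c for _, c in observations}) or 1
    return {choice: len(ctxs) / total_contexts for choice, ctxs in contexts.items()}


if __name__ == "__main__":
    obs = [
        ("honesty", "workplace"), ("honesty", "family"), ("honesty", "online"),
        ("junk_food", "late_night"),  # single-context, momentary preference
    ]
    print(endurance_weights(obs))  # {'honesty': 0.75, 'junk_food': 0.25}
```

Even this toy exposes the other open questions: the context labels are hand-supplied (challenge 3), and a single-context choice may still reflect a genuine value (challenge 1).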


## Relevant Notes

## Topics