teleo-codex/domains/ai-alignment/thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning.md
Teleo Agents 13a6fe956f theseus: extract from 2025-12-00-fullstack-alignment-thick-models-value.md
- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 6)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-12 11:24:54 +00:00


- type: claim
- domain: ai-alignment
- description: Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains
- confidence: speculative
- source: Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025)
- created: 2026-03-11
- secondary_domains: mechanisms

Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning

The Full-Stack Alignment paper proposes thick models of value as an alternative to utility functions and preference orderings. These models:

  1. Distinguish enduring values from temporary preferences — separating what people consistently care about from context-dependent wants
  2. Model individual choices within social contexts — recognizing that preferences are embedded in relationships and institutions
  3. Enable normative reasoning across new domains — allowing AI systems to generalize value judgments to novel situations

This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
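The contrast can be made concrete with a small sketch. The paper gives no formal specification (see Limitations below), so every class, field, and function name here is a hypothetical illustration of the thin/thick distinction, not the paper's construction: a thin model scores options by a single flat utility, while a thick model lets enduring values dominate and uses temporary preferences only as a tie-breaker.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the paper provides no formal specification,
# so all names and structures below are illustrative assumptions.

@dataclass
class ThinModel:
    """Thin model: one utility over options; all stated preferences
    are treated as equally valid and context-independent."""
    utility: dict  # option -> scalar utility

    def choose(self, options):
        return max(options, key=lambda o: self.utility.get(o, 0.0))

@dataclass
class ThickValueModel:
    """Thick model: separates what a person consistently cares about
    (enduring values) from context-dependent wants, and records the
    social context those wants are embedded in."""
    enduring_values: dict        # value name -> weight, stable over time
    temporary_preferences: dict  # option -> scalar want, context-dependent
    social_context: list = field(default_factory=list)  # relationships, institutions

    def choose(self, options, value_of):
        # value_of(option) -> {value name: degree served}.
        # Enduring values dominate; temporary preferences break ties.
        def score(option):
            served = value_of(option)
            value_score = sum(self.enduring_values.get(v, 0.0) * degree
                              for v, degree in served.items())
            return (value_score, self.temporary_preferences.get(option, 0.0))
        return max(options, key=score)
```

Under this sketch, a thin model follows the stated want ("scroll" over "sleep" if the scroll utility is higher), while a thick model picks the option that serves an enduring value such as health, even when the momentary preference points the other way.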

Theoretical Foundation

The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.

The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
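One way to see what "reasoning about values rather than optimizing stated preferences" could buy is generalization: a value-level representation can rank an option the agent has never recorded a preference for, because the option can be described by which values it serves. This is a minimal hypothetical sketch under assumed value weights, not anything specified in the paper:

```python
# Hypothetical illustration: assumed enduring-value weights for one person.
enduring_values = {"health": 1.0, "connection": 0.8}

def normative_score(option_value_profile):
    """Score an option by the enduring values it serves, not by any
    stored preference for that specific option. A novel option needs
    only a value profile, no preference history."""
    return sum(enduring_values.get(v, 0.0) * degree
               for v, degree in option_value_profile.items())

# A novel option from a new domain, described only by its value profile,
# versus a familiar option that serves values weakly:
novel_option = {"health": 0.6, "connection": 0.9}
familiar_option = {"health": 0.2}
```

A thin model has no basis for scoring `novel_option` at all (no revealed preference exists for it), whereas the value-level score ranks it immediately; that gap is the intuition behind "normative reasoning across new domains."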

Limitations and Open Questions

The paper does not provide:

  • Formal specification of how thick models are constructed
  • Empirical validation that thick models outperform thin models in practice
  • Resolution of whose enduring values are privileged when they conflict
  • Technical implementation details for deployment

The claim remains speculative until these gaps are addressed through follow-up work or independent validation.

Relationship to Existing Claims

This formalizes the intuition behind the claim "the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance": thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.

It also addresses the failure mode identified in "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling": both approaches reject the single-reward-function assumption, though through different mechanisms.


Relevant Notes:

Topics: