- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)


- Type: claim
- Domain: ai-alignment
- Secondary domains: mechanisms
- Description: Thick value models distinguish stable, enduring values from context-dependent temporary preferences and model social embedding to enable normative reasoning
- Confidence: speculative
- Source: Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (December 2025)
- Created: 2026-03-11
- Enrichments:
  - the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance
  - specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception

Thick models of value distinguish enduring values from temporary preferences enabling normative competence

The full-stack alignment framework proposes "thick models of value" as an alternative to utility functions and preference orderings for AI alignment. It distinguishes three dimensions:

  1. Enduring vs. temporary: Stable values (what people consistently care about across contexts) vs. temporary preferences (what people want in specific moments)
  2. Social embedding: Individual choices modeled within social contexts rather than as atomized preferences
  3. Normative reasoning: AI systems that reason about values across new domains rather than simply optimizing pre-specified objectives

The goal is to develop "normatively competent agents" that engage with human values in their full complexity rather than reducing them to scalar reward signals.
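As a minimal sketch of the three dimensions (my illustration, not from the paper; all names are hypothetical), a thick model could be carried as a structured object rather than a scalar reward:

```python
from dataclasses import dataclass, field

@dataclass
class ThickValueModel:
    """Hypothetical container for a 'thick' model of value.

    Contrasts with a scalar reward signal: values here are structured,
    context-sensitive, and socially embedded.
    """
    enduring_values: dict[str, float]        # stable cross-context commitments
    temporary_preferences: dict[str, float]  # context-bound, momentary wants
    social_ties: list[str] = field(default_factory=list)  # whose choices interact with this agent's

    def weight(self, value_name: str) -> float:
        """Enduring values dominate; momentary preferences are discounted."""
        return self.enduring_values.get(
            value_name,
            0.5 * self.temporary_preferences.get(value_name, 0.0),
        )

m = ThickValueModel(
    enduring_values={"honesty": 0.9},
    temporary_preferences={"honesty": 0.2, "novelty": 0.7},
    social_ties=["family", "colleagues"],
)
```

The discount factor and the dominance rule are arbitrary placeholders; the paper specifies neither, which is exactly the operationalization gap noted below.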

This concept formalizes the distinction between what people say they want (stated preferences) and what actually produces good outcomes (enduring values). It proposes continuous value integration rather than advance specification of objectives.
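One naive way to operationalize the stated/enduring distinction (again my illustration, not the paper's method) is to count a want as enduring only if it recurs across most observed contexts:

```python
from collections import Counter

def enduring_values(observed_wants: list[set[str]], threshold: float = 0.8) -> set[str]:
    """Return wants appearing in at least `threshold` of the observed contexts.

    observed_wants: one set of expressed wants per context/moment.
    """
    counts = Counter(w for wants in observed_wants for w in wants)
    n = len(observed_wants)
    return {w for w, c in counts.items() if c / n >= threshold}

contexts = [
    {"health", "snacks"},
    {"health", "novelty"},
    {"health"},
    {"health", "snacks"},
    {"health"},
]
print(enduring_values(contexts))  # only 'health' clears the 0.8 bar
```

Even this toy version exposes a free parameter (the threshold) that the framework would need to justify, and it says nothing about values that are enduring but rarely voiced.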

Evidence

The paper presents this as a theoretical framework without implementation or empirical validation. No working system exists, and the computational requirements for modeling social context and distinguishing enduring from temporary values at scale are unspecified.

The framework does not engage with existing work on the limitations of preference-based methods (RLHF/DPO) in handling preference diversity, nor does it explain how thick models would handle irreducible value disagreements between individuals or groups.

Challenges

Stability assumption: How do you operationalize "enduring values" when human values themselves evolve over time? The framework assumes values are more stable than preferences, but this may not hold across developmental stages, cultural shifts, or technological change.

Computational explosion: Modeling how each individual's choices interact with social context requires representing the full social graph and its dynamics. This creates a scalability problem that the paper does not address.
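To make the scaling concern concrete (a back-of-envelope estimate, not a figure from the paper): even restricting social embedding to pairwise influence, the number of relationships to represent grows quadratically with the number of individuals modeled.

```python
def pairwise_interactions(n: int) -> int:
    # n choose 2: distinct pairs whose mutual influence must be represented
    return n * (n - 1) // 2

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} people -> {pairwise_interactions(n):,} pairwise relationships")
```

A million-person model already implies roughly half a trillion pairwise relationships, before accounting for higher-order group dynamics or change over time.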

Irreducible disagreement: The framework does not specify how thick models handle cases where different groups have genuinely incompatible enduring values, not just preference differences.

Operationalization gap: The paper does not provide concrete methods for extracting or representing thick models from human behavior or reasoning.


Relevant Notes:

Topics: