Teleo Agents 2048d99547 theseus: extract from 2025-12-00-fullstack-alignment-thick-models-value.md

- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)

Pentagon-Agent: Theseus <HEADLESS>

2026-03-12 09:18:58 +00:00


type: claim
domain: ai-alignment
description: Thick value models that distinguish enduring values from temporary preferences enable AI systems to reason normatively across new domains by embedding choices in social context
confidence: speculative
source: Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (December 2025), arxiv.org/abs/2512.03399
created: 2026-03-11
secondary_domains: mechanisms

Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning

The Full-Stack Alignment paper proposes "thick models of value" as a conceptual alternative to utility functions and preference orderings. These models are characterized by three properties:

- Distinguish enduring values from temporary preferences: separating what people durably care about from momentary wants or revealed preferences
- Embed individual choices within social contexts: recognizing that preferences are shaped by and dependent on social structures rather than being context-independent
- Enable normative reasoning across new domains: allowing AI systems to generalize value judgments to novel situations beyond training data

Contrast with Thin Models

Thin models (utility functions, preference orderings) treat all stated preferences as equally valid and assume context-independence. Thick models acknowledge that what people say they want (preferences) often diverges from what produces good outcomes (values), and that this divergence is systematic rather than random.
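The contrast can be made concrete with a small Python sketch. This is an illustration only, not the paper's formalization (the paper gives none): the names `ThinPreference`, `ThickValue`, `thin_choice`, and `enduring_values`, the fields, and the example data are all hypothetical. The thin model ranks options by a context-free score, while the thick record carries whether a concern is enduring and which social context it belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinPreference:
    """Thin model (hypothetical): a context-free score attached to an option."""
    option: str
    utility: float


@dataclass(frozen=True)
class ThickValue:
    """Thick record (hypothetical): a concern, its durability, and its social context."""
    name: str
    enduring: bool        # True for a durable value, False for a momentary want
    social_context: str   # illustrative label for the setting the choice sits in


def thin_choice(prefs: list[ThinPreference]) -> str:
    """Pick the highest-scoring option; every stated preference counts the same way."""
    return max(prefs, key=lambda p: p.utility).option


def enduring_values(values: list[ThickValue], context: str) -> list[str]:
    """Keep only the enduring values recorded for the given social context."""
    return [v.name for v in values if v.enduring and v.social_context == context]


# Made-up example data, for illustration only.
prefs = [
    ThinPreference("keep scrolling", 0.9),
    ThinPreference("call a friend", 0.6),
]
values = [
    ThickValue("staying close to friends", enduring=True, social_context="friendship"),
    ThickValue("one more video", enduring=False, social_context="friendship"),
]

print(thin_choice(prefs))                     # keep scrolling
print(enduring_values(values, "friendship"))  # ['staying close to friends']
```

The sketch covers only the first two properties (durability and social context). It does not attempt normative reasoning across new domains, which the source describes only at the conceptual level.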

Limitations and Gaps

The paper does not provide formal definitions of thick value models, implementation details for operationalizing them in AI systems, or empirical validation. It remains a conceptual proposal for how alignment systems should represent human values. It also does not engage with the existing preference-learning literature (RLHF, DPO) or with formal methods for value specification.

Relationship to Continuous Value Integration

This concept articulates the intuition that values should be continuously integrated into systems rather than specified once at training time. Rather than encoding values as fixed parameters, thick models would enable ongoing normative reasoning as deployment contexts evolve and new situations emerge.

Related claims:

- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance: thick models are proposed as a way to achieve continuous value integration
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception: thick models acknowledge this complexity by modeling context-dependence
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions: thick models aim to address this by enabling context-dependent reasoning
