- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 6)
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Thick value models distinguish stable enduring values from context-dependent preferences, enabling normative reasoning in novel domains | experimental | Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (arxiv.org/abs/2512.03399), December 2025 | 2026-03-11 | |
|
Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning across contexts
Thick models of value provide an alternative to utility functions and preference orderings by:
- Distinguishing enduring values (stable commitments) from temporary preferences (context-dependent wants)
- Modeling how individual choices embed within social contexts rather than treating preferences as atomic
- Enabling normative reasoning—determining what should happen rather than merely what humans say they want—by grounding decisions in stable values
This framework addresses a core limitation of preference-based alignment: preferences are unstable and context-dependent ("I prefer coffee today"), while values represent deeper commitments that persist across situations ("I value autonomy"). A thick model can distinguish these and reason about which should guide AI behavior in novel contexts where humans haven't specified preferences.
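The paper gives no implementation, so the following is only a toy sketch of the value/preference split under stated assumptions: the names (`ThickValueModel`, `score`, `choose`) are hypothetical, and options are scored linearly over named features. In a known context, both enduring values and that context's preferences score an option; in a novel context, stable values alone ground the choice.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the paper provides no implementation,
# training procedure, or API. All names here are illustrative.

@dataclass
class ThickValueModel:
    # Enduring values: stable commitments that apply in every context.
    values: dict[str, float]
    # Temporary preferences: context-dependent wants, keyed by context.
    preferences: dict[str, dict[str, float]] = field(default_factory=dict)

    def score(self, option: dict[str, float], context: str) -> float:
        # Enduring values always contribute to the score.
        base = sum(self.values.get(k, 0.0) * v for k, v in option.items())
        prefs = self.preferences.get(context)
        if prefs is None:
            # Novel context with no stated preferences: stable values
            # alone ground the decision (normative reasoning).
            return base
        return base + sum(prefs.get(k, 0.0) * v for k, v in option.items())

    def choose(self, options: dict[str, dict[str, float]], context: str) -> str:
        # Pick the highest-scoring option for this context.
        return max(options, key=lambda name: self.score(options[name], context))

model = ThickValueModel(
    values={"autonomy": 1.0},                    # "I value autonomy"
    preferences={"morning": {"caffeine": 1.0}},  # "I prefer coffee today"
)
options = {
    "coffee": {"caffeine": 1.0, "autonomy": 0.2},
    "walk": {"autonomy": 1.0},
}
print(model.choose(options, "morning"))  # coffee: the preference tips the choice
print(model.choose(options, "unseen"))   # walk: only enduring values apply
```

The point of the sketch is the fallback branch: when a deployment context was never specified, the model still has a basis for decision in the stable values, rather than an undefined preference lookup.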
Evidence:
The paper introduces thick models conceptually but provides no implementation details, training procedures, or empirical validation. The value/preference distinction is philosophically motivated but not operationalized: there is no comparison with existing alignment approaches (RLHF, constitutional AI, DPO), and no experiments demonstrate that thick models actually improve alignment outcomes.
This is a conceptual contribution that formalizes an intuition about value stability but remains technically unvalidated.
Relevant Notes:
- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance — thick models formalize continuous value integration by distinguishing stable values from momentary preferences
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — thick models propose addressing this by modeling value stability explicitly
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — thick models acknowledge this complexity by refusing to specify values completely in advance
Topics: