- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | secondary_domains |
|---|---|---|---|---|---|---|
| claim | ai-alignment | Thick value models that distinguish enduring values from temporary preferences and embed individual choices in social contexts enable AI systems to reason normatively across new domains | speculative | Full-Stack Alignment paper (arxiv.org/abs/2512.03399, December 2025) | 2026-03-11 | |
Thick models of value distinguish enduring values from temporary preferences enabling normative reasoning
The Full-Stack Alignment paper proposes thick models of value as an alternative to utility functions and preference orderings. These models are claimed to:
- Distinguish enduring values from temporary preferences — separating what people consistently care about from context-dependent wants
- Model individual choices within social contexts — recognizing that preferences are embedded in relationships and institutions
- Enable normative reasoning across new domains — allowing AI systems to generalize value judgments to novel situations
This contrasts with thin models (utility maximization, revealed preferences) that treat all stated preferences as equally valid and context-independent.
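The paper gives no formal construction (see Limitations below), but the structural contrast can be sketched in code. Everything here is hypothetical illustration: the class names, fields, and the `applicable` rule are assumptions chosen to make the thick/thin distinction concrete, not the paper's method.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EnduringValue:
    """Something a person consistently cares about across contexts."""
    name: str  # e.g. "honesty", "autonomy"

@dataclass(frozen=True)
class TemporaryPreference:
    """A context-dependent want, valid only where it was formed."""
    want: str
    context: str

@dataclass
class ThickValueModel:
    """Hypothetical sketch: values and preferences are kept as distinct
    types, and choices are situated in a social context, instead of being
    flattened into one context-free preference ordering (a thin model)."""
    values: list[EnduringValue]
    preferences: list[TemporaryPreference]
    social_context: dict[str, str] = field(default_factory=dict)  # e.g. role -> institution

    def applicable(self, context: str) -> list[str]:
        # Enduring values carry over to any context, including novel ones;
        # temporary preferences apply only in their originating context.
        carried = [v.name for v in self.values]
        local = [p.want for p in self.preferences if p.context == context]
        return carried + local

model = ThickValueModel(
    values=[EnduringValue("honesty")],
    preferences=[TemporaryPreference("coffee over tea", context="morning")],
)
print(model.applicable("novel_domain"))  # only the enduring value generalizes
print(model.applicable("morning"))      # both apply in the original context
```

A thin model would collapse both lists into a single ranking, losing the information needed to decide which commitments should transfer to a new domain.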
Theoretical Foundation
The distinction between "what people say they want" (preferences) and "what actually produces good outcomes" (values) maps to the difference between satisfying immediate desires and serving long-term flourishing. Thick models attempt to capture this distinction formally.
The paper argues this enables "normatively competent agents" that can reason about values rather than merely optimize for stated preferences.
Limitations and Open Questions
The paper does not provide:
- Formal specification of how thick models are constructed
- Empirical validation that thick models outperform thin models in practice
- Resolution of whose enduring values are privileged when they conflict
- Technical implementation details for deployment
The claim remains speculative until these gaps are addressed through follow-up work or independent validation.
Relationship to Existing Claims
This formalizes the intuition behind "the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance": thick models provide a mechanism for continuous value integration by modeling values as context-dependent and evolving rather than fixed.
It also connects to "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling": both approaches reject the single-reward-function assumption, though through different mechanisms.
Relevant Notes:
- "the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance" — thick models as implementation mechanism
- "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling" — related approach to preference diversity
- "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception" — thick models as response to specification intractability
Topics: