theseus: extract from 2025-12-00-fullstack-alignment-thick-models-value.md

- Source: inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 6)

Pentagon-Agent: Theseus <HEADLESS>
Teleo Agents 2026-03-12 05:42:51 +00:00
parent ba4ac4a73e
commit aa8a9b4ca8
4 changed files with 87 additions and 1 deletion


@@ -21,6 +21,12 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v
Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system.
### Additional Evidence (extend)
*Source: [[2025-12-00-fullstack-alignment-thick-models-value]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
Full-stack alignment extends the coordination thesis from inter-lab coordination to institutional co-alignment. The framework argues that alignment requires concurrent co-alignment of AI systems AND the institutions that shape them—not just coordination between AI developers. Five implementation mechanisms are proposed: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions. This suggests the coordination problem operates at multiple levels: between AI labs (existing thesis), between AI systems and institutions (new), and between institutional stakeholders (implicit).
---
Relevant Notes:


@@ -0,0 +1,39 @@
---
type: claim
domain: ai-alignment
description: "Full-stack alignment requires concurrent co-alignment of AI systems and institutions, not model alignment alone"
confidence: experimental
source: "Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (arxiv.org/abs/2512.03399), December 2025"
created: 2026-03-11
secondary_domains: [mechanisms, grand-strategy]
---
# Beneficial AI outcomes require concurrent alignment of systems and institutions, not model alignment alone
The full-stack alignment framework argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" in isolation: alignment must address both the AI systems AND the institutions that shape their development and deployment. The scope thus extends beyond a single organization's objectives to misalignment across multiple stakeholders.
The paper proposes five implementation mechanisms for institutional co-alignment:
1. AI value stewardship
2. Normatively competent agents
3. Win-win negotiation systems
4. Meaning-preserving economic mechanisms
5. Democratic regulatory institutions
The core argument: even perfectly aligned individual AI systems can produce harmful outcomes through misaligned deployment contexts, competitive dynamics between organizations, or governance failures at the institutional level. Alignment is therefore a system-level coordination problem where institutional structures must co-evolve with AI capabilities.
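The competitive-dynamics point can be made concrete with a toy payoff sketch. This is not from the paper; the labs, actions, and payoff numbers are invented for illustration. Two systems, each perfectly aligned to its own lab's objective, still converge on the collectively worse outcome when the deployment context rewards racing:

```python
# Illustrative toy model, not from the paper: two labs whose AIs are each
# perfectly aligned to their own lab's objective still land on a collectively
# worse outcome when the deployment context rewards racing.

# Payoffs (lab_a, lab_b) for each pair of deployment policies.
# "cautious" = hold deployment for more safety work; "race" = ship now.
# The numbers are arbitrary; only their ordering matters.
PAYOFFS = {
    ("cautious", "cautious"): (3, 3),  # best joint outcome
    ("cautious", "race"):     (0, 4),  # the cautious lab loses the market
    ("race",     "cautious"): (4, 0),
    ("race",     "race"):     (1, 1),  # racing erodes safety for both
}

OPTIONS = ["cautious", "race"]

def lab_a_payoff(a, b):
    return PAYOFFS[(a, b)][0]

def lab_b_payoff(b, a):
    return PAYOFFS[(a, b)][1]

def best_response(opponent_choice, payoff):
    """Each lab's aligned AI simply maximizes its own lab's payoff."""
    return max(OPTIONS, key=lambda own: payoff(own, opponent_choice))

# "race" is each lab's best response whatever the other does, so both race
# and end at (1, 1) even though (3, 3) was jointly available.
for b in OPTIONS:
    print(f"Lab A best response to B={b}: {best_response(b, lab_a_payoff)}")
for a in OPTIONS:
    print(f"Lab B best response to A={a}: {best_response(a, lab_b_payoff)}")
```

In this sketch no individual system is misaligned; the harmful outcome comes entirely from the institutional payoff structure, which is the gap the five mechanisms above are meant to address.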
## Evidence
The paper provides architectural reasoning grounded in the observation that institutional incentives often conflict with individual system alignment. However, the framework lacks empirical validation—no deployment data, no formal verification, and no engagement with existing technical alignment approaches (RLHF, constitutional AI, bridging-based mechanisms). The five mechanisms are proposed as necessary but remain underspecified technically.
This is a conceptually ambitious framework from a recent paper (December 2025) that extends rather than replaces existing alignment work.
---
Relevant Notes:
- [[AI alignment is a coordination problem not a technical problem]] — full-stack alignment extends this thesis from inter-lab coordination to AI-institution co-alignment
- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — directly addresses the institutional governance gap
- [[safe AI development requires building alignment mechanisms before scaling capability]] — institutional alignment is proposed as one such mechanism
- [[superorganism organization extends effective lifespan substantially at each organizational level which means civilizational intelligence operates on temporal horizons that individual-preference alignment cannot serve]] — related argument about system-level vs individual alignment
Topics:
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,35 @@
---
type: claim
domain: ai-alignment
description: "Thick value models distinguish stable enduring values from context-dependent preferences, enabling normative reasoning in novel domains"
confidence: experimental
source: "Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (arxiv.org/abs/2512.03399), December 2025"
created: 2026-03-11
secondary_domains: [mechanisms]
---
# Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning across contexts
Thick models of value provide an alternative to utility functions and preference orderings by:
- Distinguishing enduring values (stable commitments) from temporary preferences (context-dependent wants)
- Modeling how individual choices embed within social contexts rather than treating preferences as atomic
- Enabling normative reasoning—determining what *should* happen rather than merely what humans *say they want*—by grounding decisions in stable values
This framework addresses a core limitation of preference-based alignment: preferences are unstable and context-dependent ("I prefer coffee today"), while values represent deeper commitments that persist across situations ("I value autonomy"). A thick model can distinguish these and reason about which should guide AI behavior in novel contexts where humans haven't specified preferences.
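As a rough illustration of how such a model might be structured: the paper provides no implementation, so the class names, fields, and fallback rule below are assumptions made for this sketch only.

```python
# Illustrative sketch only: the paper gives no implementation, so the data
# structures and the resolution rule here are assumptions for the example.

from dataclasses import dataclass, field

@dataclass
class EnduringValue:
    name: str        # e.g. "autonomy"
    rationale: str   # why the person holds it; stable across contexts

@dataclass
class ContextualPreference:
    want: str        # e.g. "coffee"
    context: str     # the situation in which the want was expressed

@dataclass
class ThickValueModel:
    values: list[EnduringValue] = field(default_factory=list)
    preferences: list[ContextualPreference] = field(default_factory=list)

    def guidance_for(self, context: str) -> list[str]:
        """In a novel context, fall back on enduring values rather than
        extrapolating preferences stated in unrelated situations."""
        matching = [p.want for p in self.preferences if p.context == context]
        if matching:
            return matching
        return [v.name for v in self.values]

model = ThickValueModel(
    values=[EnduringValue("autonomy", "wants to make own informed choices")],
    preferences=[ContextualPreference("coffee", "weekday mornings")],
)
print(model.guidance_for("weekday mornings"))   # ['coffee']
print(model.guidance_for("medical decision"))   # ['autonomy'] -- novel context
```

In this sketch, context-matched preferences govern where they exist and enduring values carry the decision in novel contexts, which is the behavior the claim above describes.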
## Evidence
The paper introduces thick models conceptually but provides no implementation details, training procedures, or empirical validation. The distinction between values and preferences is philosophically motivated but lacks operationalization—no comparison with existing alignment approaches (RLHF, constitutional AI, DPO) is provided, and no experiments demonstrate that thick models actually improve alignment outcomes.
This is a conceptual contribution from a recent paper that formalizes an intuition about value stability but remains unvalidated technically.
---
Relevant Notes:
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick models formalize continuous value integration by distinguishing stable values from momentary preferences
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — thick models propose addressing this by modeling value stability explicitly
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models acknowledge this complexity by refusing to specify values completely in advance
Topics:
- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]


@@ -7,9 +7,15 @@ date: 2025-12-01
domain: ai-alignment
secondary_domains: [mechanisms, grand-strategy]
format: paper
status: unprocessed
status: processed
priority: medium
tags: [full-stack-alignment, institutional-alignment, thick-values, normative-competence, co-alignment]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["beneficial-ai-outcomes-require-concurrent-alignment-of-systems-and-institutions-not-model-alignment-alone.md", "thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning-across-contexts.md"]
enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Extracted two experimental claims from architecturally ambitious but technically underspecified paper. Full-stack alignment extends coordination thesis to institutions. Thick models of value formalize continuous value integration. No empirical validation or implementation details provided. Paper lacks engagement with existing alignment approaches (RLHF, constitutional AI, etc.). Both claims rated experimental due to single-source conceptual framework without deployment data."
---
## Content