diff --git a/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md b/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md
index 093867de..09b312f9 100644
--- a/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md
+++ b/domains/ai-alignment/AI alignment is a coordination problem not a technical problem.md
@@ -21,6 +21,12 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v
 
 Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system.
+
+### Additional Evidence (extend)
+*Source: [[2025-12-00-fullstack-alignment-thick-models-value]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+The Full-Stack Alignment paper extends the coordination thesis to institutions themselves, not just coordination between AI labs. It argues that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" and proposes concurrent alignment of both AI systems and the institutions that govern them. This is a stronger version of the coordination-first approach: it claims institutions need structural alignment with human values, not just better coordination protocols between existing actors. The five implementation mechanisms (AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, democratic regulatory institutions) are institutional structures, not coordination protocols.
+
 ---
 Relevant Notes:
diff --git a/domains/ai-alignment/AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation.md b/domains/ai-alignment/AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation.md
index 5d485d19..33422dec 100644
--- a/domains/ai-alignment/AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation.md
+++ b/domains/ai-alignment/AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation.md
@@ -13,6 +13,12 @@ AI development is creating precisely this kind of critical juncture. The mismatc
 
 Critical junctures are windows, not guarantees. They can close. Acemoglu also documents backsliding risk -- even established democracies can experience institutional regression when elites exploit societal divisions. Any movement seeking to build new governance institutions during this juncture must be anti-fragile to backsliding. The institutional question is not just "how do we build better governance?" but "how do we build governance that resists recapture by concentrated interests once the juncture closes?"
+
+### Additional Evidence (extend)
+*Source: [[2025-12-00-fullstack-alignment-thick-models-value]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
+
+The Full-Stack Alignment framework directly addresses the capability-governance mismatch by proposing institutional co-alignment as a solution. The paper argues that alignment cannot succeed through technical means alone and requires transforming the institutions that shape AI development. The five implementation mechanisms (AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions) are institutional structures designed to close the capability-governance gap by aligning institutions themselves with human values.
+
 ---
 Relevant Notes:
diff --git a/domains/ai-alignment/ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md b/domains/ai-alignment/ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md
new file mode 100644
index 00000000..00dffa7c
--- /dev/null
+++ b/domains/ai-alignment/ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md
@@ -0,0 +1,54 @@
+---
+type: claim
+domain: ai-alignment
+description: "Beneficial AI outcomes require simultaneously aligning both AI systems and the institutions that govern them rather than focusing on individual model alignment alone"
+confidence: speculative
+source: "Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (December 2025), arxiv.org/abs/2512.03399"
+created: 2026-03-11
+secondary_domains: [mechanisms, grand-strategy]
+---
+
+# AI alignment requires institutional co-alignment, not just model alignment
+
+The Full-Stack Alignment framework proposes that "beneficial societal outcomes cannot be guaranteed by aligning individual AI systems" alone. The paper argues alignment must be comprehensive, addressing both AI systems and the institutions that shape their development and deployment simultaneously.
+
+This extends beyond single-organization objectives to address misalignment across multiple stakeholders. The framework proposes "full-stack alignment" as the concurrent alignment of AI systems and institutions with what people value, reframing the problem from technical model alignment to system-level institutional coordination.
+
+## Implementation Mechanisms
+
+The paper identifies five mechanisms for achieving full-stack alignment:
+
+1. **AI value stewardship** — institutional structures for stewarding AI development
+2. **Normatively competent agents** — AI systems capable of normative reasoning
+3. **Win-win negotiation systems** — mechanisms for resolving stakeholder conflicts
+4. **Meaning-preserving economic mechanisms** — economic structures that preserve human values
+5. **Democratic regulatory institutions** — governance structures that embed democratic input
+
+## Relationship to Existing Alignment Work
+
+This is a stronger claim than approaches focused only on coordination between AI labs. Rather than improving coordination protocols between existing actors, full-stack alignment argues that the institutions themselves require structural alignment with human values.
+
+## Evidence and Limitations
+
+The paper provides architectural framing and mechanism proposals rather than empirical validation or formal proofs. Confidence is speculative because this is a December 2025 paper proposing a framework without implementation results, independent verification, or engagement with formal impossibility results. The paper is architecturally ambitious but lacks technical specificity in how thick value models would be operationalized or how institutional alignment would be measured.
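+
+The paper offers no formal model of this claim, so here is a toy illustration (ours, not the paper's): if the quality of a societal outcome is gated by whichever is weaker, the deployed model or the deploying institution, then aligning the model alone cannot guarantee a good outcome. The min() aggregation and all names are illustrative assumptions.
+
+```python
+def societal_outcome(model_alignment: float, institutional_alignment: float) -> float:
+    """Toy model: both scores in [0, 1]; the weaker factor dominates the outcome."""
+    return min(model_alignment, institutional_alignment)
+
+# A near-perfectly aligned model deployed under a misaligned institution:
+print(societal_outcome(0.95, 0.20))  # 0.2 -- institutional misalignment dominates
+
+# Full-stack alignment targets both factors concurrently:
+print(societal_outcome(0.95, 0.90))  # 0.9
+```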
+
+---
+
+**Related claims:**
+- [[AI alignment is a coordination problem not a technical problem]] — this claim extends the coordination thesis to institutions themselves
+- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — institutional alignment directly addresses this capability-governance gap
+- [[safe AI development requires building alignment mechanisms before scaling capability]] — institutional co-alignment is proposed as one such mechanism
diff --git a/domains/ai-alignment/thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning.md b/domains/ai-alignment/thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning.md
new file mode 100644
index 00000000..d8feb43a
--- /dev/null
+++ b/domains/ai-alignment/thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning.md
@@ -0,0 +1,68 @@
+---
+type: claim
+domain: ai-alignment
+description: "Thick value models that distinguish enduring values from temporary preferences enable AI systems to reason normatively across new domains by embedding choices in social context"
+confidence: speculative
+source: "Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (December 2025), arxiv.org/abs/2512.03399"
+created: 2026-03-11
+secondary_domains: [mechanisms]
+---
+
+# Thick models of value distinguish enduring values from temporary preferences, enabling normative reasoning
+
+The Full-Stack Alignment paper proposes "thick models of value" as a conceptual alternative to utility functions and preference orderings. These models are characterized by three properties:
+
+1. **Distinguish enduring values from temporary preferences** — separating what people durably care about from momentary wants or revealed preferences
+2. **Embed individual choices within social contexts** — recognizing that preferences are shaped by and dependent on social structures rather than being context-independent
+3. **Enable normative reasoning across new domains** — allowing AI systems to generalize value judgments to novel situations beyond training data
+
+## Contrast with Thin Models
+
+Thin models (utility functions, preference orderings) treat all stated preferences as equally valid and assume context-independence. Thick models acknowledge that what people say they want (preferences) often diverges from what produces good outcomes (values), and that this divergence is systematic rather than random. (A speculative code sketch of this contrast follows the final section below.)
+
+## Limitations and Gaps
+
+The paper does not provide formal definitions of thick value models, implementation details for how they would be operationalized in AI systems, or empirical validation, and it does not engage with the existing preference-learning literature (RLHF, DPO) or with formal methods for value specification. It remains a conceptual proposal for how alignment systems should represent human values.
+
+## Relationship to Continuous Value Integration
+
+This concept formalizes the intuition that values should be continuously integrated into systems rather than specified once at training time. Rather than encoding values as fixed parameters, thick models would enable ongoing normative reasoning as deployment contexts evolve and new situations emerge.
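+
+The paper gives no formalization of thick value models, so what follows is a speculative sketch of the contrast above, not the paper's method. A thin model collapses every option to a context-free scalar, while a thick record keeps the three properties listed earlier (durability, social context, rationale) available for reasoning. All names and fields are hypothetical:
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class ThickValue:
+    """One value in a thick model: what is cared about, plus the context that gives it meaning."""
+    name: str                    # e.g. "honesty"
+    enduring: bool               # property 1: durable value vs. momentary preference
+    social_contexts: list[str]   # property 2: social settings where the value applies
+    rationale: str               # property 3: why it is held, supporting generalization
+
+def thin_utility(option: str, weights: dict[str, float]) -> float:
+    # Thin model: every stated preference collapses to one context-free number.
+    return weights.get(option, 0.0)
+
+def thick_endorsement(contexts: list[str], values: list[ThickValue]) -> list[ThickValue]:
+    # Thick model: surface the enduring values that bear on a choice in its context,
+    # keeping each rationale available for normative reasoning in novel situations.
+    return [v for v in values if v.enduring and set(v.social_contexts) & set(contexts)]
+
+values = [
+    ThickValue("honesty", enduring=True, social_contexts=["work", "family"],
+               rationale="trust underpins cooperation"),
+    ThickValue("novelty", enduring=False, social_contexts=["leisure"],
+               rationale="momentary want, not a durable commitment"),
+]
+print(thin_utility("take_shortcut", {"take_shortcut": 0.9}))  # 0.9 -- a bare scalar, no context
+print([v.name for v in thick_endorsement(["work"], values)])  # ['honesty']
+```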
+
+---
+
+**Related claims:**
+- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — thick models operationalize continuous value integration
+- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — thick models acknowledge this complexity by modeling context-dependence
+- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — thick models address this by enabling context-dependent reasoning
diff --git a/inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md b/inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
index eb68eddf..fb64c9d1 100644
--- a/inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
+++ b/inbox/archive/2025-12-00-fullstack-alignment-thick-models-value.md
@@ -7,9 +7,15 @@ date: 2025-12-01
 domain: ai-alignment
 secondary_domains: [mechanisms, grand-strategy]
 format: paper
-status: unprocessed
+status: processed
 priority: medium
 tags: [full-stack-alignment, institutional-alignment, thick-values, normative-competence, co-alignment]
+processed_by: theseus
+processed_date: 2026-03-11
+claims_extracted: ["ai-alignment-requires-institutional-co-alignment-not-just-model-alignment.md", "thick-models-of-value-distinguish-enduring-values-from-temporary-preferences-enabling-normative-reasoning.md"]
+enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md", "AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "Extracted two novel claims from the Full-Stack Alignment paper: (1) institutional co-alignment as necessary for beneficial AI outcomes, extending the coordination thesis to institutions themselves, and (2) thick models of value as a formalization of continuous value integration. Applied two enrichments, to the existing coordination claim and the critical-juncture claim. The paper is architecturally ambitious but lacks technical specificity and empirical validation; confidence is speculative for both extracted claims. No engagement with RLHF/bridging mechanisms or formal impossibility results, as the curator noted."
 ---
 
 ## Content