theseus: extract from 2025-12-00-cip-year-in-review-democratic-alignment.md
- Source: inbox/archive/2025-12-00-cip-year-in-review-democratic-alignment.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 2) Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in:
parent: ba4ac4a73e
commit: 4691875018
7 changed files with 165 additions and 1 deletion
@@ -0,0 +1,40 @@
---
type: claim
domain: ai-alignment
description: "Global AI models provide generic responses to culturally-specific contexts despite having relevant local information in training data"
confidence: experimental
source: "CIP Year in Review 2025, Sri Lanka elections and Samiksha evaluations"
created: 2026-03-11
secondary_domains: [collective-intelligence]
---

# AI models fail local alignment by providing generic responses to culturally-specific contexts despite having relevant training data

CIP's evaluation of AI models during Sri Lanka's elections revealed a specific failure mode: models provided generic, irrelevant responses even though the relevant local context was available. This suggests that global models trained predominantly on Western data fail to activate or prioritize culturally-specific knowledge even when it exists in their training corpus.

This failure mode is distinct from a lack of capability: the models had access to information about Sri Lankan politics but defaulted to generic responses rather than contextually appropriate ones. This reveals a structural misalignment between global model training and local deployment contexts. The problem is not that the knowledge is absent, but that the model's optimization process does not reliably surface or weight local context appropriately.

The finding is reinforced by Samiksha's evaluation of 25,000+ queries across 11 Indian languages, which required 100,000+ manual evaluations precisely because automated metrics could not capture cultural appropriateness. Domains tested included healthcare, agriculture, education, and legal contexts, all areas where local norms, practices, and values diverge materially from Western-centric training data. The requirement for human expert review to assess accuracy and safety indicates that standard evaluation metrics miss culturally-embedded alignment failures.

## Evidence

- **Sri Lanka elections**: Models provided generic, irrelevant responses despite local context being available in training data
- **Samiksha scale**: 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations required
- **Domains tested**: Healthcare, agriculture, education, legal contexts in Indian languages
- **Evaluation requirement**: Human expert review necessary to assess accuracy and safety, indicating automated metrics are insufficient
- **Implication**: The failure is not capability but prioritization; models have the information but don't reliably use it

## Implications

This failure mode suggests that scaling model size or training data alone will not solve alignment for diverse global populations. The models need mechanisms to recognize and prioritize local context, not just possess the information. This bears directly on the [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] claim: local alignment may require continuous community input rather than one-time training data inclusion.

---

Relevant Notes:
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]
- [[persistent irreducible disagreement]]

Topics:
- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@@ -19,6 +19,12 @@ Since [[democratic alignment assemblies produce constitutions as effective as ex

Since [[collective intelligence requires diversity as a structural precondition not a moral preference]], community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.

### Additional Evidence (confirm)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

CIP's Weval framework confirmed this at global scale through multiple independent evaluations:

1. **Political neutrality evaluation**: 1,000 participants generated 400 prompts synthesized into 107 criteria that achieved 70%+ consensus across political groups; criteria that would not emerge from developer specifications alone.
2. **Sri Lanka elections evaluation**: revealed models providing generic responses despite local context, showing the gap between developer-specified behavior and locally-appropriate alignment.
3. **Samiksha**: 25,000+ queries across 11 Indian languages in healthcare, agriculture, education, and legal domains required 100,000+ manual evaluations precisely because community norms in these contexts differ materially from developer assumptions.

The requirement for manual evaluation indicates that automated metrics, which reflect developer assumptions, cannot capture community-centered alignment targets.

---

Relevant Notes:
@@ -19,6 +19,12 @@ However, this remains one-shot constitution-setting, not continuous alignment. T

Since [[collective intelligence requires diversity as a structural precondition not a moral preference]], democratic assemblies structurally ensure the diversity that expert panels cannot guarantee. Since [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]], the next step beyond assemblies is continuous participatory alignment, not periodic constitution-setting.

### Additional Evidence (extend)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

CIP's Global Dialogues scaled democratic alignment to 10,000+ participants across 70+ countries in 2025, a 100x increase over previous experiments. The program achieved 70%+ cross-partisan consensus on 107 AI evaluation criteria for political neutrality, with 1,000 participants generating 400 prompts that were synthesized into these criteria. Critically, the approach has been adopted by frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka), moving it from experimental to infrastructural status. The 2026 plans explicitly aim to make Global Dialogues "standing global infrastructure" for AI governance and to operationalize digital twin evaluations as governance requirements for agentic systems. This extends the original claim from small-scale assemblies to global-scale infrastructure while maintaining consensus quality across political divides.

---

Relevant Notes:
@@ -0,0 +1,41 @@
---
type: claim
domain: ai-alignment
description: "Democratic alignment infrastructure can operate at 10,000+ participant scale while maintaining 70%+ cross-partisan consensus on evaluation criteria"
confidence: likely
source: "CIP Year in Review 2025, Global Dialogues program"
created: 2026-03-11
secondary_domains: [collective-intelligence, mechanisms]
---

# Democratic AI alignment scaled to 10,000+ participants across 70+ countries achieving 70%+ cross-partisan consensus on evaluation criteria

CIP's Global Dialogues program in 2025 demonstrated that democratic alignment infrastructure can operate at unprecedented scale while maintaining meaningful consensus across political divides. The program engaged 10,000+ participants across 70+ countries in 6 deliberative dialogues. For the political neutrality evaluation specifically, 1,000 participants generated 400 prompts that were synthesized into 107 evaluation criteria, and those criteria achieved 70%+ consensus across political groups.

This represents a 100x scale increase over previous democratic alignment experiments while maintaining consensus quality. The cross-partisan consensus is particularly significant given the polarized nature of AI governance debates: that participants across political groups could agree on 107 specific evaluation criteria suggests democratic processes can surface shared values about AI behavior even in contentious domains.

The program's adoption by frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka) indicates the approach has moved from experimental to infrastructural status. The 2026 roadmap explicitly aims to establish Global Dialogues as "standing global infrastructure" for AI governance.

## Evidence

- **Scale**: 10,000+ participants across 70+ countries in 6 deliberative dialogues (2025)
- **Consensus mechanism**: 1,000 participants generated 400 prompts synthesized into 107 evaluation criteria
- **Cross-partisan agreement**: 70%+ consensus achieved across political groups on these criteria
- **Adoption**: Meta, Cohere, Anthropic, UK/US AI Safety Institutes, plus governments in India, Taiwan, Sri Lanka
- **2026 plans**: Establish Global Dialogues as standing global infrastructure; operationalize digital twin evaluations as governance requirements for agentic systems
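The consensus mechanism above implies a filter in which a criterion counts as cross-partisan only if its approval rate clears the threshold in every political group, not merely in the pooled sample. A minimal sketch of that filter follows; this is illustrative only, not CIP's actual pipeline, and the function name, group labels, and all approval figures are invented.

```python
# Illustrative sketch (assumed, not CIP's implementation): keep only the
# criteria whose approval rate meets the threshold in EVERY political group.

THRESHOLD = 0.70


def cross_partisan_consensus(approvals, threshold=THRESHOLD):
    """Return criteria whose approval rate meets `threshold` in every group.

    `approvals` maps criterion -> {political_group: approval_rate}.
    A criterion with no group data is excluded rather than vacuously kept.
    """
    return [
        criterion
        for criterion, by_group in approvals.items()
        if by_group and all(rate >= threshold for rate in by_group.values())
    ]


# Invented example data: two criteria clear the bar in all groups, one fails
# because a single group falls below the threshold.
approvals = {
    "cite sources for contested claims": {"left": 0.84, "centre": 0.81, "right": 0.78},
    "decline to endorse candidates": {"left": 0.91, "centre": 0.88, "right": 0.86},
    "always both-sides every question": {"left": 0.42, "centre": 0.66, "right": 0.71},
}
print(cross_partisan_consensus(approvals))
# → ['cite sources for contested claims', 'decline to endorse candidates']
```

The per-group requirement is the design point worth noting: a pooled 70% average could be reached by one large bloc alone, whereas requiring the threshold in each group is what makes the resulting criteria "cross-partisan."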
## Limitations

The gap between evaluation adoption and deployment impact remains unclear. Labs using these tools as evaluation frameworks does not necessarily mean the findings changed what was deployed. The source notes "adoption as evaluation tool ≠ adoption as deployment gate." This is a critical distinction: the infrastructure may be adopted for assessment purposes without changing actual model deployment decisions.

---

Relevant Notes:
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] — this extends that finding to 10,000+ scale with cross-partisan consensus
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] — confirmed at global scale
- [[AI alignment is a coordination problem not a technical problem]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]

Topics:
- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@@ -0,0 +1,50 @@
---
type: claim
domain: ai-alignment
description: "58% of 10,000+ global participants expressed belief that AI could make better decisions than elected representatives, creating ambiguity about democratic delegation"
confidence: experimental
source: "CIP Year in Review 2025, Global Dialogues findings"
created: 2026-03-11
secondary_domains: [collective-intelligence, grand-strategy]
---

# Majority of global participants believe AI could make superior decisions to elected representatives, creating ambiguity about democratic delegation

In CIP's Global Dialogues with 10,000+ participants across 70+ countries, 58% believed AI could make superior decisions compared to their local elected representatives. This finding is deeply ambiguous; it could represent either:

1. **Trust in AI + democratic process**: belief that democratically-aligned AI systems could aggregate preferences better than representatives
2. **Willingness to cede authority**: acceptance of AI decision-making that bypasses democratic accountability

The distinction matters enormously for alignment strategy. If the former, it supports the case for democratic alignment infrastructure as a way to channel existing trust in AI toward accountable systems. If the latter, it undermines the human-in-the-loop thesis at scale by revealing that populations may voluntarily delegate authority to AI systems, creating a coordination problem in which individual preference for AI authority conflicts with collective governance needs.

Additional context from the same survey reinforces this ambiguity:

- **28% agreed AI should override established rules** if it calculated better outcomes, suggesting a willingness to subordinate institutional constraints to AI optimization
- **47% felt chatbot interactions increased their belief certainty**, indicating that AI interactions may increase confidence in AI-generated conclusions, potentially creating a feedback loop
- **13.7% reported concerning or reality-distorting AI interactions** affecting someone they know, suggesting real harms are already occurring at scale

Together, these findings suggest a population increasingly comfortable with AI authority, which creates both opportunity and risk for alignment approaches. Democratic alignment infrastructure could channel this trust toward accountable systems; without such infrastructure, the same trust could enable unaccountable AI authority.

## Evidence

- **Scale**: 10,000+ participants across 70+ countries
- **AI vs representatives**: 58% believed AI could make superior decisions
- **Rule override**: 28% agreed AI should override established rules for better outcomes
- **Belief certainty**: 47% felt chatbot interactions increased their certainty
- **Concerning interactions**: 13.7% reported reality-distorting AI interactions affecting someone they know

## Challenges

The survey did not disambiguate between "AI aligned through democratic process" and "AI replacing democratic process." The 58% figure could represent very different underlying beliefs about AI governance. Without follow-up questions, it is impossible to determine whether respondents envision AI as a tool for democratic aggregation or as a replacement for democratic decision-making.

---

Relevant Notes:
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[AI alignment is a coordination problem not a technical problem]]
- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]

Topics:
- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
- [[core/grand-strategy/_map]]
@@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point toward collective

The alignment field has converged on a problem it cannot solve within its current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis: not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within its current framework.

### Additional Evidence (challenge)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

CIP has built substantial collective intelligence infrastructure for alignment at scale, directly challenging this claim. The Global Dialogues program engaged 10,000+ participants across 70+ countries in 2025, with adoption by Meta, Cohere, Anthropic, the UK/US AI Safety Institutes, and governments in India, Taiwan, and Sri Lanka. The 2026 roadmap explicitly aims to establish Global Dialogues as "standing global infrastructure" and to operationalize digital twin evaluations as governance requirements for agentic systems. While gaps remain (particularly the evaluation-to-deployment gap, where adoption for evaluation does not guarantee deployment changes), CIP is a significant counterexample to the claim that no group is building this infrastructure. The frontier lab adoption and government incorporation suggest this infrastructure is moving from experimental to operational status.

---

Relevant Notes:
@@ -7,9 +7,15 @@ date: 2025-12-01
domain: ai-alignment
secondary_domains: [collective-intelligence, mechanisms]
format: article
status: unprocessed
status: processed
priority: medium
tags: [cip, democratic-alignment, global-dialogues, weval, samiksha, digital-twin, frontier-lab-adoption]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["democratic-ai-alignment-scaled-to-10000-participants-across-70-countries-achieving-cross-partisan-consensus.md", "ai-models-fail-local-alignment-providing-generic-responses-to-culturally-specific-contexts.md", "majority-of-global-participants-believe-ai-could-make-superior-decisions-to-elected-representatives.md"]
enrichments_applied: ["democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations.md", "community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Three new claims extracted focusing on (1) democratic alignment scaling with maintained consensus, (2) local alignment failure mode in global models, and (3) population willingness to delegate authority to AI. Three enrichments applied: extending the democratic assemblies claim with 100x scale evidence, confirming community-centred norm elicitation at global scale, and challenging the 'no group is building CI infrastructure' claim with CIP as counterexample. The evaluation-to-deployment gap noted in agent notes is captured in the challenges section of the first claim. The 58% AI-vs-representatives finding is treated as experimental confidence due to ambiguity about what respondents actually meant."
---

## Content
@@ -59,3 +65,12 @@ CIP's comprehensive 2025 results and 2026 plans.

PRIMARY CONNECTION: [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
WHY ARCHIVED: Scale-up evidence for democratic alignment + frontier lab adoption evidence
EXTRACTION HINT: The 70%+ cross-partisan consensus and the evaluation-to-deployment gap are both extractable

## Key Facts

- CIP Global Dialogues: 10,000+ participants, 70+ countries, 6 deliberative dialogues (2025)
- Political neutrality evaluation: 1,000 participants, 400 prompts, 107 criteria, 70%+ cross-partisan consensus
- Samiksha: 25,000+ queries, 11 Indian languages, 100,000+ manual evaluations
- Frontier lab adoption: Meta, Cohere, Anthropic, UK/US AI Safety Institutes
- Government adoption: India, Taiwan, Sri Lanka
- Survey findings: 28% support AI overriding rules for better outcomes; 58% believe AI could decide better than elected representatives; 47% felt chatbot interactions increased belief certainty; 13.7% reported concerning AI interactions affecting someone they know