Compare commits
1 commit
4691875018 ... 0d6998e228
11 changed files with 201 additions and 141 deletions
@@ -0,0 +1,52 @@
---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence, cultural-dynamics]
description: "CIP Global Dialogues found 58% of participants believed AI could make superior decisions versus elected representatives, raising democratic legitimacy concerns"
confidence: likely
source: "CIP Year in Review 2025, Global Dialogues findings, blog.cip.org, December 2025"
created: 2026-03-11
---

# 58% believe AI could make superior decisions versus local elected representatives, creating structural democratic legitimacy risk

CIP's 2025 Global Dialogues found that 58% of 10,000+ participants across 70+ countries believed AI could make superior decisions compared to their local elected representatives. On its face, this is a majority willing to cede democratic authority to AI systems, though the figure admits several readings (see below).

Additional concerning findings:

- 28% agreed AI should override established rules if it calculated better outcomes
- 47% felt chatbot interactions increased their belief certainty
- 13.7% reported concerning or reality-distorting AI interactions affecting someone they know

## Evidence

- 58% believed AI could decide better than elected representatives (CIP Global Dialogues, 10,000+ participants, 70+ countries)
- 28% supported AI overriding established rules for calculated better outcomes
- 47% reported increased belief certainty from chatbot interactions
- 13.7% knew someone affected by concerning or reality-distorting AI interactions

## Significance and Ambiguity

This finding is deeply ambiguous. It could represent:

1. Trust in AI + democratic process (AI as a tool for better democracy)
2. Willingness to replace the democratic process with AI authority
3. Dissatisfaction with current representatives (AI as the lesser evil)
4. Conditional preferences (AI for some decisions, humans for others)

If interpretation (2) dominates, it undermines the human-in-the-loop thesis at scale. Democratic alignment mechanisms assume humans want to remain in the loop; if majorities prefer AI authority over democratic representation, the entire framework of human-centered alignment becomes structurally unstable.

The 47% reporting increased belief certainty from chatbot interactions suggests AI may be shifting epistemic authority from democratic deliberation to AI interaction. Combined with the 13.7% reporting reality-distorting effects, this indicates AI is already reshaping how people form and hold beliefs.

## Limitations

The source does not provide the exact wording of the survey questions, which matters enormously: "could make superior decisions" is different from "should replace elected representatives," and the framing could significantly influence responses. No breakdown by country, demographic, or political affiliation is provided. Confidence is 'likely' rather than 'proven' because the ambiguity in question framing and the lack of detail prevent stronger claims.

---

Relevant Notes:

- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[AI alignment is a coordination problem not a technical problem]]
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]

Topics:

- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@@ -0,0 +1,40 @@
---
type: claim
domain: ai-alignment
secondary_domains: [mechanisms]
description: "CIP's Sri Lanka election evaluation revealed models provide generic responses to context-specific queries despite having local information"
confidence: experimental
source: "CIP Year in Review 2025, Weval Sri Lanka elections evaluation, blog.cip.org, December 2025"
created: 2026-03-11
---

# AI models fail local alignment by providing generic responses to context-specific queries despite having access to local information

CIP's Weval evaluation during Sri Lanka's elections revealed a specific failure mode: models trained on global data provide generic, irrelevant responses when queried about local contexts. Despite having access to information about Sri Lankan politics, models defaulted to generic political advice rather than context-appropriate responses.

This represents a distinct alignment failure: not bias or hallucination, but an inability to recognize when local context should override general patterns. The models had the information but failed to apply it appropriately.

## Evidence

- Weval Sri Lanka elections evaluation: models provided generic, irrelevant responses despite local context being available
- The failure occurred across multiple frontier models evaluated by CIP (specific models are not named in the source)
- The failure mode was consistent: not wrong information, but the wrong level of abstraction

## Implications

This challenges the assumption that scaling training data solves alignment. Models can have global knowledge without developing the meta-cognitive capacity to recognize when local context should dominate. This is particularly concerning for AI deployment in non-Western contexts, where the gap between the global training distribution and the local deployment context is largest.

The failure mode suggests that alignment requires more than data coverage; it requires models to develop context-sensitivity about when to apply general versus specific knowledge.
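The source gives no query examples or metrics, but the failure mode lends itself to crude automated screening. A minimal sketch follows, assuming hypothetical topics, marker lists, and thresholds; keyword matching is only a rough stand-in for the human review CIP relied on, and `ask_model` is a placeholder for any model call, not CIP's actual Weval methodology:

```python
# Hypothetical screen for generic answers to locale-specific queries.
# All names, markers, and thresholds are illustrative assumptions.
LOCAL_MARKERS = {
    "sri-lanka-elections": [
        "election commission of sri lanka", "preferential vote",
        "proportional representation", "colombo", "parliament",
    ],
}

def looks_generic(response: str, topic: str, min_hits: int = 2) -> bool:
    """Flag a response that mentions too few locale-specific markers."""
    text = response.lower()
    hits = sum(marker in text for marker in LOCAL_MARKERS[topic])
    return hits < min_hits

def generic_response_rate(ask_model, queries, topic) -> float:
    """Share of queries answered without recognizable local context.
    `ask_model` is any callable mapping a prompt string to a reply string."""
    flagged = [q for q in queries if looks_generic(ask_model(q), topic)]
    return len(flagged) / len(queries)
```

A high flagged-share on such a screen would quantify the "wrong level of abstraction" failure described above, though only human review could confirm that a flagged response is actually inappropriate.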
## Limitations

The source provides minimal detail about this evaluation: no specific model names, query examples, or quantitative metrics are given, so the scope and severity of the failure mode cannot be assessed. Confidence is experimental pending more detailed documentation.

---

Relevant Notes:

- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]

Topics:

- [[domains/ai-alignment/_map]]
@@ -1,40 +0,0 @@
---
type: claim
domain: ai-alignment
description: "Global AI models provide generic responses to culturally-specific contexts despite having relevant local information in training data"
confidence: experimental
source: "CIP Year in Review 2025, Sri Lanka elections and Samiksha evaluations"
created: 2026-03-11
secondary_domains: [collective-intelligence]
---

# AI models fail local alignment by providing generic responses to culturally-specific contexts despite having relevant training data

CIP's evaluation of AI models during Sri Lanka's elections revealed a specific failure mode: models provided generic, irrelevant responses despite the local context being available. This suggests that global models trained predominantly on Western data fail to activate or prioritize culturally-specific knowledge even when it exists in their training corpus.

This failure mode is distinct from lack of capability: the models had access to information about Sri Lankan politics but defaulted to generic responses rather than contextually appropriate ones. This reveals a structural misalignment between global model training and local deployment contexts. The problem is not that the knowledge is absent, but that the model's optimization process does not reliably surface or weight local context appropriately.

The finding is reinforced by Samiksha's evaluation of 25,000+ queries across 11 Indian languages, which required 100,000+ manual evaluations precisely because automated metrics could not capture cultural appropriateness. Domains tested included healthcare, agriculture, education, and legal contexts, all areas where local norms, practices, and values diverge materially from Western-centric training data. The requirement for human expert review to assess accuracy and safety indicates that standard evaluation metrics miss culturally-embedded alignment failures.

## Evidence

- **Sri Lanka elections**: Models provided generic, irrelevant responses despite local context being available in training data
- **Samiksha scale**: 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations required
- **Domains tested**: Healthcare, agriculture, education, legal contexts in Indian languages
- **Evaluation requirement**: Human expert review necessary to assess accuracy and safety, indicating automated metrics insufficient
- **Implication**: The failure is not capability but prioritization; models have the information but don't reliably use it

## Implications

This failure mode suggests that scaling model size or training data alone will not solve alignment for diverse global populations. The models need mechanisms to recognize and prioritize local context, not just possess the information. This has direct implications for the [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] claim: local alignment may require continuous community input rather than one-time training data inclusion.

---

Relevant Notes:

- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]
- [[persistent irreducible disagreement]]

Topics:

- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@@ -23,7 +23,7 @@ Since [[collective intelligence requires diversity as a structural precondition

### Additional Evidence (confirm)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

-CIP's Weval framework confirmed this at global scale through multiple independent evaluations: (1) the political neutrality evaluation, where 1,000 participants generated 400 prompts synthesized into 107 criteria that achieved 70%+ consensus across political groups, criteria that would not emerge from developer specifications alone; (2) the Sri Lanka elections evaluation, which revealed models providing generic responses despite local context, showing the gap between developer-specified behavior and locally-appropriate alignment; (3) Samiksha's 25,000+ queries across 11 Indian languages in healthcare, agriculture, education, and legal domains, which required 100,000+ manual evaluations precisely because community norms in these contexts differ materially from developer assumptions. The requirement for manual evaluation indicates that automated metrics (which reflect developer assumptions) cannot capture community-centered alignment targets.
+CIP's Sri Lanka election evaluation revealed a specific failure mode: models provided generic, irrelevant responses despite local context being available. This confirms that developer-specified alignment (trained on global data) fails to capture context-specific norms that local communities would specify. The models had information about Sri Lankan politics but failed to recognize when local context should override general patterns, demonstrating the gap between global training objectives and community-specific alignment targets.

---
@@ -23,7 +23,7 @@ Since [[collective intelligence requires diversity as a structural precondition

### Additional Evidence (extend)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

-CIP's Global Dialogues scaled democratic alignment to 10,000+ participants across 70+ countries in 2025, representing a 100x increase over previous experiments. The program achieved 70%+ cross-partisan consensus on 107 AI evaluation criteria for political neutrality, with 1,000 participants generating 400 prompts that were synthesized into these criteria. Critically, this approach has been adopted by frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka), moving from experimental to infrastructural status. The 2026 plans explicitly aim to make Global Dialogues 'standing global infrastructure' for AI governance and operationalize digital twin evaluations as governance requirements for agentic systems. This extends the original claim from small-scale assemblies to global-scale infrastructure while maintaining consensus quality across political divides.
+CIP's 2025 Global Dialogues extended democratic alignment to 10,000+ participants across 70+ countries in 6 deliberative dialogues, a substantial scale increase from earlier experiments. The Weval framework achieved 70%+ cross-partisan consensus on AI evaluation criteria, with 1,000 participants generating 400 prompts and 107 evaluation criteria. Samiksha in India processed 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations, described as "the most comprehensive evaluation of AI in Indian contexts." Frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka) adopted the frameworks. However, it remains unclear whether these evaluations function as deployment constraints rather than post-hoc assessments.

---
@@ -1,41 +0,0 @@
---
type: claim
domain: ai-alignment
description: "Democratic alignment infrastructure can operate at 10,000+ participant scale while maintaining 70%+ cross-partisan consensus on evaluation criteria"
confidence: likely
source: "CIP Year in Review 2025, Global Dialogues program"
created: 2026-03-11
secondary_domains: [collective-intelligence, mechanisms]
---

# Democratic AI alignment scaled to 10,000+ participants across 70+ countries achieving 70%+ cross-partisan consensus on evaluation criteria

CIP's Global Dialogues program in 2025 demonstrated that democratic alignment infrastructure can operate at unprecedented scale while maintaining meaningful consensus across political divides. The program engaged 10,000+ participants across 70+ countries in 6 deliberative dialogues. For the political neutrality evaluation specifically, 1,000 participants generated 400 prompts that were synthesized into 107 evaluation criteria, achieving 70%+ consensus across political groups on these criteria.

This represents a 100x scale increase over previous democratic alignment experiments while maintaining consensus quality. The cross-partisan consensus is particularly significant given the polarized nature of AI governance debates: the fact that participants across political groups could agree on 107 specific evaluation criteria suggests that democratic processes can surface shared values about AI behavior even in contentious domains.

The program's adoption by frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka) indicates this approach has moved from experimental to infrastructural status. The 2026 roadmap explicitly aims to establish Global Dialogues as "standing global infrastructure" for AI governance.

## Evidence

- **Scale**: 10,000+ participants across 70+ countries in 6 deliberative dialogues (2025)
- **Consensus mechanism**: 1,000 participants generated 400 prompts synthesized into 107 evaluation criteria
- **Cross-partisan agreement**: 70%+ consensus achieved across political groups on these criteria
- **Adoption**: Meta, Cohere, Anthropic, UK/US AI Safety Institutes, plus governments in India, Taiwan, Sri Lanka
- **2026 plans**: Establish Global Dialogues as standing global infrastructure; operationalize digital twin evaluations as governance requirements for agentic systems

## Limitations

The gap between evaluation adoption and deployment impact remains unclear. Labs using these tools as evaluation frameworks does not necessarily mean the findings changed what was deployed. The source notes "adoption as evaluation tool ≠ adoption as deployment gate." This is a critical distinction: the infrastructure may be adopted for assessment purposes without changing actual model deployment decisions.

---

Relevant Notes:

- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] — this extends that finding to 10,000+ scale with cross-partisan consensus
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] — confirmed at global scale
- [[AI alignment is a coordination problem not a technical problem]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]

Topics:

- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@@ -0,0 +1,44 @@
---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence, mechanisms]
description: "CIP's 2025 Global Dialogues achieved 10,000+ participants across 70+ countries with 70%+ cross-partisan consensus on AI evaluation criteria"
confidence: likely
source: "CIP Year in Review 2025, blog.cip.org, December 2025"
created: 2026-03-11
---

# Democratic AI alignment scales to 10,000+ participants across 70+ countries achieving 70%+ cross-partisan consensus on evaluation criteria

CIP's 2025 Global Dialogues demonstrate that democratic alignment mechanisms can operate at global scale while maintaining cross-partisan consensus. The program engaged 10,000+ participants across 70+ countries in 6 deliberative dialogues throughout 2025.

The Weval evaluation framework achieved 70%+ consensus across political groups on AI evaluation criteria. In the political neutrality evaluation, 1,000 participants generated 400 prompts and 107 evaluation criteria, with consensus exceeding 70% across different political affiliations. This represents a significant scale increase over earlier democratic alignment experiments (an estimated 100x increase on prior work) while maintaining the consensus properties that make such mechanisms viable for AI governance.
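The source does not describe the aggregation procedure, so the sketch below is only one plausible operationalization of "cross-partisan consensus": a criterion counts as consensus only if at least 70% of raters in every political group approve it, so no single bloc can carry it. All data, group labels, and criterion names are hypothetical.

```python
from collections import defaultdict

# Hypothetical approval ratings: (political_group, criterion, approves).
ratings = [
    ("left",   "no-partisan-endorsement", True),
    ("right",  "no-partisan-endorsement", True),
    ("center", "no-partisan-endorsement", True),
    ("left",   "cite-primary-sources",    True),
    ("right",  "cite-primary-sources",    False),
    ("center", "cite-primary-sources",    True),
]

def cross_partisan_consensus(ratings, threshold=0.70):
    """Keep criteria approved by >= threshold within EVERY group,
    so consensus cannot rest on the strength of one bloc alone."""
    votes = defaultdict(lambda: defaultdict(list))
    for group, criterion, approves in ratings:
        votes[criterion][group].append(approves)
    return [
        criterion
        for criterion, groups in votes.items()
        if all(sum(v) / len(v) >= threshold for v in groups.values())
    ]

print(cross_partisan_consensus(ratings))
# ['no-partisan-endorsement']: 'cite-primary-sources' fails because
# one group's approval rate falls below the 70% threshold.
```

The all-groups quantifier is the design choice that distinguishes cross-partisan consensus from a simple majority, which a single large bloc could dominate.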
## Evidence

- CIP conducted 6 deliberative dialogues in 2025 with 10,000+ participants across 70+ countries
- Weval political neutrality evaluation: 1,000 participants, 400 prompts, 107 criteria, 70%+ cross-partisan consensus
- Samiksha (India): 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations, described as "the most comprehensive evaluation of AI in Indian contexts"
- Frontier lab adoption: Meta, Cohere, Anthropic, and the UK/US AI Safety Institutes incorporated findings
- Government adoption: India, Taiwan, and Sri Lanka incorporated findings into policy

## Significance

This addresses the scalability objection to democratic alignment. The 70%+ cross-partisan consensus is particularly significant given polarization concerns: it suggests that AI evaluation criteria can achieve broad agreement even when other political issues cannot.

However, the critical gap remains: adoption of evaluation frameworks does not necessarily mean these evaluations function as deployment constraints. The source reports that labs "incorporated findings" but provides no evidence that evaluation results blocked or modified deployments.

## Limitations

This is a single-source report from CIP itself. Independent verification of consensus levels and participant diversity would strengthen confidence. The claim assumes the 70%+ figure is robust across different framings of the evaluation questions.

---

Relevant Notes:

- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]
- [[AI alignment is a coordination problem not a technical problem]]

Topics:

- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@@ -0,0 +1,52 @@
---
type: claim
domain: ai-alignment
secondary_domains: [mechanisms]
description: "Meta, Anthropic, and Cohere adopted CIP evaluation frameworks but no evidence shows these function as deployment gates rather than post-hoc assessments"
confidence: experimental
source: "CIP Year in Review 2025, blog.cip.org, December 2025"
created: 2026-03-11
---

# Frontier AI labs adopt democratic evaluation tools as assessment mechanisms without evidence these function as deployment constraints

CIP reports that Meta, Cohere, Anthropic, and the UK/US AI Safety Institutes have adopted its evaluation frameworks (Weval, Samiksha, Digital Twin). However, the source provides no evidence that these evaluations function as deployment gates rather than post-hoc assessments.

The critical gap: adoption as an evaluation tool ≠ adoption as a deployment constraint. The source states labs "incorporated findings" but does not specify whether evaluation results ever blocked, delayed, or modified deployments.
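The distinction can be made concrete in pipeline terms: a deployment gate runs before release and can block it, while a post-hoc assessment merely records scores after the fact. A minimal sketch follows; the thresholds, scores, and function names are invented for illustration, and nothing in the source indicates that any lab wires evaluations in either way.

```python
# Illustrative contrast between a deployment gate and a post-hoc
# assessment. Thresholds and scores are invented; the source gives no
# evidence about how (or whether) labs integrate democratic evaluations.
GATE_THRESHOLDS = {"political_neutrality": 0.70, "local_context": 0.60}

def run_democratic_evals(model) -> dict:
    """Placeholder for running Weval-style evaluations on a model."""
    return {"political_neutrality": 0.72, "local_context": 0.41}

def release(model):
    print("model released")

def deploy_with_gate(model):
    """Evaluation with teeth: failing scores block the release."""
    scores = run_democratic_evals(model)
    failures = {k: v for k, v in scores.items() if v < GATE_THRESHOLDS[k]}
    if failures:
        raise RuntimeError(f"deployment blocked: {failures}")
    release(model)

def deploy_post_hoc(model):
    """Findings 'incorporated', but the model ships regardless."""
    release(model)
    print("assessment on record:", run_democratic_evals(model))
```

Under this framing, the evidence below is consistent with `deploy_post_hoc` and silent on `deploy_with_gate`.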
## Evidence

- Frontier lab partners: Meta, Cohere, Anthropic, UK/US AI Safety Institutes
- Government adoption: India, Taiwan, Sri Lanka incorporated findings
- No evidence provided that evaluation results blocked or modified deployments
- No evidence of an evaluation-to-deployment pipeline or governance integration
- No public reporting of evaluation results before deployment decisions

## Significance

This represents progress on democratic alignment infrastructure adoption, but the critical question remains unanswered: do these evaluations have teeth? If labs can evaluate, note concerns, and deploy anyway, the democratic input becomes decorative rather than structural.

The most important metric would be: "How many deployment decisions were changed based on democratic evaluation results?" This data is not provided in the source.

## What Would Strengthen This Claim

- Evidence of a deployment blocked or modified based on evaluation results
- Integration of evaluation frameworks into pre-deployment review processes
- Contractual or governance commitments to act on evaluation findings
- Public reporting of evaluation results before deployment decisions
- Specific examples of labs changing deployment plans based on CIP findings

## Limitations

This claim is based on absence of evidence rather than evidence of absence. It is possible that deployment-level integration exists but is not mentioned in CIP's public year-in-review. However, the absence of any mention of deployment impact in a document highlighting CIP's achievements suggests the evaluation-to-deployment gap is real.

---

Relevant Notes:

- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[safe AI development requires building alignment mechanisms before scaling capability]]

Topics:

- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]
@@ -1,50 +0,0 @@
---
type: claim
domain: ai-alignment
description: "58% of 10,000+ global participants expressed belief that AI could make better decisions than elected representatives, creating ambiguity about democratic delegation"
confidence: experimental
source: "CIP Year in Review 2025, Global Dialogues findings"
created: 2026-03-11
secondary_domains: [collective-intelligence, grand-strategy]
---

# Majority of global participants believe AI could make superior decisions to elected representatives, creating ambiguity about democratic delegation

In CIP's Global Dialogues with 10,000+ participants across 70+ countries, 58% believed AI could make superior decisions compared to local elected representatives. This finding is deeply ambiguous; it could represent either:

1. **Trust in AI + democratic process**: Belief that democratically-aligned AI systems could aggregate preferences better than representatives
2. **Willingness to cede authority**: Acceptance of AI decision-making that bypasses democratic accountability

The distinction matters enormously for alignment strategy. If the former, it supports the case for democratic alignment infrastructure as a way to channel existing trust in AI toward accountable systems. If the latter, it undermines the human-in-the-loop thesis at scale by revealing that populations may voluntarily delegate authority to AI systems, creating a coordination problem where individual preference for AI authority conflicts with collective governance needs.

Additional context from the same survey reinforces this ambiguity:

- **28% agreed AI should override established rules** if calculating better outcomes, suggesting willingness to subordinate institutional constraints to AI optimization
- **47% felt chatbot interactions increased their belief certainty**, indicating that AI interactions may increase confidence in AI-generated conclusions, potentially creating a feedback loop
- **13.7% reported concerning/reality-distorting AI interactions** affecting someone they know, suggesting real harms are already occurring at scale

These findings together suggest a population increasingly comfortable with AI authority, which creates both opportunity and risk for alignment approaches. Democratic alignment infrastructure could channel this trust toward accountable systems, but without such infrastructure, the same trust could enable unaccountable AI authority.

## Evidence

- **Scale**: 10,000+ participants across 70+ countries
- **AI vs representatives**: 58% believed AI could make superior decisions
- **Rule override**: 28% agreed AI should override established rules for better outcomes
- **Belief certainty**: 47% felt chatbot interactions increased their certainty
- **Concerning interactions**: 13.7% reported reality-distorting AI interactions affecting someone they know

## Challenges

The survey did not disambiguate between "AI aligned through democratic process" and "AI replacing democratic process." The 58% figure could represent very different underlying beliefs about AI governance. Without follow-up questions, it is impossible to determine whether respondents envision AI as a tool for democratic aggregation or as a replacement for democratic decision-making.

---

Relevant Notes:

- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[AI alignment is a coordination problem not a technical problem]]
- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]

Topics:

- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
- [[core/grand-strategy/_map]]
@@ -21,7 +21,7 @@ The alignment field has converged on a problem they cannot solve with their current

### Additional Evidence (challenge)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

-CIP has built substantial collective intelligence infrastructure for alignment at scale, directly challenging this claim. The Global Dialogues program engaged 10,000+ participants across 70+ countries in 2025, with adoption by Meta, Cohere, Anthropic, UK/US AI Safety Institutes, and governments in India, Taiwan, and Sri Lanka. The 2026 roadmap explicitly aims to establish Global Dialogues as 'standing global infrastructure' and operationalize digital twin evaluations as governance requirements for agentic systems. While gaps remain (particularly the evaluation-to-deployment gap, where adoption for evaluation does not guarantee deployment changes), CIP represents a significant counterexample to the claim that no group is building this infrastructure. The frontier lab adoption and government incorporation suggest this infrastructure is moving from experimental to operational status.
+CIP is now building alignment through collective intelligence infrastructure at scale. 2025 results: 10,000+ participants across 70+ countries, 70%+ cross-partisan consensus on evaluation criteria, adoption by Meta/Anthropic/Cohere and governments in India/Taiwan/Sri Lanka. 2026 plans explicitly aim to make Global Dialogues 'standing global infrastructure' and operationalize digital twin evaluations as governance requirements for agentic systems. This directly contradicts the premise that no research group is building this infrastructure: CIP is doing exactly that, though the deployment-level impact remains to be demonstrated.

---
@@ -12,10 +12,10 @@ priority: medium
tags: [cip, democratic-alignment, global-dialogues, weval, samiksha, digital-twin, frontier-lab-adoption]
processed_by: theseus
processed_date: 2026-03-11
-claims_extracted: ["democratic-ai-alignment-scaled-to-10000-participants-across-70-countries-achieving-cross-partisan-consensus.md", "ai-models-fail-local-alignment-providing-generic-responses-to-culturally-specific-contexts.md", "majority-of-global-participants-believe-ai-could-make-superior-decisions-to-elected-representatives.md"]
+claims_extracted: ["democratic-ai-alignment-scales-to-10000-participants-across-70-countries-with-70-percent-cross-partisan-consensus.md", "ai-models-fail-local-alignment-providing-generic-responses-to-context-specific-queries.md", "58-percent-believe-ai-could-decide-better-than-elected-representatives-creating-democratic-legitimacy-risk.md", "frontier-ai-labs-adopt-democratic-evaluation-tools-without-evidence-of-deployment-constraint.md"]
enrichments_applied: ["democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations.md", "community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "Three new claims extracted focusing on (1) democratic alignment scaling with maintained consensus, (2) local alignment failure mode in global models, and (3) population willingness to delegate authority to AI. Three enrichments applied: extending the democratic assemblies claim with 100x scale evidence, confirming community-centred norm elicitation at global scale, and challenging the 'no group is building CI infrastructure' claim with CIP as counterexample. The evaluation-to-deployment gap noted in agent notes is captured in the challenges section of the first claim. The 58% AI-vs-representatives finding is treated as experimental confidence due to ambiguity about what respondents actually meant."
+extraction_notes: "Four new claims extracted focusing on: (1) democratic alignment scaling to 10K+ with cross-partisan consensus, (2) local alignment failure mode in Sri Lanka evaluation, (3) majority willingness to cede authority to AI over elected representatives, (4) frontier lab adoption without clear deployment constraint evidence. Three enrichments: extending democratic alignment scale evidence, confirming community-centered norm elicitation, challenging the 'no research group building CI infrastructure' claim with CIP as counterexample. The 58% finding is particularly significant—it's ambiguous between trust-in-AI-tools versus willingness-to-replace-democracy, with major implications for human-in-the-loop alignment assumptions. The evaluation-to-deployment gap (labs using CIP tools but unclear if results constrain deployment) is the critical unanswered question."
---

## Content
@@ -68,9 +68,12 @@ EXTRACTION HINT: The 70%+ cross-partisan consensus and the evaluation-to-deployment

## Key Facts
-- CIP Global Dialogues: 10,000+ participants, 70+ countries, 6 deliberative dialogues (2025)
-- Political neutrality evaluation: 1,000 participants, 400 prompts, 107 criteria, 70%+ cross-partisan consensus
-- Samiksha: 25,000+ queries, 11 Indian languages, 100,000+ manual evaluations
-- Frontier lab adoption: Meta, Cohere, Anthropic, UK/US AI Safety Institutes
+- CIP Global Dialogues 2025: 10,000+ participants, 70+ countries, 6 deliberative dialogues
+- Weval political neutrality: 1,000 participants, 400 prompts, 107 criteria, 70%+ consensus
+- Samiksha India: 25,000+ queries, 11 Indian languages, 100,000+ manual evaluations
+- 28% agreed AI should override rules if calculating better outcomes
+- 58% believed AI could make superior decisions vs elected representatives
+- 47% felt chatbot interactions increased belief certainty
+- 13.7% reported concerning/reality-distorting AI interactions affecting someone they know
+- Frontier lab partners: Meta, Cohere, Anthropic, UK/US AI Safety Institutes
 - Government adoption: India, Taiwan, Sri Lanka
-- Survey findings: 28% support AI overriding rules for better outcomes, 58% believe AI could decide better than elected representatives, 47% felt chatbot interactions increased belief certainty, 13.7% reported concerning AI interactions affecting someone they know