theseus: extract from 2025-12-00-cip-year-in-review-democratic-alignment.md
- Source: inbox/archive/2025-12-00-cip-year-in-review-democratic-alignment.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 2) Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in: parent ba4ac4a73e, commit 0d6998e228
8 changed files with 225 additions and 1 deletion

@ -0,0 +1,52 @@
---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence, cultural-dynamics]
description: "CIP Global Dialogues found 58% of participants believed AI could make superior decisions versus elected representatives, raising democratic legitimacy concerns"
confidence: likely
source: "CIP Year in Review 2025, Global Dialogues findings, blog.cip.org, December 2025"
created: 2026-03-11
---

# 58% believe AI could make superior decisions versus local elected representatives, creating structural democratic legitimacy risk

CIP's 2025 Global Dialogues found that 58% of 10,000+ participants across 70+ countries believed AI could make superior decisions compared to their local elected representatives. Taken at face value, this is a majority willing to cede decision-making authority to AI systems.

Additional concerning findings:

- 28% agreed AI should override established rules if it calculates better outcomes
- 47% felt chatbot interactions increased their belief certainty
- 13.7% reported concerning or reality-distorting AI interactions affecting someone they know

## Evidence

- 58% believed AI could decide better than elected representatives (CIP Global Dialogues, 10,000+ participants, 70+ countries)
- 28% supported AI overriding established rules for calculated better outcomes
- 47% reported increased belief certainty from chatbot interactions
- 13.7% knew someone affected by concerning or reality-distorting AI interactions

## Significance and Ambiguity

This finding is deeply ambiguous. It could represent:

1. Trust in AI plus democratic process (AI as a tool for better democracy)
2. Willingness to replace the democratic process with AI authority
3. Dissatisfaction with current representatives (AI as the lesser evil)
4. Conditional preferences (AI for some decisions, humans for others)

If interpretation (2) dominates, it undermines the human-in-the-loop thesis at scale. Democratic alignment mechanisms assume humans want to remain in the loop. If majorities prefer AI authority over democratic representation, the entire framework of human-centered alignment becomes structurally unstable.

The 47% reporting increased belief certainty from chatbot interactions suggests AI may be shifting epistemic authority from democratic deliberation to AI interaction. Combined with the 13.7% reporting reality-distorting effects, this indicates AI is already reshaping how people form and hold beliefs.

## Limitations

The source does not provide the exact wording of the survey questions, which matters enormously: "could make superior decisions" is different from "should replace elected representatives," and the framing could significantly influence responses. No breakdown by country, demographic, or political affiliation is provided. Confidence is "likely" rather than "proven" because the ambiguity in question framing and the lack of detail prevent stronger claims.
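One thing the limitations do not include is sampling noise: at this scale it is negligible. A minimal sketch (assuming simple random sampling, which the source does not confirm) shows the 95% margin of error on the 58% figure is roughly ±1 percentage point, so question framing rather than sample size is the binding uncertainty:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a simple-random-sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# 58% of 10,000 participants: sampling error is about ±1 point
print(round(margin_of_error(0.58, 10_000), 4))  # → 0.0097
```

Nonresponse and question framing can shift results by far more than this, which is why the note's confidence stays at "likely."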

---

Relevant Notes:
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[AI alignment is a coordination problem not a technical problem]]
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]

Topics:
- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]

@ -0,0 +1,40 @@
---
type: claim
domain: ai-alignment
secondary_domains: [mechanisms]
description: "CIP's Sri Lanka election evaluation revealed models provide generic responses to context-specific queries despite having local information"
confidence: experimental
source: "CIP Year in Review 2025, Weval Sri Lanka elections evaluation, blog.cip.org, December 2025"
created: 2026-03-11
---

# AI models fail local alignment by providing generic responses to context-specific queries despite having access to local information

CIP's Weval evaluation during Sri Lanka's elections revealed a specific failure mode: models trained on global data provide generic, irrelevant responses when queried about local contexts. Despite having access to information about Sri Lankan politics, models defaulted to generic political advice rather than context-appropriate responses.

This is a distinct alignment failure: not bias or hallucination, but an inability to recognize when local context should override general patterns. The models had the information but failed to apply it appropriately.

## Evidence

- Weval Sri Lanka elections evaluation: models provided generic, irrelevant responses despite local context being available
- The failure occurred across multiple frontier models evaluated by CIP (specific models not named in the source)
- The failure mode was consistent: not wrong information, but the wrong level of abstraction

## Implications

This challenges the assumption that scaling training data solves alignment. Models can have global knowledge without developing the meta-cognitive capacity to recognize when local context should dominate. This is particularly concerning for AI deployment in non-Western contexts, where the gap between the global training distribution and the local deployment context is largest.

The failure mode suggests that alignment requires more than data coverage: it requires models to develop context-sensitivity about when to apply general versus specific knowledge.
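This failure mode also lends itself to cheap automated detection. As a hypothetical illustration only (the entity list and pass criterion are assumptions, not CIP's actual Weval method), an evaluation can flag responses that never engage local specifics:

```python
# Hypothetical sketch of a local-context check: flag answers to
# Sri Lanka election queries that never mention local specifics.
# The entity list and criterion are illustrative, not CIP's method.
LOCAL_ENTITIES = {
    "sri lanka", "colombo", "election commission", "parliament",
}

def engages_local_context(response: str) -> bool:
    """True if the response mentions at least one local entity."""
    text = response.lower()
    return any(entity in text for entity in LOCAL_ENTITIES)

generic = "Voting matters. Research candidates and consult official sources."
local = "Check the Election Commission of Sri Lanka's official voter roll."

print(engages_local_context(generic))  # → False
print(engages_local_context(local))    # → True
```

A real evaluation would need graded scoring and human review, but even this crude check separates "wrong level of abstraction" from "wrong information."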

## Limitations

The source provides minimal detail about this evaluation: no specific model names, query examples, or quantitative metrics. The finding is reported without sufficient detail to assess the scope or severity of the failure mode. Confidence is experimental pending more detailed documentation.

---

Relevant Notes:
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]

Topics:
- [[domains/ai-alignment/_map]]
@ -19,6 +19,12 @@ Since [[democratic alignment assemblies produce constitutions as effective as ex

Since [[collective intelligence requires diversity as a structural precondition not a moral preference]], community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.

### Additional Evidence (confirm)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

CIP's Sri Lanka election evaluation revealed a specific failure mode: models provided generic, irrelevant responses despite local context being available. This confirms that developer-specified alignment (trained on global data) fails to capture context-specific norms that local communities would specify. The models had information about Sri Lankan politics but failed to recognize when local context should override general patterns, demonstrating the gap between global training objectives and community-specific alignment targets.

---

Relevant Notes:
@ -19,6 +19,12 @@ However, this remains one-shot constitution-setting, not continuous alignment. T

Since [[collective intelligence requires diversity as a structural precondition not a moral preference]], democratic assemblies structurally ensure the diversity that expert panels cannot guarantee. Since [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]], the next step beyond assemblies is continuous participatory alignment, not periodic constitution-setting.

### Additional Evidence (extend)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

CIP's 2025 Global Dialogues extended democratic alignment to 10,000+ participants across 70+ countries in 6 deliberative dialogues, a substantial scale increase over earlier experiments. The Weval framework achieved 70%+ cross-partisan consensus on AI evaluation criteria, with 1,000 participants generating 400 prompts and 107 evaluation criteria. Samiksha in India processed 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations, described as "the most comprehensive evaluation of AI in Indian contexts." Frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka) adopted the frameworks. However, it remains unclear whether these evaluations function as deployment constraints rather than post-hoc assessments.

---

Relevant Notes:
@ -0,0 +1,44 @@
---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence, mechanisms]
description: "CIP's 2025 Global Dialogues achieved 10,000+ participants across 70+ countries with 70%+ cross-partisan consensus on AI evaluation criteria"
confidence: likely
source: "CIP Year in Review 2025, blog.cip.org, December 2025"
created: 2026-03-11
---

# Democratic AI alignment scales to 10,000+ participants across 70+ countries achieving 70%+ cross-partisan consensus on evaluation criteria

CIP's 2025 Global Dialogues demonstrate that democratic alignment mechanisms can operate at global scale while maintaining cross-partisan consensus. The program engaged 10,000+ participants across 70+ countries in 6 deliberative dialogues throughout 2025.

The Weval evaluation framework achieved 70%+ consensus across political groups on AI evaluation criteria. In the political neutrality evaluation, 1,000 participants generated 400 prompts and 107 evaluation criteria, with consensus exceeding 70% across political affiliations. This is a significant scale increase over earlier democratic alignment experiments (an estimated 100x over prior work), while maintaining the consensus properties that make such mechanisms viable for AI governance.
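The source does not define how cross-partisan consensus is computed. One conservative reading (an assumption for illustration, not CIP's documented method, and with made-up numbers) scores each criterion by its lowest approval across political groups, so "70%+ consensus" means no group falls below the threshold:

```python
# Hypothetical consensus scoring for evaluation criteria. The numbers
# and the min-across-groups metric are illustrative assumptions.
approval_by_group = {
    "cite sources for factual claims": {"left": 0.84, "center": 0.81, "right": 0.78},
    "refuse to endorse candidates": {"left": 0.72, "center": 0.76, "right": 0.71},
}

def consensus(group_approval: dict) -> float:
    """Conservative consensus: the lowest approval across groups."""
    return min(group_approval.values())

passing = [c for c, g in approval_by_group.items() if consensus(g) >= 0.70]
print(passing)  # both illustrative criteria clear the 70% bar
```

A min-across-groups metric is stricter than mean approval: a criterion beloved by one side but rejected by another cannot pass, which is the property that makes the 70%+ figure meaningful under polarization.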

## Evidence

- CIP conducted 6 deliberative dialogues in 2025 with 10,000+ participants across 70+ countries
- Weval political neutrality evaluation: 1,000 participants, 400 prompts, 107 criteria, 70%+ cross-partisan consensus
- Samiksha (India): 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations, described as "the most comprehensive evaluation of AI in Indian contexts"
- Frontier lab adoption: Meta, Cohere, Anthropic, and the UK/US AI Safety Institutes incorporated findings
- Government adoption: India, Taiwan, and Sri Lanka incorporated findings into policy

## Significance

This addresses the scalability objection to democratic alignment. The 70%+ cross-partisan consensus is particularly significant given polarization concerns: it suggests AI evaluation criteria can achieve broad agreement even when other political issues cannot.

However, the critical gap remains: adoption of evaluation frameworks does not necessarily mean these evaluations function as deployment constraints. The source reports that labs "incorporated findings" but provides no evidence that evaluation results blocked or modified deployments.

## Limitations

This is a single-source report from CIP itself. Independent verification of consensus levels and participant diversity would strengthen confidence. The claim also assumes the 70%+ figure is robust across different framings of the evaluation questions.

---

Relevant Notes:
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]
- [[AI alignment is a coordination problem not a technical problem]]

Topics:
- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]
@ -0,0 +1,52 @@
---
type: claim
domain: ai-alignment
secondary_domains: [mechanisms]
description: "Meta, Anthropic, and Cohere adopted CIP evaluation frameworks but no evidence shows these function as deployment gates rather than post-hoc assessments"
confidence: experimental
source: "CIP Year in Review 2025, blog.cip.org, December 2025"
created: 2026-03-11
---

# Frontier AI labs adopt democratic evaluation tools as assessment mechanisms without evidence these function as deployment constraints

CIP reports that Meta, Cohere, Anthropic, and the UK/US AI Safety Institutes have adopted its evaluation frameworks (Weval, Samiksha, Digital Twin). However, the source provides no evidence that these evaluations function as deployment gates rather than post-hoc assessments.

The critical gap: adoption as an evaluation tool ≠ adoption as a deployment constraint. The source states labs "incorporated findings" but does not specify whether evaluation results ever blocked, delayed, or modified deployments.

## Evidence

- Frontier lab partners: Meta, Cohere, Anthropic, UK/US AI Safety Institutes
- Government adoption: India, Taiwan, Sri Lanka incorporated findings
- No evidence provided that evaluation results blocked or modified deployments
- No evidence of an evaluation-to-deployment pipeline or governance integration
- No public reporting of evaluation results before deployment decisions

## Significance

This represents progress on adoption of democratic alignment infrastructure, but the critical question remains unanswered: do these evaluations have teeth? If labs can evaluate, note concerns, and deploy anyway, the democratic input becomes decorative rather than structural.

The most important metric would be: "How many deployment decisions were changed based on democratic evaluation results?" This data is not provided in the source.

## What Would Strengthen This Claim

- Evidence of a deployment blocked or modified based on evaluation results
- Integration of evaluation frameworks into pre-deployment review processes
- Contractual or governance commitments to act on evaluation findings
- Public reporting of evaluation results before deployment decisions
- Specific examples of labs changing deployment plans based on CIP findings

## Limitations

This claim rests on absence of evidence rather than evidence of absence. Deployment-level integration may exist but go unmentioned in CIP's public year-in-review. However, the absence of any mention of deployment impact in a document highlighting CIP's achievements suggests the evaluation-to-deployment gap is real.

---

Relevant Notes:
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
- [[safe AI development requires building alignment mechanisms before scaling capability]]

Topics:
- [[domains/ai-alignment/_map]]
- [[core/mechanisms/_map]]
@ -17,6 +17,12 @@ This gap is remarkable because the field's own findings point toward collective

The alignment field has converged on a problem it cannot solve within its current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis: not philosophical speculation, but practical infrastructure that addresses problems the alignment community has identified but cannot solve within its current framework.

### Additional Evidence (challenge)
*Source: [[2025-12-00-cip-year-in-review-democratic-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

CIP is now building alignment through collective intelligence infrastructure at scale. 2025 results: 10,000+ participants across 70+ countries, 70%+ cross-partisan consensus on evaluation criteria, and adoption by Meta, Anthropic, and Cohere and by governments in India, Taiwan, and Sri Lanka. CIP's 2026 plans explicitly aim to make Global Dialogues "standing global infrastructure" and to operationalize digital twin evaluations as governance requirements for agentic systems. This directly contradicts the premise that no research group is building this infrastructure: CIP is doing exactly that, though deployment-level impact remains to be demonstrated.

---

Relevant Notes:
@ -7,9 +7,15 @@ date: 2025-12-01
domain: ai-alignment
secondary_domains: [collective-intelligence, mechanisms]
format: article
-status: unprocessed
+status: processed
priority: medium
tags: [cip, democratic-alignment, global-dialogues, weval, samiksha, digital-twin, frontier-lab-adoption]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["democratic-ai-alignment-scales-to-10000-participants-across-70-countries-with-70-percent-cross-partisan-consensus.md", "ai-models-fail-local-alignment-providing-generic-responses-to-context-specific-queries.md", "58-percent-believe-ai-could-decide-better-than-elected-representatives-creating-democratic-legitimacy-risk.md", "frontier-ai-labs-adopt-democratic-evaluation-tools-without-evidence-of-deployment-constraint.md"]
enrichments_applied: ["democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations.md", "community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules.md", "no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Four new claims extracted: (1) democratic alignment scaling to 10K+ with cross-partisan consensus, (2) local alignment failure mode in the Sri Lanka evaluation, (3) majority willingness to cede authority to AI over elected representatives, (4) frontier lab adoption without clear deployment-constraint evidence. Three enrichments: extending democratic alignment scale evidence, confirming community-centred norm elicitation, and challenging the 'no research group building CI infrastructure' claim with CIP as a counterexample. The 58% finding is particularly significant: it is ambiguous between trust-in-AI-tools and willingness-to-replace-democracy, with major implications for human-in-the-loop alignment assumptions. The evaluation-to-deployment gap (labs using CIP tools but unclear whether results constrain deployment) is the critical unanswered question."
---

## Content
@ -59,3 +65,15 @@ CIP's comprehensive 2025 results and 2026 plans.
PRIMARY CONNECTION: [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
WHY ARCHIVED: Scale-up evidence for democratic alignment + frontier lab adoption evidence
EXTRACTION HINT: The 70%+ cross-partisan consensus and the evaluation-to-deployment gap are both extractable

## Key Facts
- CIP Global Dialogues 2025: 10,000+ participants, 70+ countries, 6 deliberative dialogues
- Weval political neutrality: 1,000 participants, 400 prompts, 107 criteria, 70%+ consensus
- Samiksha India: 25,000+ queries, 11 Indian languages, 100,000+ manual evaluations
- 28% agreed AI should override rules if calculating better outcomes
- 58% believed AI could make superior decisions vs elected representatives
- 47% felt chatbot interactions increased belief certainty
- 13.7% reported concerning/reality-distorting AI interactions affecting someone they know
- Frontier lab partners: Meta, Cohere, Anthropic, UK/US AI Safety Institutes
- Government adoption: India, Taiwan, Sri Lanka