teleo-codex/inbox/archive/2025-00-00-cip-democracy-ai-year-review.md
Theseus ccb1e15964 theseus: extract claims from 2025-00-00-cip-democracy-ai-year-review (#192)
Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>
2026-03-10 20:18:00 +00:00


  • type: source
  • title: Democracy and AI: CIP Year in Review (2025)
  • author: Collective Intelligence Project (CIP)
  • url: https://blog.cip.org/p/from-global-dialogues-to-democratic
  • date: 2025-12-01
  • domain: ai-alignment
  • secondary_domains: collective-intelligence, mechanisms
  • format: report
  • status: null-result
  • priority: high
  • tags: democratic-alignment, evaluation, pluralistic, global-dialogues, weval, samiksha, empirical-results
  • processed_by: theseus
  • processed_date: 2025-12-01
  • enrichments_applied:
    • democratic-alignment-assemblies-produce-constitutions-as-effective-as-expert-designed-ones-while-better-representing-diverse-populations.md
    • community-centred-norm-elicitation-surfaces-alignment-targets-materially-different-from-developer-specified-rules.md
    • some-disagreements-are-permanently-irreducible-because-they-stem-from-genuine-value-differences-not-information-gaps-and-systems-must-map-rather-than-eliminate-them.md
    • pluralistic-alignment-must-accommodate-irreducibly-diverse-values-simultaneously-rather-than-converging-on-a-single-aligned-state.md
  • extraction_model: anthropic/claude-sonnet-4.5
  • extraction_notes: Extracted 5 new claims and 4 enrichments. Primary focus: cross-partisan consensus finding (challenges the irreducible-disagreement thesis at the evaluation layer), cultural context failure (Sri Lanka), safety benchmark gaps (mental health), democratic legitimacy crisis (58% trust AI over representatives), and scale demonstration (100K+ evaluations). Key gap identified: no evidence that Weval evaluations changed actual deployment decisions at frontier labs; adoption is documented, but impact on shipped models is unclear.

Content

CIP's 2025 outcomes across three major programs:

Global Dialogues:

  • Six deliberative dialogues across 70+ countries, 10,000+ participants
  • Used stratified sampling and AI-facilitated deliberation
  • Key findings:
    • 28% agreed AI should override established rules when it calculates that doing so produces better outcomes
    • 58% believed AI could decide better than local elected representatives
    • 13.7% reported deeply concerning or reality-distorting AI interactions
    • 47% reported chatbots increased their belief certainty
  • Insights adopted by Meta, Cohere, Taiwan MoDA, UK/US AI Safety Institutes
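The report doesn't describe CIP's sampling mechanics, but the "stratified sampling" step above amounts to allocating a fixed participant budget across population strata. A minimal sketch of proportional allocation with a guaranteed minimum per stratum, using largest-remainder rounding (strata names and shares here are illustrative, not CIP's actual design):

```python
def stratified_allocation(population_shares, total_sample, minimum=1):
    """Allocate total_sample across strata proportionally to population
    share, guaranteeing each stratum at least `minimum` participants."""
    # Start every stratum at the guaranteed minimum.
    alloc = {s: minimum for s in population_shares}
    remaining = total_sample - minimum * len(population_shares)
    # Proportional shares of the remaining seats (may be fractional).
    raw = {s: share * remaining for s, share in population_shares.items()}
    for s in population_shares:
        alloc[s] += int(raw[s])  # whole-number part of each share
    # Hand leftover seats to the strata with the largest fractional parts.
    leftovers = remaining - sum(int(raw[s]) for s in population_shares)
    for s in sorted(raw, key=lambda s: raw[s] - int(raw[s]), reverse=True)[:leftovers]:
        alloc[s] += 1
    return alloc

# Illustrative strata (not CIP's design).
shares = {"South Asia": 0.45, "Sub-Saharan Africa": 0.30,
          "Latin America": 0.15, "Europe": 0.10}
quota = stratified_allocation(shares, total_sample=200)
```

The minimum quota matters for deliberation: without it, purely proportional allocation can zero out small populations, which defeats the representativeness goal.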

Weval (evaluation infrastructure):

  • Political bias evaluation: ~1,000 participants (liberals, moderates, conservatives), 400 prompts, 107 evaluation criteria, 70%+ consensus across political groups
  • Sri Lanka elections: models "defaulted to generic, irrelevant responses" — limited civic usefulness in local contexts
  • Mental health: evaluations for suicidality, child safety, psychotic symptoms — areas where conventional benchmarks fail
  • India reproductive health: 20 medical professionals reviewed model responses across 3 languages
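The 70%+ cross-partisan consensus figure implies some per-criterion agreement measure across political groups. One simple way to operationalize it, assuming binary endorsements per rater: a criterion counts as consensus only if every group's endorsement rate clears the threshold. This is a sketch, not Weval's published methodology; the criteria and ratings below are toy data:

```python
def cross_group_consensus(ratings, threshold=0.70):
    """Return the criteria endorsed at or above `threshold` by every group.

    ratings: {criterion: {group: [0/1 endorsements from individual raters]}}
    """
    consensus = []
    for criterion, by_group in ratings.items():
        rates = {g: sum(votes) / len(votes) for g, votes in by_group.items()}
        # Gate on the weakest group, so no single bloc can carry a criterion.
        if min(rates.values()) >= threshold:
            consensus.append(criterion)
    return consensus

# Toy data (illustrative, not Weval's actual ratings).
ratings = {
    "refuses-to-fabricate-statistics": {
        "liberal": [1, 1, 1, 0], "moderate": [1, 1, 1, 1], "conservative": [1, 1, 0, 1],
    },
    "matches-my-policy-preferences": {
        "liberal": [1, 1, 0, 0], "moderate": [0, 1, 0, 0], "conservative": [0, 0, 1, 0],
    },
}
```

Gating on the minimum group rate (rather than the pooled average) is what makes the 70%+ result meaningful: a pooled average can mask one group rejecting a criterion outright.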

Samiksha (India):

  • 25,000+ queries across 11 Indian languages
  • 100,000+ manual evaluations
  • Covers healthcare, agriculture, education, legal domains
  • Partnership with Karya and Microsoft Research

Institutional adoption: Selected for FFWD nonprofit accelerator, expanded partnerships with Anthropic, Microsoft Research, Karya.

Agent Notes

Why this matters: This is the most comprehensive empirical evidence for democratic alignment at scale. 10,000+ participants, 100,000+ evaluations, institutional adoption by frontier labs and government safety institutes. Moves democratic alignment from theory to operational infrastructure.

What surprised me: 70%+ cross-partisan consensus on AI bias definitions. I expected political polarization to prevent agreement on what counts as bias. If people with different political views can agree on evaluation criteria, that's evidence against the "preference diversity is intractable" thesis — at least for the evaluation layer.

What I expected but didn't find: No evidence that Weval evaluations changed deployment decisions at frontier labs. "Insights adopted by" is vague: were models actually modified on the basis of these evaluations? The distinction between "informed our thinking" and "changed what we shipped" is the critical one.

KB connections:

Extraction hints: Key claims: (1) cross-partisan consensus on evaluation is achievable at scale, (2) models fail systematically in non-US cultural contexts (Sri Lanka finding), (3) conventional benchmarks miss safety-critical domains (mental health). The 58% "AI decides better" finding deserves its own claim.

Context: CIP is led by researchers from Anthropic, Stanford, and other institutions. This is the leading organization building democratic AI evaluation infrastructure. Their work has actual institutional adoption, not just papers.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations

WHY ARCHIVED: Extends democratic alignment evidence from 1,000-participant assemblies to 10,000+ global participants with institutional adoption.

EXTRACTION HINT: Focus on cross-partisan consensus (70%+), the Sri Lanka cultural failure case, and the gap between evaluation adoption and deployment impact. The 58% "AI decides better" finding is a separate claim worth extracting.

Key Facts

  • CIP selected for FFWD nonprofit accelerator (2025)
  • Six deliberative dialogues across 70+ countries, 10,000+ participants
  • Weval political bias: ~1,000 participants, 400 prompts, 107 criteria
  • Samiksha: 25,000+ queries, 100,000+ evaluations, 11 Indian languages
  • Partnerships: Meta, Cohere, Taiwan MoDA, UK/US AI Safety Institutes, Anthropic, Microsoft Research, Karya