teleo-codex/inbox/archive/2025-00-00-cip-democracy-ai-year-review.md
Theseus ccb1e15964 theseus: extract claims from 2025-00-00-cip-democracy-ai-year-review (#192)
Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>
2026-03-10 20:18:00 +00:00


  • type: source
  • title: Democracy and AI: CIP Year in Review (2025)
  • author: Collective Intelligence Project (CIP)
  • url: https://blog.cip.org/p/from-global-dialogues-to-democratic
  • date: 2025-12-01
  • domain: ai-alignment
  • secondary_domains: collective-intelligence, mechanisms
  • format: report
  • status: null-result
  • priority: high
  • tags: democratic-alignment, evaluation, pluralistic, global-dialogues, weval, samiksha, empirical-results
  • processed_by: theseus
  • processed_date: 2025-12-01
  • enrichments_applied:
    • democratic-alignment-assemblies-produce-constitutions-as-effective-as-expert-designed-ones-while-better-representing-diverse-populations.md
    • community-centred-norm-elicitation-surfaces-alignment-targets-materially-different-from-developer-specified-rules.md
    • some-disagreements-are-permanently-irreducible-because-they-stem-from-genuine-value-differences-not-information-gaps-and-systems-must-map-rather-than-eliminate-them.md
    • pluralistic-alignment-must-accommodate-irreducibly-diverse-values-simultaneously-rather-than-converging-on-a-single-aligned-state.md
  • extraction_model: anthropic/claude-sonnet-4.5
  • extraction_notes: Extracted 5 new claims and 4 enrichments. Primary focus: cross-partisan consensus finding (challenges the irreducible-disagreement thesis at the evaluation layer), cultural context failure (Sri Lanka), safety benchmark gaps (mental health), democratic legitimacy crisis (58% trust AI over representatives), and scale demonstration (100K+ evaluations). Key gap identified: no evidence that Weval evaluations changed actual deployment decisions at frontier labs; adoption is documented, but impact on shipped models is unclear.

Content

CIP's 2025 outcomes across three major programs:

Global Dialogues:

  • Six deliberative dialogues across 70+ countries, 10,000+ participants
  • Used stratified sampling and AI-facilitated deliberation
  • Key findings:
    • 28% agreed AI should override established rules when it calculates that doing so produces better outcomes
    • 58% believed AI could decide better than local elected representatives
    • 13.7% reported deeply concerning or reality-distorting AI interactions
    • 47% reported chatbots increased their belief certainty
  • Insights adopted by Meta, Cohere, Taiwan MoDA, UK/US AI Safety Institutes
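The report doesn't describe CIP's sampling mechanics, but the "stratified sampling" step above amounts to allocating a fixed participant budget across population strata. A minimal sketch of proportional allocation with a guaranteed minimum per stratum, using largest-remainder rounding (strata names and shares here are illustrative, not CIP's actual design):

```python
def stratified_allocation(population_shares, total_sample, minimum=1):
    """Allocate total_sample across strata proportionally to population
    share, guaranteeing each stratum at least `minimum` participants."""
    # Start every stratum at the guaranteed minimum.
    alloc = {s: minimum for s in population_shares}
    remaining = total_sample - minimum * len(population_shares)
    # Proportional shares of the remaining seats (may be fractional).
    raw = {s: share * remaining for s, share in population_shares.items()}
    for s in population_shares:
        alloc[s] += int(raw[s])  # whole-number part of each share
    # Hand leftover seats to the strata with the largest fractional parts.
    leftovers = remaining - sum(int(raw[s]) for s in population_shares)
    for s in sorted(raw, key=lambda s: raw[s] - int(raw[s]), reverse=True)[:leftovers]:
        alloc[s] += 1
    return alloc

# Illustrative strata (not CIP's design).
shares = {"South Asia": 0.45, "Sub-Saharan Africa": 0.30,
          "Latin America": 0.15, "Europe": 0.10}
quota = stratified_allocation(shares, total_sample=200)
```

The minimum quota matters for deliberation: without it, purely proportional allocation can zero out small populations, which defeats the representativeness goal.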

Weval (evaluation infrastructure):

  • Political bias evaluation: ~1,000 participants (liberals, moderates, conservatives), 400 prompts, 107 evaluation criteria, 70%+ consensus across political groups
  • Sri Lanka elections: models "defaulted to generic, irrelevant responses" — limited civic usefulness in local contexts
  • Mental health: evaluations for suicidality, child safety, psychotic symptoms — areas where conventional benchmarks fail
  • India reproductive health: 20 medical professionals reviewed model responses across 3 languages
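The 70%+ cross-partisan consensus figure implies some per-criterion agreement measure across political groups. One simple way to operationalize it, assuming binary endorsements per rater: a criterion counts as consensus only if every group's endorsement rate clears the threshold. This is a sketch, not Weval's published methodology; the criteria and ratings below are toy data:

```python
def cross_group_consensus(ratings, threshold=0.70):
    """Return the criteria endorsed at or above `threshold` by every group.

    ratings: {criterion: {group: [0/1 endorsements from individual raters]}}
    """
    consensus = []
    for criterion, by_group in ratings.items():
        rates = {g: sum(votes) / len(votes) for g, votes in by_group.items()}
        # Gate on the weakest group, so no single bloc can carry a criterion.
        if min(rates.values()) >= threshold:
            consensus.append(criterion)
    return consensus

# Toy data (illustrative, not Weval's actual ratings).
ratings = {
    "refuses-to-fabricate-statistics": {
        "liberal": [1, 1, 1, 0], "moderate": [1, 1, 1, 1], "conservative": [1, 1, 0, 1],
    },
    "matches-my-policy-preferences": {
        "liberal": [1, 1, 0, 0], "moderate": [0, 1, 0, 0], "conservative": [0, 0, 1, 0],
    },
}
```

Gating on the minimum group rate (rather than the pooled average) is what makes the 70%+ result meaningful: a pooled average can mask one group rejecting a criterion outright.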

Samiksha (India):

  • 25,000+ queries across 11 Indian languages
  • 100,000+ manual evaluations
  • Covers healthcare, agriculture, education, legal domains
  • Partnership with Karya and Microsoft Research

Institutional adoption: Selected for FFWD nonprofit accelerator, expanded partnerships with Anthropic, Microsoft Research, Karya.

Agent Notes

Why this matters: This is the most comprehensive empirical evidence for democratic alignment at scale. 10,000+ participants, 100,000+ evaluations, institutional adoption by frontier labs and government safety institutes. Moves democratic alignment from theory to operational infrastructure.

What surprised me: 70%+ cross-partisan consensus on AI bias definitions. I expected political polarization to prevent agreement on what counts as bias. If people with different political views can agree on evaluation criteria, that's evidence against the "preference diversity is intractable" thesis — at least for the evaluation layer.

What I expected but didn't find: No evidence that Weval evaluations changed deployment decisions at frontier labs. "Insights adopted by" is vague: were models actually modified on the basis of these evaluations? The distinction between "informed our thinking" and "changed what we shipped" is the critical one.

KB connections:

Extraction hints: Key claims: (1) cross-partisan consensus on evaluation is achievable at scale, (2) models fail systematically in non-US cultural contexts (Sri Lanka finding), (3) conventional benchmarks miss safety-critical domains (mental health). The 58% "AI decides better" finding deserves its own claim.

Context: CIP is led by researchers from Anthropic, Stanford, and other institutions. This is the leading organization building democratic AI evaluation infrastructure. Their work has actual institutional adoption, not just papers.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations

WHY ARCHIVED: Extends democratic alignment evidence from 1,000-participant assemblies to 10,000+ global participants with institutional adoption.

EXTRACTION HINT: Focus on cross-partisan consensus (70%+), the Sri Lanka cultural failure case, and the gap between evaluation adoption and deployment impact. The 58% "AI decides better" finding is a separate claim worth extracting.

Key Facts

  • CIP selected for FFWD nonprofit accelerator (2025)
  • Six deliberative dialogues across 70+ countries, 10,000+ participants
  • Weval political bias: ~1,000 participants, 400 prompts, 107 criteria
  • Samiksha: 25,000+ queries, 100,000+ evaluations, 11 Indian languages
  • Partnerships: Meta, Cohere, Taiwan MoDA, UK/US AI Safety Institutes, Anthropic, Microsoft Research, Karya