- Source: inbox/archive/2025-12-00-cip-year-in-review-democratic-alignment.md - Domain: ai-alignment
| description | type | domain | created | source | confidence |
|---|---|---|---|---|---|
| CIP and Anthropic empirically demonstrated that publicly sourced AI constitutions via deliberative assemblies of 1000 participants perform as well as internally designed ones on helpfulness and harmlessness | claim | ai-alignment | 2026-02-17 | Anthropic/CIP, Collective Constitutional AI (arXiv 2406.07814, FAccT 2024); CIP Alignment Assemblies (cip.org, 2023-2025); STELA (Bergman et al, Scientific Reports, March 2024) | likely |
democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations
The Collective Intelligence Project (CIP), co-founded by Divya Siddarth and Saffron Huang, has run the most ambitious experiments in democratic AI alignment. Their Alignment Assemblies use deliberative processes where diverse participants collectively define rules for AI behavior, combining large-scale surveys (1,000+ participants) with platforms like Polis and AllOurIdeas.
In the landmark pilot with Anthropic (FAccT 2024), approximately 1,000 demographically representative Americans contributed 1,127 statements and cast 38,252 votes on what rules an AI chatbot should follow. Two Claude models were trained -- one using this publicly sourced constitution, one using Anthropic's internal constitution. The result: the public model was rated as helpful and harmless as the standard model. Democratic input did not degrade performance.
Two additional findings matter. First, participants showed remarkably high consensus: only a handful of divisive statements emerged against hundreds of consensus statements, suggesting that "whose values" may be less contested than assumed at the level of general principles. Second, CIP's Global Dialogues (bimonthly, 1,000 participants from 70+ countries) demonstrated that participatory processes scale internationally.
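The consensus-versus-divisive distinction above can be sketched as a simple computation over a vote matrix. This is a hypothetical toy with an assumed 80% threshold, not CIP's or Polis's actual pipeline, which uses opinion-group clustering rather than a flat agreement rate:

```python
# Toy Polis-style statement classification (hypothetical data and
# threshold). Each statement has one ballot per participant:
# +1 = agree, -1 = disagree, 0 = pass.

votes = {
    "s1": [+1, +1, +1, -1, +1, +1],   # broad agreement
    "s2": [+1, -1, +1, -1, -1, +1],   # split opinion
    "s3": [-1, -1, +1, -1, -1, -1],   # broad disagreement
}

def classify(ballots, threshold=0.8):
    """Label a statement by the share of non-pass votes that agree."""
    cast = [v for v in ballots if v != 0]
    agree_rate = sum(v == +1 for v in cast) / len(cast)
    # Strong agreement in either direction counts as consensus.
    if agree_rate >= threshold or agree_rate <= 1 - threshold:
        return "consensus"
    return "divisive"

for sid, ballots in votes.items():
    print(sid, classify(ballots))
```

The point of the sketch is only that "consensus" includes strong collective rejection as well as strong collective endorsement; divisiveness is the split middle.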
However, this remains one-shot constitution-setting, not continuous alignment. The STELA study (Bergman et al, Scientific Reports 2024) adds a critical nuance: community-centred deliberation with underrepresented communities (female-identifying, Latina/o/x, African American, Southeast Asian groups) elicited latent normative perspectives materially different from developer-set rules. "Whose values" is not abstract -- different communities produce substantively different specifications.
Because collective intelligence requires diversity as a structural precondition, not a moral preference, democratic assemblies structurally guarantee the diversity that expert panels cannot. And because the alignment problem dissolves only when human values are continuously woven into the system rather than specified in advance, the next step beyond assemblies is continuous participatory alignment, not periodic constitution-setting.
Additional Evidence (extend)
Source: 2025-12-00-cip-year-in-review-democratic-alignment | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5
CIP's 2025 Global Dialogues extended democratic alignment to 10,000+ participants across 70+ countries in 6 deliberative dialogues, representing a substantial scale increase from earlier experiments. The Weval framework achieved 70%+ cross-partisan consensus on AI evaluation criteria with 1,000 participants generating 400 prompts and 107 evaluation criteria. Samiksha in India processed 25,000+ queries across 11 Indian languages with 100,000+ manual evaluations, described as "the most comprehensive evaluation of AI in Indian contexts." Frontier labs (Meta, Cohere, Anthropic) and governments (India, Taiwan, Sri Lanka) adopted the frameworks. However, evidence that these evaluations function as deployment constraints rather than post-hoc assessments remains unclear.
Relevant Notes:
- collective intelligence requires diversity as a structural precondition not a moral preference -- assemblies structurally ensure the diversity that expert panels cannot
- the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance -- continuous participation, not one-shot constitution-setting, is the full solution
- RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values -- democratic constitutions are an alternative to reward-function compression
- universal alignment is mathematically impossible because Arrow's impossibility theorem applies to aggregating diverse human preferences into a single coherent objective -- assemblies work at the level of general principles despite theoretical impossibility for full preference aggregation
- no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it -- CIP is the closest to collective alignment infrastructure but still lacks continuous architecture
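The Arrow-style aggregation failure noted above can be made concrete with the classic Condorcet cycle, a toy illustration not drawn from the source: three coherent individual rankings whose pairwise majorities form no coherent collective ranking.

```python
# Toy Condorcet cycle: three voters each hold a transitive ranking over
# options A, B, C, yet the pairwise majority preference is intransitive,
# so no single objective can represent "what the group prefers".

rankings = [
    ["A", "B", "C"],   # voter 1: A > B > C
    ["B", "C", "A"],   # voter 2: B > C > A
    ["C", "A", "B"],   # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a majority of voters rank x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

# Each pairwise contest is won 2-1, but the wins form a cycle:
print(majority_prefers("A", "B"))  # A beats B
print(majority_prefers("B", "C"))  # B beats C
print(majority_prefers("C", "A"))  # C beats A -- closing the cycle
```

This is exactly the gap the note points at: aggregating full preference orderings hits the cycle, while agreement on general principles (as in the assemblies) sidesteps it.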
Topics: