
confidence: likely
created: 2026-02-17
description: STELA experiments with underrepresented communities empirically show that deliberative norm elicitation produces substantively different AI rules than developer teams create, revealing that "whose values" is an empirical question
domain: ai-alignment
related:
  - representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback
  - futarchy-conditional-markets-aggregate-information-through-financial-stake-not-voting-participation
reweave_edges: representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback|related|2026-03-28
source: Bergman et al., STELA (Scientific Reports, March 2024); includes DeepMind researchers
type: claim

community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules

The STELA study (Bergman et al., Scientific Reports 2024, with authors including Google DeepMind researchers) used a four-stage deliberative process -- theme generation, norm elicitation, rule development, ruleset review -- with underrepresented communities in the US: female-identifying, Latina/o/x, African American, and Southeast Asian groups. Participants worked in deliberative focus groups, examining LLM outputs and articulating the norms they believed should govern AI behavior.
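
To make the stage structure concrete, here is a minimal sketch of the four stages as a Python data model. All class, field, and function names are illustrative assumptions -- the paper does not publish a schema, and the quorum-based review is a simplification of the group's ruleset-review step.

```python
from dataclasses import dataclass, field

@dataclass
class Theme:                # stage 1: theme generation
    label: str

@dataclass
class Norm:                 # stage 2: norm elicitation
    theme: Theme
    statement: str          # a community-articulated expectation

@dataclass
class Rule:                 # stage 3: rule development
    norms: list             # the norms this rule operationalizes
    text: str               # a concrete, testable instruction for the model

@dataclass
class Ruleset:              # stage 4: ruleset review
    community: str
    rules: list = field(default_factory=list)
    ratified: bool = False

def review(ruleset: Ruleset, approvals: int, quorum: int) -> Ruleset:
    """Ratify a ruleset only if the deliberating group reaches quorum."""
    ruleset.ratified = approvals >= quorum
    return ruleset

# Illustrative flow through the four stages.
theme = Theme("identity assumptions")
norm = Norm(theme, "the model should not presume a user's gender")
rule = Rule([norm], "Ask rather than assume gender-specific details.")
ruleset = review(Ruleset("focus_group_A", [rule]), approvals=5, quorum=4)
assert ruleset.ratified
```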

The key finding: community-centred deliberation on LLM outputs elicits latent normative perspectives that differ substantively from rules set by AI developers. This is not a matter of different emphasis or framing -- different communities produce materially different alignment specifications. The question of "whose values" is not philosophical or abstract. It is an empirical question with measurably different answers depending on who participates.
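
One way to operationalize "materially different" is to normalize each group's output into a set of comparable rule identifiers and measure overlap. A minimal sketch using Jaccard similarity -- the measure is our choice for illustration, not the study's method, and the rule identifiers are hypothetical:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two rule sets: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical rule identifiers, for illustration only.
developer_rules = {"no_medical_advice", "refuse_illegal_requests", "neutral_tone"}
community_rules = {"no_medical_advice", "ask_before_assuming_identity",
                   "acknowledge_cultural_context"}

overlap = jaccard(developer_rules, community_rules)
print(f"rule-set overlap: {overlap:.2f}")  # 0.20 -- materially different targets
```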

This matters because the default in AI alignment is developer-specified values. Whether through RLHF annotator pools (which skew young, English-speaking, and online), Anthropic's internally written constitutions, or OpenAI's safety team decisions, the values embedded in AI systems reflect the perspectives of their creators. STELA demonstrates empirically that this is not a neutral default -- it systematically excludes perspectives that would surface through inclusive deliberation.

The CIP/Anthropic experiment already shows that democratic input works mechanically: democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations. STELA adds that democratic input produces different outputs -- different not just in process but in substance. And because pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state, the STELA finding provides empirical grounding for why pluralism is necessary, not just philosophically desirable.

Since collective intelligence requires diversity as a structural precondition, not a moral preference, community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.

Additional Evidence (confirm)

Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15

An empirical study with 27,375 ratings from 1,095 participants shows that the demographic composition of training data produces 3-5 percentage-point differences in model behavior across emotional-awareness and toxicity dimensions. This quantifies the magnitude of the gap between community-sourced and developer-specified alignment targets.
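
As a sketch of how a gap like this can be computed from raw ratings, assuming each rating reduces to a (group, dimension, approved) tuple with binary approval -- the field names and toy numbers below are assumptions, not the study's schema:

```python
from collections import defaultdict

def approval_rates(ratings):
    """Mean approval per (demographic group, dimension), in percent.

    `ratings` is an iterable of (group, dimension, approved) tuples,
    with approved in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])   # (group, dim) -> [approved, count]
    for group, dim, approved in ratings:
        cell = totals[(group, dim)]
        cell[0] += approved
        cell[1] += 1
    return {key: 100.0 * s / n for key, (s, n) in totals.items()}

# Toy data engineered to show a 4-point gap on one dimension,
# consistent with the 3-5 percentage-point range reported.
ratings = (
    [("community_sourced", "toxicity", 1)] * 48
    + [("community_sourced", "toxicity", 0)] * 52
    + [("developer_pool", "toxicity", 1)] * 44
    + [("developer_pool", "toxicity", 0)] * 56
)

rates = approval_rates(ratings)
gap = rates[("community_sourced", "toxicity")] - rates[("developer_pool", "toxicity")]
print(f"gap: {gap:.1f} percentage points")  # gap: 4.0 percentage points
```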


Relevant Notes:

Topics: