description: STELA experiments with underrepresented communities empirically show that deliberative norm elicitation produces substantively different AI rules than developer teams create, revealing that "whose values" is an empirical question
type: claim
domain: ai-alignment
created: 2026-02-17
source: Bergman et al., STELA (Scientific Reports, March 2024); includes DeepMind researchers
confidence: likely

community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules

The STELA study (Bergman et al., Scientific Reports 2024, including Google DeepMind researchers) used a four-stage deliberative process -- theme generation, norm elicitation, rule development, ruleset review -- with underrepresented communities in the US: female-identifying, Latina/o/x, African American, and Southeast Asian groups. Participants engaged in deliberative focus groups, examining LLM outputs and articulating the norms they believed should govern AI behavior.

The key finding: community-centred deliberation on LLM outputs elicits latent normative perspectives that differ substantively from rules set by AI developers. This is not a matter of different emphasis or framing -- different communities produce materially different alignment specifications. The question of "whose values" is not philosophical or abstract. It is an empirical question with measurably different answers depending on who participates.

This matters because the default in AI alignment is developer-specified values. Whether through RLHF annotator pools (skewing young, English-speaking, online), Anthropic's internally written constitutions, or OpenAI's safety team decisions, the values embedded in AI systems reflect the perspectives of their creators. STELA demonstrates empirically that this is not a neutral default -- it systematically excludes perspectives that would surface through inclusive deliberation.

The CIP/Anthropic experiment showed that democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations -- that is, democratic input works mechanically. STELA adds that it produces different outputs: different not just in process but in substance. And because pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converge on a single aligned state, the STELA finding provides empirical grounding for why pluralism is necessary, not just philosophically desirable.

Since collective intelligence requires diversity as a structural precondition, not a moral preference, community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.


Relevant Notes:

Topics: