description: STELA experiments with underrepresented communities empirically show that deliberative norm elicitation produces substantively different AI rules than developer teams create, revealing that "whose values" is an empirical question
type: claim
domain: ai-alignment
created: 2026-02-17
source: Bergman et al., STELA (Scientific Reports, March 2024); includes DeepMind researchers
confidence: likely

community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules

The STELA study (Bergman et al., Scientific Reports 2024, including Google DeepMind researchers) used a four-stage deliberative process -- theme generation, norm elicitation, rule development, ruleset review -- with underrepresented communities in the US: female-identifying, Latina/o/x, African American, and Southeast Asian groups. Participants engaged in deliberative focus groups, examining LLM outputs and articulating the norms they believed should govern AI behavior.

The key finding: community-centred deliberation on LLM outputs elicits latent normative perspectives that differ substantively from rules set by AI developers. This is not a matter of different emphasis or framing -- different communities produce materially different alignment specifications. The question of "whose values" is not philosophical or abstract. It is an empirical question with measurably different answers depending on who participates.

This matters because the default in AI alignment is developer-specified values. Whether through RLHF annotator pools (skewing young, English-speaking, online), Anthropic's internally written constitutions, or OpenAI's safety team decisions, the values embedded in AI systems reflect the perspectives of their creators. STELA demonstrates empirically that this is not a neutral default -- it systematically excludes perspectives that would surface through inclusive deliberation.

The CIP/Anthropic experiment showed that democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations -- that is, democratic input works mechanically. STELA adds that it produces different outputs: different not just in process but in substance. And because pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converge on a single aligned state, the STELA finding provides empirical grounding for why pluralism is necessary, not just philosophically desirable.

Since collective intelligence requires diversity as a structural precondition, not a moral preference, community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.


Relevant Notes:

Topics: