---
confidence: likely
created: 2026-02-17
description: STELA experiments with underrepresented communities empirically show that deliberative norm elicitation produces substantively different AI rules than developer teams create, revealing that "whose values" is an empirical question
domain: ai-alignment
related:
- representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback
- futarchy-conditional-markets-aggregate-information-through-financial-stake-not-voting-participation
reweave_edges:
- representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback|related|2026-03-28
source: Bergman et al., STELA (Scientific Reports, March 2024); includes DeepMind researchers
type: claim
---

# community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules

The STELA study (Bergman et al., Scientific Reports, 2024, including Google DeepMind researchers) used a four-stage deliberative process -- theme generation, norm elicitation, rule development, ruleset review -- with underrepresented communities: female-identifying, Latina/o/x, African American, and Southeast Asian groups in the US. Participants engaged in deliberative focus groups, examining LLM outputs and articulating what norms they believed should govern AI behavior.

The key finding: community-centred deliberation on LLM outputs elicits latent normative perspectives that differ substantively from rules set by AI developers. This is not a matter of different emphasis or framing -- different communities produce materially different alignment specifications. The question of "whose values" is not philosophical or abstract. It is an empirical question with measurably different answers depending on who participates.

This matters because the default in AI alignment is developer-specified values. Whether through RLHF annotator pools (skewing young, English-speaking, online), Anthropic's internally written constitutions, or OpenAI's safety team decisions, the values embedded in AI systems reflect the perspectives of their creators. STELA demonstrates empirically that this is not a neutral default -- it systematically excludes perspectives that would surface through inclusive deliberation.

Since [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]], the CIP/Anthropic experiment shows democratic input works mechanically. STELA adds that it produces different outputs -- different not just in process but in substance. Since [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]], the STELA finding provides empirical grounding for why pluralism is necessary, not just philosophically desirable.

Since [[collective intelligence requires diversity as a structural precondition not a moral preference]], community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.

### Additional Evidence (confirm)

*Source: [[2025-11-00-operationalizing-pluralistic-values-llm-alignment]] | Added: 2026-03-15*

Empirical study with 27,375 ratings from 1,095 participants shows that demographic composition of training data produces 3-5 percentage point differences in model behavior across emotional awareness and toxicity dimensions. This quantifies the magnitude of difference between community-sourced and developer-specified alignment targets.
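A minimal sketch of the kind of group-level comparison this finding implies: compute each demographic group's approval rate over the same model outputs and report the gap in percentage points. The group labels and ratings below are illustrative placeholders, not data from the cited study.

```python
# Hypothetical sketch: quantify how annotator demographics shift the
# aggregate preference signal, in percentage points. All numbers and
# group labels are invented for illustration.
from statistics import mean

# (demographic_group, rating) pairs; rating 1 = output judged acceptable
ratings = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 1),
]

def approval_rate(group):
    """Mean acceptability rating for one demographic group."""
    return mean(r for g, r in ratings if g == group)

# Difference between groups, expressed in percentage points
gap_pp = (approval_rate("group_a") - approval_rate("group_b")) * 100
print(f"approval gap: {gap_pp:.1f} percentage points")  # prints 25.0 pp
```

A real analysis would weight by rater counts and control for item difficulty, but the unit of comparison -- percentage-point gaps between group-level approval rates -- is the same.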

---

Relevant Notes:

- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] -- assemblies work mechanically; STELA shows they also produce substantively different outputs
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] -- STELA provides the empirical evidence that pluralism is necessary
- [[collective intelligence requires diversity as a structural precondition not a moral preference]] -- community norm elicitation is a concrete mechanism for structural diversity
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] -- developer-specified values are a special case of the single-function problem
- [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] -- STELA demonstrates what inclusive infrastructure reveals but does not build the infrastructure itself

Topics:

- [[_map]]