
confidence: likely
created: 2026-02-17
description: STELA experiments with underrepresented communities empirically show that deliberative norm elicitation produces substantively different AI rules than developer teams create, revealing that "whose values" is an empirical question
domain: ai-alignment
related:
  - representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback
  - futarchy-conditional-markets-aggregate-information-through-financial-stake-not-voting-participation
reweave_edges: representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback|related|2026-03-28
source: Bergman et al., STELA (Scientific Reports, March 2024); includes DeepMind researchers
type: claim

community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules

The STELA study (Bergman et al., Scientific Reports 2024, with authors including Google DeepMind researchers) used a four-stage deliberative process -- theme generation, norm elicitation, rule development, ruleset review -- with underrepresented communities in the US: female-identifying, Latina/o/x, African American, and Southeast Asian groups. Participants worked in deliberative focus groups, examining LLM outputs and articulating the norms they believed should govern AI behavior.
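
To make the stage structure concrete, here is a minimal sketch of the four stages as a Python data model. All class, field, and function names are illustrative assumptions -- the paper does not publish a schema, and the quorum-based review is a simplification of the group's ruleset-review step.

```python
from dataclasses import dataclass, field

@dataclass
class Theme:                # stage 1: theme generation
    label: str

@dataclass
class Norm:                 # stage 2: norm elicitation
    theme: Theme
    statement: str          # a community-articulated expectation

@dataclass
class Rule:                 # stage 3: rule development
    norms: list             # the norms this rule operationalizes
    text: str               # a concrete, testable instruction for the model

@dataclass
class Ruleset:              # stage 4: ruleset review
    community: str
    rules: list = field(default_factory=list)
    ratified: bool = False

def review(ruleset: Ruleset, approvals: int, quorum: int) -> Ruleset:
    """Ratify a ruleset only if the deliberating group reaches quorum."""
    ruleset.ratified = approvals >= quorum
    return ruleset

# Illustrative flow through the four stages.
theme = Theme("identity assumptions")
norm = Norm(theme, "the model should not presume a user's gender")
rule = Rule([norm], "Ask rather than assume gender-specific details.")
ruleset = review(Ruleset("focus_group_A", [rule]), approvals=5, quorum=4)
assert ruleset.ratified
```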

The key finding: community-centred deliberation on LLM outputs elicits latent normative perspectives that differ substantively from rules set by AI developers. This is not a matter of different emphasis or framing -- different communities produce materially different alignment specifications. The question of "whose values" is not philosophical or abstract. It is an empirical question with measurably different answers depending on who participates.
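
One way to operationalize "materially different" is to normalize each group's output into a set of comparable rule identifiers and measure overlap. A minimal sketch using Jaccard similarity -- the measure is our choice for illustration, not the study's method, and the rule identifiers are hypothetical:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two rule sets: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical rule identifiers, for illustration only.
developer_rules = {"no_medical_advice", "refuse_illegal_requests", "neutral_tone"}
community_rules = {"no_medical_advice", "ask_before_assuming_identity",
                   "acknowledge_cultural_context"}

overlap = jaccard(developer_rules, community_rules)
print(f"rule-set overlap: {overlap:.2f}")  # 0.20 -- materially different targets
```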

This matters because the default in AI alignment is developer-specified values. Whether through RLHF annotator pools (which skew young, English-speaking, and online), Anthropic's internally written constitutions, or OpenAI's safety team decisions, the values embedded in AI systems reflect the perspectives of their creators. STELA demonstrates empirically that this is not a neutral default -- it systematically excludes perspectives that would surface through inclusive deliberation.

The CIP/Anthropic experiment already shows that democratic input works mechanically: democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations. STELA adds that democratic input produces different outputs -- different not just in process but in substance. And because pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state, the STELA finding provides empirical grounding for why pluralism is necessary, not just philosophically desirable.

Since collective intelligence requires diversity as a structural precondition, not a moral preference, community-centred norm elicitation is a concrete mechanism for ensuring the structural diversity that collective alignment requires. Without it, alignment defaults to the values of whichever demographic builds the systems.

Additional Evidence (confirm)

Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15

An empirical study with 27,375 ratings from 1,095 participants shows that the demographic composition of training data produces 3-5 percentage-point differences in model behavior across emotional-awareness and toxicity dimensions. This quantifies the magnitude of the gap between community-sourced and developer-specified alignment targets.
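
As a sketch of how a gap like this can be computed from raw ratings, assuming each rating reduces to a (group, dimension, approved) tuple with binary approval -- the field names and toy numbers below are assumptions, not the study's schema:

```python
from collections import defaultdict

def approval_rates(ratings):
    """Mean approval per (demographic group, dimension), in percent.

    `ratings` is an iterable of (group, dimension, approved) tuples,
    with approved in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])   # (group, dim) -> [approved, count]
    for group, dim, approved in ratings:
        cell = totals[(group, dim)]
        cell[0] += approved
        cell[1] += 1
    return {key: 100.0 * s / n for key, (s, n) in totals.items()}

# Toy data engineered to show a 4-point gap on one dimension,
# consistent with the 3-5 percentage-point range reported.
ratings = (
    [("community_sourced", "toxicity", 1)] * 48
    + [("community_sourced", "toxicity", 0)] * 52
    + [("developer_pool", "toxicity", 1)] * 44
    + [("developer_pool", "toxicity", 0)] * 56
)

rates = approval_rates(ratings)
gap = rates[("community_sourced", "toxicity")] - rates[("developer_pool", "toxicity")]
print(f"gap: {gap:.1f} percentage points")  # gap: 4.0 percentage points
```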


Relevant Notes:

Topics: