teleo-codex/domains/ai-alignment/pluralistic-ai-alignment-through-multiple-systems-preserves-value-diversity-better-than-forced-consensus.md

---
type: claim
domain: ai-alignment
secondary_domains:
  - collective-intelligence
  - mechanisms
description: Creating multiple AI systems reflecting genuinely incompatible values may be structurally superior to aggregating all preferences into one aligned system
confidence: experimental
source: Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)
created: 2026-03-11
sourced_from: inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# Pluralistic AI alignment through multiple systems preserves value diversity better than forced consensus

Conitzer et al. (2024) propose a "pluralism option": rather than forcing all human values into a single aligned AI system through preference aggregation, create multiple AI systems that reflect genuinely incompatible value sets. This structural approach to pluralism may better preserve value diversity than any aggregation mechanism.

The paper positions this as an alternative to the standard alignment framing, which assumes a single AI system must be aligned with aggregated human preferences. When values are irreducibly diverse—not just different but fundamentally incompatible—attempting to merge them into one system necessarily distorts or suppresses some values. Multiple systems allow each value set to be faithfully represented.
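The distortion claim has a classic social-choice illustration (a toy sketch, not an example from the paper itself): under pairwise majority voting, three agents with cyclic preferences produce no coherent aggregate ranking, so any single ranking built from the aggregate must misrepresent at least one agent.

```python
from itertools import combinations

# Three agents with genuinely incompatible (cyclic) rankings over
# options A, B, C -- the classic Condorcet configuration.
preferences = {
    "agent1": ["A", "B", "C"],
    "agent2": ["B", "C", "A"],
    "agent3": ["C", "A", "B"],
}

def majority_winner(x, y, prefs):
    """Return whichever of x, y a strict majority ranks higher."""
    votes_x = sum(1 for r in prefs.values() if r.index(x) < r.index(y))
    return x if votes_x > len(prefs) / 2 else y

# Pairwise majority yields a cycle: A beats B, B beats C, C beats A,
# so no aggregate ordering faithfully represents all three agents.
results = {
    (x, y): majority_winner(x, y, preferences)
    for x, y in combinations("ABC", 2)
}
print(results)
```

Every option loses some pairwise contest, which is the structural sense in which aggregation "necessarily distorts or suppresses some values" when disagreement is irreducible.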

This connects directly to the collective superintelligence thesis: rather than one monolithic aligned AI, an ecosystem of specialized systems with different value orientations, coordinating through explicit mechanisms. The paper doesn't fully develop this direction but identifies it as a viable path.
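What "coordinating through explicit mechanisms" could mean is left open by the paper; one hypothetical shape (all names and structure here are illustrative assumptions, not the paper's proposal) is a coordinator that surfaces each value-aligned system's recommendation and flags dissensus rather than averaging it away:

```python
from dataclasses import dataclass

@dataclass
class ValueAlignedSystem:
    """A system aligned to one value set, scored by its own utility table."""
    name: str
    utility: dict  # option -> score under this system's values

    def recommend(self, options):
        return max(options, key=lambda o: self.utility.get(o, 0))

def coordinate(systems, options):
    """Report every system's recommendation; flag disagreement explicitly
    instead of merging incompatible value sets into one answer."""
    recs = {s.name: s.recommend(options) for s in systems}
    return {"recommendations": recs,
            "consensus": len(set(recs.values())) == 1}

systems = [
    ValueAlignedSystem("welfare", {"X": 3, "Y": 1}),
    ValueAlignedSystem("liberty", {"X": 1, "Y": 3}),
]
print(coordinate(systems, ["X", "Y"]))
# {'recommendations': {'welfare': 'X', 'liberty': 'Y'}, 'consensus': False}
```

The design choice is the point: disagreement is preserved as first-class output, which is exactly what a single aggregated system cannot do.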

## Evidence

- Conitzer et al. (2024) explicitly propose "creating multiple AI systems reflecting genuinely incompatible values rather than forcing artificial consensus"
- The paper cites persistent irreducible disagreement as a structural feature that aggregation cannot resolve
- Stuart Russell's co-authorship signals this is a serious position within mainstream AI safety, not a fringe view

## Relationship to Collective Superintelligence

This is the closest mainstream AI alignment has come to the collective superintelligence thesis articulated in "collective superintelligence is the alternative to monolithic AI controlled by a few". The paper doesn't use the term "collective superintelligence", but the structural logic is identical: value diversity is preserved through system plurality rather than aggregation.

The key difference: Conitzer et al. frame this as an option among several approaches, while the collective superintelligence thesis argues this is the only path that preserves human agency at scale. The paper's pluralism option is permissive ("we could do this"), not prescriptive ("we must do this").

## Open Questions

- How do multiple value-aligned systems coordinate when their values conflict in practice?
- What governance mechanisms determine which value sets get their own system?
- Does this approach scale to thousands of value clusters or only to a handful?

## Relevant Notes

## Topics

- domains/ai-alignment/_map
- foundations/collective-intelligence/_map
- core/mechanisms/_map