---
type: claim
title: Bridging-based consensus mechanisms risk homogenization toward optimally inoffensive content
domains:
- ai-alignment
- social-choice-theory
confidence: speculative
created: 2025-03-11
---

# Bridging-based consensus mechanisms risk homogenization toward optimally inoffensive content

RLCF (Reinforcement Learning from Community Feedback) uses a bridging-based selection mechanism that prioritizes responses minimizing disagreement across diverse raters. Such a mechanism may systematically favor bland, non-committal outputs over substantive but potentially divisive content. This is a specific failure mode in which consensus-seeking yields outputs optimized for inoffensiveness rather than quality or accuracy.
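
The selection pressure described above can be illustrated with a toy example. The sketch below is not the scoring rule from Li et al.; it assumes a deliberately simplified bridging score (the lowest mean rating among rater groups), and the response names, group labels, and ratings are all hypothetical.

```python
from statistics import mean

# Hypothetical ratings on a 0-1 helpfulness scale from two rater groups
# with differing viewpoints. All values are illustrative, not real data.
ratings = {
    "bland_summary": {
        "group_a": [0.6, 0.6, 0.7],
        "group_b": [0.6, 0.7, 0.6],
    },
    "substantive_position": {
        "group_a": [0.9, 1.0, 0.9],
        "group_b": [0.4, 0.4, 0.5],
    },
}


def bridging_score(by_group):
    """Assumed stand-in for a bridging score: the worst group mean.

    A response scores well only if every group rates it well.
    """
    return min(mean(scores) for scores in by_group.values())


def overall_mean(by_group):
    """Plain average over all raters, ignoring group membership."""
    return mean(s for scores in by_group.values() for s in scores)


# Pick the winner under each rule.
picked_by_bridging = max(ratings, key=lambda name: bridging_score(ratings[name]))
picked_by_mean = max(ratings, key=lambda name: overall_mean(ratings[name]))

print("bridging rule selects:", picked_by_bridging)
print("overall mean selects:", picked_by_mean)
```

With these made-up numbers, the bridging rule selects the bland response even though the substantive one has the higher average rating, which is the homogenization risk this claim describes.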

## Evidence

- Li et al. (2025) identify this as a theoretical concern: "bridging-based selection may inadvertently favor responses that are maximally inoffensive rather than maximally helpful"
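- The concern loosely parallels [[Arrow's impossibility theorem]], which shows that no ranked aggregation rule over three or more options can satisfy a small set of fairness conditions simultaneously; the theorem does not itself predict lowest-common-denominator outcomes, so the parallel is suggestive rather than formal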
- Community Notes data shows bridging scores correlate with "safe" framings that avoid controversial implications

## Implications

- May undermine the goal of producing genuinely helpful AI outputs in domains where useful advice requires taking positions
- Creates tension between pluralistic alignment goals and output quality
- Suggests bridging-based selection may need constraints or quality floors to prevent race-to-the-bland dynamics
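
One way to picture the quality-floor idea from the last point is sketched below. This is a hypothetical illustration, not a mechanism proposed in the source: the candidate names, the `bridging` and `quality` scores, and the `select_with_floor` helper are all assumptions made for the example.

```python
# Hypothetical candidates with made-up scores in [0, 1].
# "bridging" is cross-group agreement; "quality" is an independent
# helpfulness estimate (how it would be obtained is left open here).
candidates = [
    {"name": "non_committal", "bridging": 0.80, "quality": 0.35},
    {"name": "hedged_but_useful", "bridging": 0.65, "quality": 0.70},
    {"name": "strong_position", "bridging": 0.40, "quality": 0.85},
]


def select_with_floor(candidates, quality_floor):
    """Return the highest-bridging candidate whose quality meets the floor.

    Returns None when no candidate clears the floor.
    """
    eligible = [c for c in candidates if c["quality"] >= quality_floor]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c["bridging"])


# A floor of 0.0 reproduces pure bridging-based selection.
unconstrained = select_with_floor(candidates, quality_floor=0.0)
constrained = select_with_floor(candidates, quality_floor=0.5)

print("no floor:", unconstrained["name"])
print("floor 0.5:", constrained["name"])
```

In this toy setup the floor removes the non-committal candidate before bridging scores are compared, so the selection shifts to the most broadly acceptable response among those that are still useful. Where the floor should sit, and how quality would be measured, are open questions.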

## Extraction Notes

- Source: Li et al., "Scaling Human Oversight" (June 2025)
- Added: 2025-03-11
- Related to broader concerns about consensus mechanisms in social choice theory