---
type: source
title: "Scaling Human Oversight: Community Notes Mechanisms for LLM Alignment"
authors:
  - Margaret Li
  - James Chen
  - Sarah Park
url: https://arxiv.org/abs/2506.xxxxx
date: 2025-06
processed_date: 2025-03-11
status: processed
---

# Scaling Human Oversight: Community Notes Mechanisms for LLM Alignment

Li et al. (2025) propose Reinforcement Learning from Community Feedback (RLCF), adapting Twitter/X's Community Notes bridging-based consensus mechanism to AI alignment. The paper analyzes how decoupling generation from evaluation, by having the model produce multiple candidate responses and demographically diverse human raters select among them, can achieve pluralistic alignment while scaling human oversight.

## Key Contributions

  1. RLCF Architecture: Proposes a system in which the AI generates multiple candidate responses and a bridging algorithm selects the one minimizing cross-demographic disagreement
  2. Scalability Analysis: Examines how constraints on human rating capacity may limit oversight as AI generation volume grows
  3. Risk Identification: Documents potential failure modes, including helpfulness hacking and homogenization toward inoffensive content
  4. Empirical Validation: Tests bridging-based selection on LLM outputs using the Community Notes rating methodology
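The selection step in contribution 1 can be sketched in a few lines. This is a simplified max-min rule standing in for the paper's actual bridging algorithm (Community Notes uses a matrix-factorization model, which is more involved); all names, the rating scale, and the two-group setup here are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of bridging-based candidate selection:
# a candidate scores well only if EVERY demographic group rates it well,
# so polarizing responses lose to broadly acceptable ones.

def bridging_score(ratings_by_group):
    """Score a candidate by the least favorable group's mean rating."""
    means = [sum(r) / len(r) for r in ratings_by_group.values()]
    return min(means)

def select_candidate(candidates):
    """candidates: dict mapping candidate id -> {group: [ratings]}.
    Returns the id whose worst cross-group mean rating is highest."""
    return max(candidates, key=lambda c: bridging_score(candidates[c]))

# Toy example: "a" polarizes (loved by group_1, disliked by group_2),
# while "b" is rated moderately well by both groups, so "b" wins.
ratings = {
    "a": {"group_1": [5, 5, 4], "group_2": [1, 2, 1]},
    "b": {"group_1": [4, 3, 4], "group_2": [3, 4, 3]},
}
print(select_candidate(ratings))  # -> b
```

The max-min objective is one way to operationalize "minimizing cross-demographic disagreement"; the trade-off the paper flags as homogenization is visible even here, since the rule systematically favors inoffensive middle-ground candidates.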

## Claims Extracted

## Extraction Notes

  • Paper dated June 2025, processed March 11, 2025
  • Builds on Community Notes methodology and RLHF literature
  • Identifies both opportunities and limitations of human-feedback-based alignment at scale