auto-fix: strip 1 broken wiki links
Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
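The fixer's behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function name `strip_broken_wiki_links`, the `known_claims` set, and the exact-match resolution rule are all assumptions made for the example.

```python
import re

# Matches a simple [[target]] wiki link with no nested brackets.
WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def strip_broken_wiki_links(text, known_claims):
    """Unwrap [[...]] links whose target is not in known_claims.

    Hypothetical helper: assumes a link "resolves" when its target,
    with surrounding whitespace trimmed, exactly matches a claim title.
    Returns the rewritten text and the number of links stripped.
    """
    stripped = 0

    def replace(match):
        nonlocal stripped
        target = match.group(1)
        if target.strip() in known_claims:
            return match.group(0)  # link resolves: keep the brackets
        stripped += 1
        return target  # link is broken: keep the text, drop the brackets

    new_text = WIKI_LINK.sub(replace, text)
    return new_text, stripped
```

Applied to the changed line in this commit, with only the second claim present in the knowledge base, such a function would unwrap the first link and leave the second untouched, which matches the single stripped link reported in the commit title.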
parent f537e8aeff
commit 7a85b4890a
1 changed file with 1 addition and 1 deletion
@@ -47,7 +47,7 @@ Proposes a hybrid model for Community Notes where both humans and LLMs write not
**Why this matters:** This is the most concrete RLCF specification that exists. It bridges Audrey Tang's philosophical framework with an implementable mechanism. The key insight: RLCF is not just a reward signal — it's an architecture where AI generates and humans evaluate, with a bridging algorithm ensuring pluralistic selection.
**What surprised me:** The "helpfulness hacking" and "optimally inoffensive" risks are exactly what Arrow's theorem predicts. The paper acknowledges these but doesn't connect them to Arrow formally.
**What I expected but didn't find:** No formal analysis of whether the bridging algorithm escapes Arrow's conditions. No comparison with PAL or other pluralistic mechanisms. No empirical results beyond Community Notes deployment.
-**KB connections:** Directly addresses the RLCF specification gap flagged in previous sessions. Connects to [[democratic alignment assemblies produce constitutions as effective as expert-designed ones]], [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]].
+**KB connections:** Directly addresses the RLCF specification gap flagged in previous sessions. Connects to democratic alignment assemblies produce constitutions as effective as expert-designed ones, [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]].
**Extraction hints:** Extract claims about: (1) RLCF architecture (AI generates, humans rate, bridging selects), (2) the homogenization risk of bridging-based consensus, (3) human rating authority as alignment mechanism.
**Context:** Core paper for the RLCF research thread. Fills the "technical specification" gap identified in sessions 2 and 3.