theseus: extract claims from 2025-04-00-survey-personalized-pluralistic-alignment (#513)
Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>
This commit is contained in:
parent
0393b1abc5
commit
4534dc8ca4
1 changed file with 14 additions and 1 deletion
@@ -7,9 +7,14 @@ date: 2025-04-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
status: null-result
priority: medium
tags: [pluralistic-alignment, personalization, survey, taxonomy, RLHF, DPO]
processed_by: theseus
processed_date: 2025-04-11
enrichments_applied: ["pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md", "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Survey paper extraction. Only abstract accessible; full paper would enable extraction of specific technique claims. Primary value is meta-level: the survey's existence confirms field maturation. Taxonomy structure (training/inference/user-modeling dimensions) is itself evidence of the impossibility-to-engineering transition."
---
## Content
@@ -33,3 +38,11 @@ Abstract only accessible via WebFetch. Full paper needed for comprehensive extra
PRIMARY CONNECTION: pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state
WHY ARCHIVED: Survey confirming the field has matured enough for systematization — evidence that the impossibility-to-engineering transition is real
EXTRACTION HINT: Need to fetch full paper for comprehensive extraction. The taxonomy structure itself is the main contribution.
## Key Facts
- arXiv 2504.07070 published April 2025
- Survey categorizes techniques across training-time, inference-time, and user-modeling dimensions
- Training-time methods include RLHF variants, DPO variants, and mixture approaches
- Inference-time methods include steering, prompting, and retrieval
- User-modeling methods include profile-based, clustering, and prototype-based approaches