teleo-codex/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md


description: Current alignment approaches are all single-model focused, while the hardest problems (preference diversity, scalable oversight, and value evolution) are inherently collective
type: claim
domain: ai-alignment
created: 2026-02-17
source: Survey of alignment research landscape, 2025-2026
confidence: likely

no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it

The most striking gap in the alignment landscape as of 2025-2026: virtually no one is building alignment through collective intelligence infrastructure. The closest attempts are partial. Since democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations, CIP has demonstrated that democratic input works mechanically -- but this remains one-shot constitution-setting, not continuous architecture. Since community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules, STELA has shown that inclusive deliberation produces different outputs -- but it does not build the infrastructure for ongoing participation. Polis maps consensus through statement submission and voting. Some multi-agent debate frameworks exist under the scalable-oversight umbrella. The Cooperative AI Foundation studies multi-agent coordination. But none of these constitutes a distributed architecture in which alignment emerges from collective participation.
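To make "consensus-mapping" concrete: Polis collects agree/disagree/pass votes on short participant-submitted statements, reduces the resulting vote matrix to a low-dimensional opinion space, clusters participants into opinion groups, and surfaces statements endorsed by every group rather than by a bare majority. The sketch below is a toy version of that pipeline with invented data; PCA and k-means stand in for the production details.

```python
# Toy sketch of Polis-style consensus-mapping. Participants vote
# agree(+1)/disagree(-1)/pass(0) on statements; opinion groups come
# from clustering the vote matrix, and a "consensus" statement is one
# whose WORST per-group agreement is still high -- it must carry every
# bloc, not just an average majority. Illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# 60 participants x 8 statements: two blocs split on statements 0-3,
# both endorse 4-5, both reject 6-7.
bloc_a = np.hstack([np.ones((30, 4)), np.ones((30, 2)), -np.ones((30, 2))])
bloc_b = np.hstack([-np.ones((30, 4)), np.ones((30, 2)), -np.ones((30, 2))])
votes = np.vstack([bloc_a, bloc_b]) + rng.normal(0, 0.3, (60, 8))

coords = PCA(n_components=2).fit_transform(votes)        # opinion space
groups = KMeans(n_clusters=2, n_init=10).fit_predict(coords)

# Consensus score = minimum of the per-group mean agreements.
group_means = np.stack([votes[groups == g].mean(axis=0) for g in (0, 1)])
consensus = group_means.min(axis=0)
print("cross-bloc consensus statements:", np.where(consensus > 0.5)[0])
# -> statements 4 and 5: endorsed by both blocs despite their split.
```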

What does not exist: no system where contributor diversity structurally prevents value capture; no implementation of continuous value-weaving at scale; no infrastructure for collective oversight of superhuman AI components; no architecture where alignment is a property of the coordination protocol rather than a property trained into individual models. Since universal alignment is mathematically impossible because Arrow's impossibility theorem applies to aggregating diverse human preferences into a single coherent objective, the impossibility of aggregation makes collective infrastructure -- which preserves diversity rather than aggregating it -- the only viable path.
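The aggregation failure is easy to exhibit. Arrow's theorem generalizes the Condorcet paradox: three voters with individually coherent rankings can yield a collective preference cycle under pairwise majority vote, so no single transitive objective represents the group. A minimal illustration with invented voters:

```python
# Condorcet paradox: the concrete failure mode behind Arrow's theorem.
# Each voter's ranking is transitive, yet pairwise majority produces
# A > B, B > C, and C > A -- a cycle no single objective can encode.
from itertools import combinations

voters = [
    ["A", "B", "C"],   # voter 1: A > B > C
    ["B", "C", "A"],   # voter 2: B > C > A
    ["C", "A", "B"],   # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    wins = sum(v.index(x) < v.index(y) for v in voters)
    return wins > len(voters) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority: {winner} > {loser}")
# prints A > B, C > A, B > C: any single reward function must impose a
# transitive order, so it must discard someone's preferences.
```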

This gap is remarkable because the field's own findings point toward collective approaches. Since RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values, diverse preference representation is needed. Since scalable oversight degrades rapidly as capability gaps grow, with debate achieving only 50 percent success at moderate gaps, distributed oversight is needed. Since the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it, structural alignment is needed to eliminate the tax.
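The preference-diversity point can also be made concrete. Standard RLHF reward models fit a Bradley-Terry model to pairwise comparisons, and DPO's loss is derived from the same assumption. When two equal subpopulations hold opposite preferences over the same pair, the maximum-likelihood single reward gap is exactly zero: indifference, faithful to neither group. A sketch under those assumptions, with invented comparison counts:

```python
# Why one reward function cannot represent split preferences.
# Bradley-Terry: P(a preferred over b) = sigmoid(r(a) - r(b)).
# With annotators split 50/50, the negative log-likelihood is
# minimized at a reward gap of 0 -- the model predicts a coin flip
# for every annotator, mispredicting each one half the time.
import numpy as np

def neg_log_likelihood(gap, wins_a, wins_b):
    """Bradley-Terry NLL as a function of the reward gap r(a) - r(b)."""
    p_a = 1.0 / (1.0 + np.exp(-gap))
    return -(wins_a * np.log(p_a) + wins_b * np.log(1.0 - p_a))

# Split population: 500 comparisons favor a, 500 favor b.
gaps = np.linspace(-3, 3, 601)
nll = neg_log_likelihood(gaps, wins_a=500, wins_b=500)
best = float(gaps[np.argmin(nll)])
print(f"MLE reward gap: {best:.4f}")   # -> 0.0000 (pure indifference)
```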

The alignment field has converged on a problem it cannot solve with its current paradigm (single-model alignment), and the alternative paradigm (collective alignment through distributed architecture) has barely been explored. This is the opening for the TeleoHumanity thesis -- not as philosophical speculation but as practical infrastructure that addresses problems the alignment community has identified but cannot solve within its current framework.

