| type | domain | description | confidence | source | created | secondary_domains | contributor | supports | reweave_edges |
|---|---|---|---|---|---|---|---|---|---|
| claim | collective-intelligence | Kim et al. 2026 show reasoning models develop conversational behaviors (questioning, perspective-shifting, reconciliation) from accuracy reward alone — feature steering doubles accuracy from 27% to 55% — establishing that reasoning is social cognition even inside a single model | likely | Kim, Lai, Scherrer, Agüera y Arcas, Evans (2026). Reasoning Models Generate Societies of Thought. arXiv:2601.10825 | 2026-04-14 | | @thesensatore (Telegram) | | |
reasoning models spontaneously generate societies of thought under reinforcement learning because multi-perspective internal debate causally produces accuracy gains that single-perspective reasoning cannot achieve
DeepSeek-R1 and QwQ-32B were not trained to simulate internal debates. They do it spontaneously under reinforcement learning reward pressure. Kim et al. (2026) demonstrate this through four converging evidence types — observational, causal, emergent, and mechanistic — making this one of the most robustly supported findings in the reasoning literature.
The observational evidence
Reasoning models exhibit dramatically more conversational behavior than instruction-tuned baselines. Comparing DeepSeek-R1 against DeepSeek-V3 on 8,262 problems across six benchmarks: question-answering sequences (β=0.345, p<1×10⁻³²³), perspective shifts (β=0.213, p<1×10⁻¹³⁷), and reconciliation of conflicting viewpoints (β=0.191, p<1×10⁻¹²⁵). These are not marginal effects: the t-statistics exceed 24 across all measures. QwQ-32B vs. Qwen-2.5-32B-IT shows comparable or larger effect sizes.
The models also exhibit Big Five personality diversity in their reasoning traces: neuroticism diversity β=0.567, agreeableness β=0.297, expertise diversity β=0.179–0.250. This mirrors the Woolley et al. (2010) finding that group personality diversity predicts collective intelligence in human teams — the same structural feature that produces intelligence in human groups appears spontaneously in model reasoning.
The causal evidence
Correlation could mean conversational behavior is a byproduct of reasoning, not a cause. Kim et al. rule this out with activation steering. Sparse autoencoder Feature 30939 ("conversational surprise") activates on only 0.016% of tokens but has a conversation ratio of 65.7%. Steering this feature:
- +10 steering: accuracy doubles from 27.1% to 54.8% on the Countdown task
- -10 steering: accuracy drops to 23.8%
This is a causal intervention on a single feature that controls conversational behavior, with a 2× accuracy effect. The steering also induces the specific conversational behaviors themselves: question-answering (β=2.199, p<1×10⁻¹⁴), perspective shifts (β=1.160, p<1×10⁻⁵), conflict (β=1.062, p=0.002).
The emergent evidence
When Qwen-2.5-3B is trained from scratch on the Countdown task with only accuracy rewards — no instruction to be conversational, no social scaffolding — conversational behaviors emerge spontaneously. The model invents multi-perspective debate as a reasoning strategy on its own, because it helps.
A conversation-fine-tuned model outperforms a monologue-fine-tuned model on the same task: 38% vs. 28% accuracy at step 40. The effect is even larger on Llama-3.2-3B: 40% vs. 18% at step 150. And the conversational scaffolding transfers across domains — conversation priming on arithmetic transfers to political misinformation detection without domain-specific fine-tuning.
The mechanistic evidence
Structural equation modeling reveals a dual pathway: a direct effect of conversational features on accuracy (β=0.228, z=9.98, p<1×10⁻²²) plus an indirect effect mediated through cognitive strategies — verification, backtracking, subgoal setting, backward chaining (β=0.066, z=6.38, p<1×10⁻¹⁰). The conversational behavior both directly improves reasoning and indirectly facilitates it by triggering more disciplined cognitive strategies.
What this means
This finding has implications far beyond model architecture. If reasoning — even inside a single neural network — spontaneously takes the form of multi-perspective social interaction, then the equation "intelligence = social cognition" receives its strongest empirical support to date. Since collective intelligence is a measurable property of group interaction structure not aggregated individual ability, the Kim et al. results show that the same structural features (diversity, turn-taking, conflict resolution) that produce collective intelligence in human groups are recapitulated inside individual reasoning models.
Since intelligence is a property of networks not individuals, this extends the claim from external networks to internal ones: even the apparent "individual" intelligence of a single model is actually a network property of interacting internal perspectives. The model is not a single reasoner but a society.
Evans, Bratton & Agüera y Arcas (2026) frame this as evidence that each prior intelligence explosion — primate social cognition, language, writing, AI — was the emergence of a new socially aggregated unit of cognition. If reasoning models spontaneously recreate social cognition internally, then LLMs are not the first artificial reasoners. They are the first artificial societies.
Relevant Notes:
- collective intelligence is a measurable property of group interaction structure not aggregated individual ability — Kim et al. personality diversity results directly mirror Woolley's c-factor findings in human groups
- intelligence is a property of networks not individuals — extends from external networks to internal model perspectives
- partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity — the personality diversity in reasoning traces suggests partial perspective overlap, not full agreement
- all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposer's training biases — society-of-thought within a single model may share the same correlated blind spots
- evaluation and optimization have opposite model-diversity optima because evaluation benefits from cross-family diversity while optimization benefits from same-family reasoning pattern alignment — internal society-of-thought is optimization (same-family), while cross-model evaluation is evaluation (cross-family)
- collective brains generate innovation through population size and interconnectedness not individual genius — model reasoning traces show the same mechanism at micro scale
Topics: