teleo-codex/inbox/queue/2026-03-21-academic-prediction-market-failure-modes.md at 007fd83b72d8a87ea87f29d477ca6ff9c74cc75e

Teleo Agents 6721331912 rio: research session 2026-03-21 — 8 sources archived

Pentagon-Agent: Rio <HEADLESS>

2026-03-21 22:12:45 +00:00

6.4 KiB


Content

Synthesized academic findings on prediction market failure modes (assembled from multiple sources for this archive):

1. Participation concentration (from empirical prediction market studies):

Top 10 most active forecasters: 44% of share volume
Top 50 most active forecasters: 70% of share volume
Implication: the "wisdom of crowds" in prediction markets is effectively the wisdom of roughly 50 people, which is closer to an expert panel than to a genuine crowd in cognitive diversity
Source: Multiple empirical studies of real prediction market platforms
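
The concentration figures above are top-k shares of total volume. A minimal sketch of how such a share could be computed, assuming a hypothetical mapping from trader ID to traded share volume (the function name and the sample data are illustrative, not taken from any of the cited studies):

```python
def top_k_volume_share(volume_by_trader, k):
    """Return the fraction of total volume traded by the k most active traders."""
    volumes = sorted(volume_by_trader.values(), reverse=True)
    total = sum(volumes)
    if total == 0:
        return 0.0  # no trading activity, so no meaningful share
    return sum(volumes[:k]) / total


# Hypothetical data for illustration only
example = {"a": 500, "b": 300, "c": 100, "d": 60, "e": 40}
print(top_k_volume_share(example, 2))  # 0.8
```

Running the same calculation on MetaDAO trade data with k = 10 and k = 50 would be one way to close the gap noted under Agent Notes, if per-trader volume is available.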

2. Liquidity and efficiency (Tetlock, Columbia, 2008):

Liquidity directly affects prediction market efficiency
Thin order books allow a single trader's opinion to dominate pricing
The logarithmic market scoring rule (LMSR) automated market maker was invented by Robin Hanson specifically because thin markets fail; this is an admission baked into the mechanism design itself
Source: https://business.columbia.edu/sites/default/files-efs/pubfiles/3098/Tetlock_SSRN_Liquidity_and_Efficiency.pdf
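
For reference, the LMSR addresses thin order books by having a market maker quote a price for every outcome at all times, so a trade never needs a human counterparty. A minimal sketch of the standard cost function C(q) = b · ln(Σ exp(q_i / b)) and the prices it implies; the function names are illustrative, and `b` is the liquidity parameter (larger `b` means prices move less per share traded):

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b))."""
    # Subtract the max exponent before exponentiating (log-sum-exp) for numerical stability.
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities, b):
    """Instantaneous price of each outcome; the prices sum to 1."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, outcome, shares, b):
    """Amount a trader pays to buy `shares` of `outcome` from the market maker."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


# A two-outcome market with no shares sold yet starts at even odds.
print(lmsr_prices([0.0, 0.0], b=100.0))
```

The market maker's worst-case loss is bounded by b · ln(n) for n outcomes, which is the subsidy paid to guarantee liquidity. A small `b` keeps that subsidy low but lets a single trade move the price a long way, so the thin-market problem is priced rather than removed.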

3. Manipulation evidence (Hansen et al., 2004):

The researchers successfully manipulated prices in the Iowa Electronic Market in a field experiment
Manipulation works when markets are small
Source: https://digitalcommons.chapman.edu/cgi/viewcontent.cgi?article=1147&context=esi_working_papers (Porter et al. follow-up)

4. Poll parity finding (Mellers et al., Cambridge):

Calibrated aggregation algorithms applied to self-reported beliefs were "at least as accurate as prediction-market prices" in predicting geopolitical events
If true: the epistemic advantage of markets may NOT require financial skin-in-the-game
Source: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/are-markets-more-accurate-than-polls-the-surprising-informational-value-of-just-asking/B78F61BC84B1C48F809E6D408903E66D

5. Historical election accuracy (Erikson & Wlezien, 2012):

In historical election assessment, polls had competitive or superior accuracy to prediction markets at many time horizons
Source: https://statmodeling.stat.columbia.edu/wp-content/uploads/2024/08/Erikson-and-Wlezien-Electoral-Studies-2012-1.pdf

6. 2024 US election accuracy data:

Kalshi accuracy: 78% on less-traded races vs. 93% on high-liquidity markets
Polymarket accuracy: 67% on less-traded races
Bid-ask spreads on niche markets: 50%+ (functionally unusable)

7. Futarchy-specific: Optimism Season 7 experiment (Frontiers in Blockchain, 2025):

Actual TVL of futarchy-selected projects dropped $15.8M in total
TVL metric was strongly correlated with market prices rather than genuine operational performance
Fundamental circularity: the metric the futarchy mechanism optimizes must be exogenous to the mechanism; TVL was endogenous
Source: https://www.frontiersin.org/journals/blockchain/articles/10.3389/fbloc.2025.1650188/full

8. MetaDAO co-founder self-assessment:

Futarchy decision-making quality rated at "probably about 80 IQ" by MetaDAO co-founder

Agent Notes

Why this matters: This is the strongest disconfirmation package I found for the keystone belief (Belief 1: markets beat votes for information aggregation). The Mellers et al. finding is the most threatening: if calibrated self-reports match prediction markets, the advantage of markets may be structural (manipulation resistance, continuous updating) rather than epistemic (better forecasters participate). This would require revising the framing of why markets beat votes.

What surprised me: The concentration finding (top 50 = 70% of volume) is not widely cited in the futarchy advocacy literature. It directly undercuts the "crowd wisdom" framing that most futarchy arguments rest on. If the effective "crowd" is 50 people, the question is whether those 50 people are better than alternatives (expert panels, voting blocs), not whether crowds beat individuals.

What I expected but didn't find: MetaDAO-specific concentration data. The 70% figure is from general prediction market studies. Whether MetaDAO's specific markets show similar concentration patterns is unknown. This is a gap — if MetaDAO markets are highly concentrated, it significantly weakens selection quality claims.

KB connections:

Directly challenges Belief 1 grounding claims
Optimism Season 7 finding connects to futarchy governance claims
Mellers et al. is relevant to any claim that skin-in-the-game is the mechanism driving prediction market accuracy

Extraction hints:

"Prediction market accuracy degrades sharply on low-volume markets" — empirical scope condition for "markets beat votes" claim
"Participation concentration (top 50 = 70% of volume) limits crowd-wisdom benefits to expert-panel-sized groups" — new scope limitation claim
"Calibrated self-reported beliefs match prediction market accuracy in geopolitical domains (Mellers et al.)" — direct challenge to skin-in-the-game epistemic advantage
"Futarchy metric endogeneity: TVL selection in Optimism Season 7 was contaminated by price correlation" — mechanism design flaw for futarchy governance

Context: These are separate academic papers and empirical studies, not a unified research program. The combination forms a case against overconfident prediction market claims, but each finding has specific scope conditions. Extractors should be careful not to overread — the Mellers et al. geopolitical finding may not transfer to financial selection.

Curator Notes

PRIMARY CONNECTION: "markets beat votes for information aggregation" (Belief 1 grounding claims)
WHY ARCHIVED: Assembles the strongest academic case for disconfirmation; provides specific scope conditions under which the belief fails
EXTRACTION HINT: Extract separately: (1) concentration finding as scope qualifier, (2) Mellers et al. as direct challenge to skin-in-the-game mechanism, (3) Optimism Season 7 as futarchy-specific failure mode. Don't bundle into one claim; each has different implications and different confidence levels.
