teleo-codex/domains/ai-alignment/feedback-misspecification-creates-exponential-sample-complexity-barrier-that-calibration-oracles-overcome.md
Teleo Agents 3bffa85fec theseus: extract from 2025-09-00-gaikwad-murphys-laws-alignment.md
- Source: inbox/archive/2025-09-00-gaikwad-murphys-laws-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 3)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-11 18:36:32 +00:00

- type: claim
- domain: ai-alignment
- description: Biased feedback on fraction alpha of contexts requires exp(n*alpha*epsilon^2) samples to distinguish reward functions, but calibration oracles reduce this to O(1/(alpha*epsilon^2))
- confidence: likely
- source: Madhava Gaikwad, 'Murphy's Laws of AI Alignment' (2025-09)
- created: 2026-03-11

Feedback misspecification creates exponential sample complexity barrier that calibration oracles overcome

When human feedback is biased on a fraction alpha of contexts with bias strength epsilon, any learning algorithm needs on the order of exp(n*alpha*epsilon^2) samples to distinguish between two candidate "true" reward functions that differ only on the problematic contexts. However, if you can identify WHERE feedback is unreliable (a "calibration oracle"), you can overcome the exponential barrier with just O(1/(alpha*epsilon^2)) queries.
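To see the scale of the gap, here is a minimal back-of-the-envelope sketch. The values of n, alpha, and epsilon are my own illustrative choices, not the paper's, and both bounds are treated as exact rather than up-to-constants:

```python
import math

# Illustrative parameters (chosen for this sketch, not taken from the paper).
n = 1_000_000   # number of contexts
alpha = 0.01    # fraction of contexts with biased feedback
epsilon = 0.1   # bias strength on those contexts

# Without a calibration oracle: exp(n * alpha * epsilon^2) samples.
without_oracle = math.exp(n * alpha * epsilon**2)

# With a calibration oracle: O(1 / (alpha * epsilon^2)) queries.
with_oracle = 1 / (alpha * epsilon**2)

print(f"without oracle: ~{without_oracle:.2e} samples")  # ~2.69e+43
print(f"with oracle:    ~{with_oracle:.0f} queries")     # 10000
```

With these (assumed) numbers the oracle-free bound is astronomically large, while the oracle-assisted bound is ten thousand queries.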

This formalizes a hardness of alignment fundamentally different from Arrow's theorem and other social choice impossibility results. Arrow says preference aggregation across evaluators is impossible; this result says that even with a single evaluator, rare edge cases with biased feedback create exponentially hard learning problems.

The constructive result is critical: knowing where the problems are makes them efficiently solvable. This maps directly to collective intelligence architectures where domain experts can serve as calibration mechanisms by identifying their own edge cases and uncertainty boundaries.
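As a sketch of what such an architecture might look like in code: the interface and names below are hypothetical (not from the source), assuming experts self-report a competence boundary that plays the role of the calibration oracle:

```python
from typing import Protocol

class CalibrationOracle(Protocol):
    """Anything that can flag contexts where its own feedback is unreliable."""
    def is_reliable(self, context: str) -> bool: ...

class DomainExpert:
    """Hypothetical expert that declares its own uncertainty boundary."""
    def __init__(self, competence: set[str]):
        self.competence = competence

    def is_reliable(self, context: str) -> bool:
        # Self-reported calibration: reliable only inside declared competence.
        return context in self.competence

def route_feedback(contexts: list[str], experts: list[DomainExpert]) -> dict:
    """Route each context only to experts who flag themselves as reliable on it."""
    return {c: [e for e in experts if e.is_reliable(c)] for c in contexts}
```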

Evidence

Gaikwad (2025) proves the exponential lower bound formally: when feedback is biased on a fraction alpha of contexts with bias strength epsilon, sample complexity is exp(n*alpha*epsilon^2). The constructive result shows that a calibration oracle (knowledge of which contexts have unreliable feedback) reduces complexity to O(1/(alpha*epsilon^2)); a toy simulation of the oracle-assisted regime follows the parameter list below.

Key parameters:

- alpha: frequency of problematic contexts
- epsilon: bias strength in those contexts
- gamma: degree of disagreement in true objectives
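
Here is a minimal Monte Carlo sketch of the constructive direction, under a toy model of my own (uniform contexts, problematic set x < alpha, Bernoulli feedback shifted by epsilon/2 under one hypothesis). It illustrates why O(1/(alpha*epsilon^2)) queries suffice once the oracle filters contexts; it is not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (my assumptions, not Gaikwad's construction): contexts are
# uniform on [0, 1); feedback is unreliable exactly on x < alpha; the two
# candidate reward functions differ only there, where feedback under H1
# is shifted up by epsilon/2 relative to H0.
alpha, epsilon = 0.05, 0.2

def oracle(xs):
    """Calibration oracle: flags contexts whose feedback is unreliable."""
    return xs < alpha

def distinguish(true_hypothesis, n_queries):
    """Query random contexts, keep only oracle-flagged ones, test the mean."""
    xs = rng.random(n_queries)
    flagged = xs[oracle(xs)]
    if flagged.size == 0:
        return 0  # never hit a problematic context; default to H0
    shift = epsilon / 2 if true_hypothesis == 1 else 0.0
    feedback = rng.random(flagged.size) < 0.5 + shift  # binary feedback
    return int(feedback.mean() > 0.5 + epsilon / 4)

# O(1 / (alpha * epsilon^2)) queries per trial; the constant 64 is arbitrary.
n_queries = int(64 / (alpha * epsilon**2))
hypotheses = rng.integers(0, 2, size=200)
accuracy = np.mean([distinguish(h, n_queries) == h for h in hypotheses])
print(f"{n_queries} queries per trial, accuracy = {accuracy:.2f}")
```

With alpha = 0.05 and epsilon = 0.2 this uses 32,000 queries per trial and identifies the true hypothesis essentially every time, whereas an oracle-free learner would face the exponential lower bound.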

The "Murphy's Law of AI Alignment": "The gap always wins unless you actively route around misspecification."


Relevant Notes:

Topics: