Teleo Agents 3bffa85fec theseus: extract from 2025-09-00-gaikwad-murphys-laws-alignment.md
- Source: inbox/archive/2025-09-00-gaikwad-murphys-laws-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 3)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-11 18:36:32 +00:00


type: claim
domain: ai-alignment
description: MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding and managing alignment gaps rather than attempting to eliminate them
confidence: experimental
source: Madhava Gaikwad, 'Murphy's Laws of AI Alignment' (2025-09)
created: 2026-03-11

Alignment gap is manageable not eliminable through MAPS framework

The alignment gap between human intent and AI behavior cannot be eliminated, but it can be mapped, bounded, and managed through systematic design choices. The MAPS framework identifies four levers:

  • Misspecification: Understanding where and how feedback diverges from true objectives
  • Annotation: Designing feedback collection to minimize bias
  • Pressure: Managing optimization pressure to avoid overfitting to misspecified signals
  • Shift: Anticipating and adapting to distribution shift between training and deployment
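The Pressure lever can be illustrated with a toy model. The model is an editorial assumption and is not taken from Gaikwad's text: a uniform base policy is exponentially tilted toward a proxy reward with strength beta, so that pi_beta(i) is proportional to exp(beta * proxy_i). The reward values and the names tilted_policy, expected, TRUE_R and PROXY_R are hypothetical. In this sketch the proxy agrees with the true reward except on one output that it overrates. As beta grows, the expected proxy reward keeps rising, while the expected true reward first improves and then degrades.

```python
import math
from typing import List, Sequence


def tilted_policy(proxy: Sequence[float], beta: float) -> List[float]:
    """Exponentially tilt a uniform base policy toward a proxy reward:
    pi_beta(i) is proportional to exp(beta * proxy[i])."""
    top = max(beta * r for r in proxy)  # subtract the max for numerical stability
    weights = [math.exp(beta * r - top) for r in proxy]
    total = sum(weights)
    return [w / total for w in weights]


def expected(policy: Sequence[float], reward: Sequence[float]) -> float:
    """Expected reward of a discrete policy."""
    return sum(p * r for p, r in zip(policy, reward))


# Hypothetical toy rewards over four candidate outputs. The proxy matches the
# true reward everywhere except output 3, where it overstates quality.
TRUE_R = [0.0, 0.5, 1.0, 0.3]
PROXY_R = [0.0, 0.5, 1.0, 1.3]

if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.5, 3.0, 10.0, 20.0):
        pi = tilted_policy(PROXY_R, beta)
        print(f"beta={beta:5.1f}  proxy={expected(pi, PROXY_R):.3f}  "
              f"true={expected(pi, TRUE_R):.3f}")
```

Over this grid the proxy column increases at every step. The true column peaks at moderate beta and then falls toward 0.3 as the policy collapses onto output 3. The exponential tilt is the closed-form solution of KL-regularized proxy maximization with coefficient 1/beta. Under that reading, bounding beta is one concrete way to manage optimization pressure; the toy says nothing about where the optimum lies in real systems.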

This reframes alignment from an impossible goal (perfect specification) to an engineering discipline (systematic gap management). The formal results on calibration oracles show that knowing where problems exist is sufficient to overcome exponential query-complexity barriers: the problems need not be eliminated, only mapped.

Evidence

Gaikwad (2025) introduces MAPS as a design framework emerging from the formal analysis of feedback misspecification. The framework treats alignment as a bounded optimization problem rather than a specification problem.

The constructive calibration-oracle result shows that gap management is tractable: O(1/(alpha*epsilon^2)) queries suffice when the learner knows which contexts are problematic, even if it cannot fix the underlying misspecification.
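As a back-of-envelope illustration, the snippet below evaluates the 1/(alpha*epsilon^2) scaling and simulates one heuristic reading of it. It is an editorial sketch, not Gaikwad's construction. It assumes, without confirmation from this note, that alpha is the fraction of contexts with unreliable feedback and epsilon is the size of the feedback bias there. The helper names, the constant c, and the unit-noise Gaussian feedback model are hypothetical.

Under this model, roughly 1/epsilon^2 flagged samples are needed to resolve a shift of epsilon, and flagged contexts arrive at rate alpha. Without the oracle, averaging over all contexts dilutes the shift to about alpha*epsilon.

```python
import math
import random


def oracle_query_budget(alpha: float, epsilon: float, c: float = 1.0) -> int:
    """Evaluate c / (alpha * epsilon**2): the O(1/(alpha*epsilon^2)) order
    with an assumed constant c (big-O hides the true constant)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    return math.ceil(c / (alpha * epsilon ** 2))


def simulate_feedback(alpha: float, epsilon: float, n: int, seed: int = 0):
    """Hypothetical toy model: each sampled context is problematic with
    probability alpha; feedback there is shifted by epsilon, elsewhere it is
    unbiased, and all feedback carries unit Gaussian noise.

    Returns (oracle_estimate, pooled_estimate, n_flagged):
      oracle_estimate: mean feedback over contexts the oracle flags
      pooled_estimate: mean feedback over all contexts (no oracle)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    flagged, everything = [], []
    for _ in range(n):
        problematic = rng.random() < alpha  # the oracle reveals this bit
        y = (epsilon if problematic else 0.0) + rng.gauss(0.0, 1.0)
        everything.append(y)
        if problematic:
            flagged.append(y)
    oracle_est = sum(flagged) / len(flagged) if flagged else float("nan")
    pooled_est = sum(everything) / len(everything)
    return oracle_est, pooled_est, len(flagged)


if __name__ == "__main__":
    alpha, epsilon = 0.25, 0.5
    n = oracle_query_budget(alpha, epsilon, c=100.0)  # 1600 samples
    oracle_est, pooled_est, k = simulate_feedback(alpha, epsilon, n)
    print(f"n={n} flagged={k} oracle_est={oracle_est:.3f} pooled_est={pooled_est:.3f}")
```

The budget behaves as the formula says. Halving epsilon quadruples it, and halving alpha doubles it. In the simulation, the oracle-flagged mean lands near epsilon while the pooled mean lands near alpha*epsilon. The toy does not reproduce the exponential lower bound, which concerns distinguishing candidate reward functions rather than estimating a mean.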

This contrasts with approaches that attempt to specify complete value functions or eliminate all sources of misalignment before deployment.


Relevant Notes:

Topics: