- Source: inbox/archive/2025-09-00-gaikwad-murphys-laws-alignment.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created |
|---|---|---|---|---|---|
| claim | ai-alignment | MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding and managing alignment gaps rather than attempting to eliminate them | experimental | Madhava Gaikwad, 'Murphy's Laws of AI Alignment' (2025-09) | 2026-03-11 |
# The alignment gap is manageable, not eliminable, through the MAPS framework
The alignment gap between human intent and AI behavior cannot be eliminated, but it can be mapped, bounded, and managed through systematic design choices. The MAPS framework identifies four levers (a sketch of one possible operationalization follows the list):
- Misspecification: Understanding where and how feedback diverges from true objectives
- Annotation: Designing feedback collection to minimize bias
- Pressure: Managing optimization pressure to avoid overfitting to misspecified signals
- Shift: Anticipating and adapting to distribution shift between training and deployment
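To make the levers concrete, here is a minimal sketch of how a MAPS-style audit might be tracked in code. The class, the field names, and the multiplicative gap bound are illustrative assumptions on my part, not definitions from Gaikwad (2025):

```python
from dataclasses import dataclass

@dataclass
class MAPSAudit:
    """Hypothetical operationalization of the four MAPS levers."""
    misspecification: float  # estimated rate at which feedback diverges from the true objective
    annotation: float        # estimated bias introduced by the feedback-collection process
    pressure: float          # optimization pressure applied to the (possibly wrong) signal
    shift: float             # estimated train/deploy distribution divergence

    def gap_bound(self) -> float:
        # Crude, assumed bound: error sources add, and optimization
        # pressure amplifies whatever misspecification is present.
        return (self.misspecification + self.annotation + self.shift) * (1.0 + self.pressure)

# Example: modest error sources under heavy optimization pressure
audit = MAPSAudit(misspecification=0.05, annotation=0.02, pressure=3.0, shift=0.04)
print(f"bounded alignment gap: {audit.gap_bound():.2f}")  # 0.44
```

The point of the sketch is the framing: each lever is something you measure and budget for, not something you drive to zero.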
This reframes alignment from an impossible goal (perfect specification) to an engineering discipline (systematic gap management). The formal results on calibration oracles show that knowing where problems exist is sufficient to overcome exponential barriers: you do not need to eliminate the problems, just map them.
## Evidence
Gaikwad (2025) introduces MAPS as a design framework emerging from the formal analysis of feedback misspecification. The framework treats alignment as a bounded optimization problem rather than a specification problem.
The constructive calibration-oracle result demonstrates that gap management is tractable: $O(1/(\alpha\epsilon^2))$ queries suffice when you know which contexts are problematic, even if you cannot fix the underlying misspecification.
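One plausible reading of where this bound comes from (my assumption, not the paper's derivation): each query lands in a flagged context with probability $\alpha$, and a Hoeffding-style argument needs $O(1/\epsilon^2)$ in-context samples for an $\epsilon$-accurate estimate of a bounded quantity, so

$$
n \cdot \alpha \;\gtrsim\; \frac{1}{\epsilon^{2}}
\quad\Longrightarrow\quad
n \;=\; O\!\left(\frac{1}{\alpha\,\epsilon^{2}}\right).
$$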
This contrasts with approaches that attempt to specify complete value functions or eliminate all sources of misalignment before deployment.
## Relevant Notes
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions
- AI alignment is a coordination problem not a technical problem
## Topics