---
type: claim
domain: ai-alignment
description: "MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding and managing alignment gaps rather than attempting to eliminate them"
confidence: experimental
source: "Madhava Gaikwad, 'Murphy's Laws of AI Alignment' (2025-09)"
created: 2026-03-11
---
# Alignment gap is manageable not eliminable through MAPS framework
The alignment gap between human intent and AI behavior cannot be eliminated, but it can be mapped, bounded, and managed through systematic design choices. The MAPS framework identifies four levers:
- **Misspecification**: Understanding where and how feedback diverges from true objectives
- **Annotation**: Designing feedback collection to minimize bias
- **Pressure**: Managing optimization pressure to avoid overfitting to misspecified signals
- **Shift**: Anticipating and adapting to distribution shift between training and deployment
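The four levers can be read as an audit checklist for a deployed system. A toy sketch of that reading (the field names, risk notes, and `gaps` helper are illustrative assumptions, not from Gaikwad 2025):

```python
from dataclasses import dataclass

# Hypothetical sketch: the four MAPS levers as an audit record.
# Each field holds a note on how that lever is handled, or "TODO"
# if the lever is still an open question for this deployment.

@dataclass
class MAPSAudit:
    misspecification: str  # where feedback diverges from true objectives
    annotation: str        # how feedback-collection bias is minimized
    pressure: str          # how optimization pressure is capped
    shift: str             # how train/deploy distribution shift is tracked

    def gaps(self) -> list[str]:
        """Return the levers still marked as open ("TODO")."""
        return [name for name, note in vars(self).items() if note == "TODO"]

audit = MAPSAudit(
    misspecification="proxy reward audited against held-out human ratings",
    annotation="TODO",
    pressure="KL penalty to a reference policy caps optimization pressure",
    shift="TODO",
)
print(audit.gaps())  # ['annotation', 'shift']
```

The point of the structure is the one the note makes: the audit does not require every lever to be closed, only that the open ones are mapped.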
This reframes alignment from an impossible goal (perfect specification) to an engineering discipline (systematic gap management). The formal results on calibration oracles show that knowing where problems exist is sufficient to overcome exponential barriers: you don't need to eliminate the problems, just map them.
## Evidence
Gaikwad (2025) introduces MAPS as a design framework emerging from the formal analysis of feedback misspecification. The framework treats alignment as a bounded optimization problem rather than a specification problem.
The constructive calibration-oracle result demonstrates that gap management is tractable: O(1/(alpha*epsilon^2)) queries suffice when you know which contexts are problematic, even if you cannot fix the underlying misspecification.
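As a worked illustration of how that bound scales (the constant factor `c` and the exact parameterization are assumptions for the sketch, not taken from the paper):

```python
import math

def oracle_query_budget(alpha: float, epsilon: float, c: float = 1.0) -> int:
    """Illustrative query budget for a bound of the form
    O(1/(alpha * epsilon^2)): `alpha` is the mass of problematic
    contexts, `epsilon` the target calibration error, and `c` a
    hidden constant (assumed 1 here)."""
    if not (0 < alpha <= 1 and 0 < epsilon <= 1):
        raise ValueError("alpha and epsilon must lie in (0, 1]")
    return math.ceil(c / (alpha * epsilon ** 2))

# Halving epsilon quadruples the budget; halving alpha doubles it.
print(oracle_query_budget(alpha=0.1, epsilon=0.05))  # 4000
```

The polynomial shape is the substance of the claim: budgets grow quadratically in 1/epsilon and linearly in 1/alpha, rather than exponentially in the size of the context space.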
This contrasts with approaches that attempt to specify complete value functions or eliminate all sources of misalignment before deployment.
---
Relevant Notes:

- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]]
- [[AI alignment is a coordination problem not a technical problem]]

Topics:

- [[domains/ai-alignment/_map]]