---
type: claim
domain: ai-alignment
description: "MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding alignment gap rather than eliminating it"
confidence: experimental
source: "Gaikwad 2025, Murphy's Laws of AI Alignment (arxiv.org/abs/2509.05381)"
created: 2026-03-11
last_evaluated: 2026-03-11
---
# Alignment gap is manageable not eliminable through MAPS framework

The alignment gap—the difference between specified objectives and true human values—cannot be eliminated but can be mapped, bounded, and managed through four design levers. This reframes alignment from "solve the problem" to "manage the gap." The goal is not perfect alignment but bounded misalignment that stays within acceptable risk thresholds.
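One compact way to state the claim (the notation below is mine, not the paper's): if \(J_{\text{true}}\) denotes performance under true human values and \(J_{\text{spec}}\) performance under the specified objective, the management goal is to keep the gap under an explicit risk budget \(\varepsilon\) rather than drive it to zero:

```latex
\mathrm{Gap}(\pi) \;=\; \bigl|\, J_{\text{true}}(\pi) - J_{\text{spec}}(\pi) \,\bigr| \;\le\; \varepsilon
```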
## The Four Design Levers

Gaikwad (2025) introduces the MAPS framework as a response to the exponential sample-complexity barrier that feedback misspecification imposes. The four levers are:
1. **Misspecification**: Identify contexts where feedback is unreliable (via a calibration oracle)
2. **Annotation**: Improve feedback quality in high-stakes contexts
3. **Pressure**: Reduce optimization intensity to limit exploitation of misspecified rewards
4. **Shift**: Monitor and adapt to distribution shift between training and deployment

Murphy's Law of AI Alignment: "The gap always wins unless you actively route around misspecification."

The framework treats alignment as an ongoing management problem rather than a one-time solution. Instead of attempting to specify perfect human values upfront, the MAPS approach assumes misspecification is inevitable and designs systems to detect and contain it.
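The four levers read naturally as branches in a per-context monitoring loop. A minimal sketch, assuming hypothetical thresholds, method names, and a calibration-oracle score supplied from outside — none of this is an implementation from the paper:

```python
from dataclasses import dataclass, field


@dataclass
class GapManager:
    """Hypothetical sketch of the four MAPS levers as a monitoring loop.

    Thresholds, names, and control logic are illustrative assumptions,
    not Gaikwad (2025)'s implementation.
    """
    calibration_threshold: float = 0.8   # below this, feedback counts as unreliable
    shift_threshold: float = 0.3         # tolerated train/deploy divergence
    optimization_pressure: float = 1.0   # e.g. a KL-penalty-style intensity knob
    reannotation_queue: list = field(default_factory=list)

    def step(self, context, calibration_score, shift_estimate):
        """Apply each lever to one deployment context; return actions taken."""
        actions = []
        # Misspecification: flag contexts where the calibration oracle
        # reports unreliable feedback.
        if calibration_score < self.calibration_threshold:
            actions.append("flag-misspecified")
            # Annotation: route flagged high-stakes contexts for better labels.
            self.reannotation_queue.append(context)
            actions.append("queue-reannotation")
            # Pressure: dial down optimization intensity so the policy cannot
            # exploit the misspecified reward as aggressively.
            self.optimization_pressure *= 0.5
            actions.append("reduce-pressure")
        # Shift: monitor train/deploy divergence and trigger adaptation.
        if shift_estimate > self.shift_threshold:
            actions.append("adapt-to-shift")
        return actions


manager = GapManager()
print(manager.step("medical-triage", calibration_score=0.6, shift_estimate=0.4))
# → ['flag-misspecified', 'queue-reannotation', 'reduce-pressure', 'adapt-to-shift']
```

Each lever is a separate, observable action, so the loop's output doubles as an audit trail of how the gap is being contained rather than eliminated.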
## Evidence and Scope

The framework is presented as a conceptual response to the formal exponential-barrier result. Gaikwad argues that because the exponential barrier is fundamental to single-evaluator feedback, alignment strategies must shift from elimination to management. The four levers map to different points in the training and deployment pipeline where misspecification can be detected or contained.

However, the framework remains conceptual rather than operational: it identifies levers but does not specify how to pull them in practice. The claim that the gap is "manageable" depends on whether organizations can implement these levers effectively, which remains unproven.

---
Relevant Notes:

- [[adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans]] — MAPS is an adaptive governance approach to alignment
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — the MAPS Shift lever directly addresses this problem
- [[safe AI development requires building alignment mechanisms before scaling capability]] — MAPS provides a framework for those mechanisms

Topics:

- [[domains/ai-alignment/_map]]