teleo-codex/domains/ai-alignment/alignment-gap-is-manageable-not-eliminable-through-maps-framework.md
Teleo Agents 1ea7313abf theseus: extract claims from 2025-09-00-gaikwad-murphys-laws-alignment.md
- Source: inbox/archive/2025-09-00-gaikwad-murphys-laws-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-11 06:34:32 +00:00

type: claim
domain: ai-alignment
description: MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding alignment gap rather than eliminating it
confidence: experimental
source: Gaikwad 2025, Murphy's Laws of AI Alignment (arxiv.org/abs/2509.05381)
created: 2026-03-11
last_evaluated: 2026-03-11

Alignment gap is manageable not eliminable through MAPS framework

The alignment gap, meaning the difference between specified objectives and true human values, cannot be eliminated, but it can be mapped, bounded, and managed through four design levers. This reframes alignment from "solve the problem" to "manage the gap": the goal is not perfect alignment but bounded misalignment that stays within acceptable risk thresholds.

The Four Design Levers

Gaikwad (2025) introduces the MAPS framework as a response to the exponential sample complexity barrier from feedback misspecification. The four levers are:

  1. Misspecification: Identify contexts where feedback is unreliable (via a calibration oracle)
  2. Annotation: Improve feedback quality in high-stakes contexts
  3. Pressure: Reduce optimization intensity to limit exploitation of misspecified rewards
  4. Shift: Monitor and adapt to distribution shift between training and deployment
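
As a rough illustration of how the first two levers could interact, the sketch below routes feedback by estimated reliability. It is not taken from Gaikwad's paper: the `FeedbackItem` fields, the `toy_oracle` function standing in for a calibration oracle, the 0.8 threshold, and the three bucket names are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeedbackItem:
    context: str       # hypothetical context label, e.g. a task category
    label: float       # feedback signal from a single evaluator
    high_stakes: bool  # whether mistakes in this context are costly


def route_feedback(
    items: List[FeedbackItem],
    reliability: Callable[[str], float],
    threshold: float = 0.8,
) -> Dict[str, List[FeedbackItem]]:
    """Sort feedback into three buckets.

    `reliability` is a stand-in for a calibration oracle: it returns an
    estimated probability that feedback in a given context is trustworthy.
    The threshold value is an arbitrary illustration.
    """
    buckets: Dict[str, List[FeedbackItem]] = {
        "trusted": [],     # reliable contexts: use feedback as-is
        "reannotate": [],  # unreliable and high-stakes: improve feedback quality
        "downweight": [],  # unreliable but low-stakes: use with reduced weight
    }
    for item in items:
        if reliability(item.context) >= threshold:
            buckets["trusted"].append(item)
        elif item.high_stakes:
            buckets["reannotate"].append(item)
        else:
            buckets["downweight"].append(item)
    return buckets


def toy_oracle(context: str) -> float:
    """Hypothetical reliability estimates; unknown contexts get 0.5."""
    return {"arithmetic": 0.95, "medical": 0.60, "humor": 0.70}.get(context, 0.5)


items = [
    FeedbackItem("arithmetic", 1.0, high_stakes=False),
    FeedbackItem("medical", 0.0, high_stakes=True),
    FeedbackItem("humor", 1.0, high_stakes=False),
]
routed = route_feedback(items, toy_oracle)
print({name: [i.context for i in bucket] for name, bucket in routed.items()})
```

The point of the split is that annotation effort goes only to contexts that are both flagged as unreliable and costly to get wrong. Where such a reliability estimate would come from in a real system is left open here.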

Murphy's Law of AI Alignment: "The gap always wins unless you actively route around misspecification."

The framework treats alignment as an ongoing management problem rather than a one-time solution. Rather than attempting to specify perfect human values upfront, the MAPS approach assumes misspecification is inevitable and designs systems to detect and contain it.
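
The "detect and contain" idea can be made concrete for the Pressure and Shift levers with a toy sketch. Both mechanisms are assumptions of this example rather than methods given in the source: Pressure is modeled as a divergence penalty on the proxy reward (in the style of KL-regularized fine-tuning), and Shift as a standardized difference in means between a training-time and a deployment-time statistic. The candidate names, the `beta` values, and the `limit` of 2.0 are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

# (name, proxy_reward, divergence_from_reference); all values are made up.
Candidate = Tuple[str, float, float]


def penalized_objective(proxy_reward: float, divergence: float, beta: float) -> float:
    """Pressure (one possible reading): a larger beta means less optimization
    pressure on the possibly misspecified proxy reward."""
    return proxy_reward - beta * divergence


def pick(candidates: List[Candidate], beta: float) -> str:
    """Return the name of the candidate with the best penalized objective."""
    return max(candidates, key=lambda c: penalized_objective(c[1], c[2], beta))[0]


def shift_score(train: Sequence[float], deploy: Sequence[float]) -> float:
    """Shift (one possible reading): gap between deployment and training means,
    measured in units of the training sample standard deviation."""
    spread = stdev(train)  # requires at least two training values
    gap = abs(mean(deploy) - mean(train))
    if spread == 0:
        return math.inf if gap > 0 else 0.0
    return gap / spread


def shift_detected(train: Sequence[float], deploy: Sequence[float], limit: float = 2.0) -> bool:
    """Flag deployment data whose mean drifts more than `limit` deviations."""
    return shift_score(train, deploy) > limit


candidates = [("conservative", 1.0, 0.1), ("aggressive", 2.0, 3.0)]
print(pick(candidates, beta=0.0))  # no penalty: the highest proxy reward wins
print(pick(candidates, beta=0.5))  # with a penalty: the low-divergence option wins
print(shift_detected([0.0, 1.0, 2.0], [5.0, 5.0, 5.0]))
```

In this toy setup, raising `beta` changes which candidate is selected, which is the sense in which reduced optimization intensity limits exploitation of a misspecified reward. A real monitor would likely track richer statistics than a single mean.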

Evidence and Scope

The framework is presented as a conceptual response to the formal exponential barrier result. Gaikwad argues that because the exponential barrier is fundamental to single-evaluator feedback, alignment strategies must shift from elimination to management. The four levers map to different points in the training and deployment pipeline where misspecification can be detected or contained.

However, the framework remains conceptual rather than operational: it identifies the levers but does not specify how to pull them in practice. The claim that the gap is "manageable" therefore depends on whether organizations can implement these levers effectively, which remains unproven.


Relevant Notes:

Topics: