Teleo Agents 1ea7313abf theseus: extract claims from 2025-09-00-gaikwad-murphys-laws-alignment.md

- Source: inbox/archive/2025-09-00-gaikwad-murphys-laws-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 4)

Pentagon-Agent: Theseus <HEADLESS>

2026-03-11 06:34:32 +00:00

type: claim
domain: ai-alignment
description: MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding alignment gap rather than eliminating it
confidence: experimental
source: Gaikwad 2025, Murphy's Laws of AI Alignment (arxiv.org/abs/2509.05381)
created: 2026-03-11
last_evaluated: 2026-03-11

Alignment gap is manageable not eliminable through MAPS framework

The alignment gap, meaning the difference between specified objectives and true human values, cannot be eliminated, but it can be mapped, bounded, and managed through four design levers. This reframes alignment from "solve the problem" to "manage the gap": the goal is not perfect alignment but bounded misalignment that stays within acceptable risk thresholds.

The Four Design Levers

Gaikwad (2025) introduces the MAPS framework as a response to the exponential sample-complexity barrier that arises from feedback misspecification. The four levers are:

Misspecification: Identify contexts where feedback is unreliable (via a calibration oracle)
Annotation: Improve feedback quality in high-stakes contexts
Pressure: Reduce optimization intensity to limit exploitation of misspecified rewards
Shift: Monitor and adapt to distribution shift between training and deployment
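
The Pressure lever can be made concrete with a toy example. The sketch below is illustrative only and is not taken from Gaikwad (2025): the candidate pool, the scores, and the use of best-of-n search as a stand-in for optimization intensity are all assumptions. It shows how a wider search over a misspecified proxy reward eventually selects candidates that score well on the proxy but poorly on true value.

```python
# Toy illustration of the Pressure lever. All names and numbers are hypothetical.
# Each candidate is (label, true_value, proxy_reward); the proxy overrates exploits.
# Assumption: exploits are rare, so they only show up once the search is wide.
POOL = [
    ("plain answer", 5, 5),
    ("careful answer", 7, 6),
    ("thorough answer", 8, 7),
    ("flattering answer", 2, 9),
    ("reward hack", 0, 10),
]


def best_of_n(pool, n):
    """Search the first n candidates and return the one with the highest proxy reward.

    n stands in for optimization intensity: a larger n means a wider search.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return max(pool[:n], key=lambda candidate: candidate[2])


def true_value_by_pressure(pool):
    """Map each search width n to the true value of the candidate it selects."""
    return {n: best_of_n(pool, n)[1] for n in range(1, len(pool) + 1)}


if __name__ == "__main__":
    for n, value in true_value_by_pressure(POOL).items():
        print(f"n={n}: true value of selected candidate = {value}")
```

In this toy pool, true value rises while the search stays narrow (n up to 3) and collapses once the search is wide enough to reach the exploits, which is the behavior that capping optimization intensity is meant to avoid.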

Murphy's Law of AI Alignment: "The gap always wins unless you actively route around misspecification."

The framework treats alignment as an ongoing management problem rather than a one-time solution. Rather than attempting to specify perfect human values upfront, the MAPS approach assumes misspecification is inevitable and designs systems to detect and contain it.
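
The detect-and-contain idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: the `Candidate` type, the `flagged` field (standing in for a calibration oracle's verdict), and all scores are hypothetical. The sketch compares selecting on the raw proxy reward with selecting only among candidates whose feedback is trusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    true_value: int    # what humans actually want; not observable in practice
    proxy_reward: int  # what the feedback signal reports
    flagged: bool      # hypothetical oracle verdict: feedback is unreliable here


# Hypothetical data: the proxy overrates the two flagged candidates.
CANDIDATES = [
    Candidate("concise", true_value=7, proxy_reward=7, flagged=False),
    Candidate("thorough", true_value=8, proxy_reward=6, flagged=False),
    Candidate("flattering", true_value=2, proxy_reward=9, flagged=True),
    Candidate("reward-hack", true_value=0, proxy_reward=10, flagged=True),
]


def select_naive(candidates):
    """Maximize the proxy reward with no safeguards."""
    return max(candidates, key=lambda c: c.proxy_reward)


def select_routed(candidates):
    """Route around flagged feedback, then maximize the proxy on what remains."""
    trusted = [c for c in candidates if not c.flagged]
    if not trusted:
        raise ValueError("no candidate has trusted feedback")
    return max(trusted, key=lambda c: c.proxy_reward)


def alignment_gap(chosen, candidates):
    """Best achievable true value minus the true value actually obtained."""
    return max(c.true_value for c in candidates) - chosen.true_value
```

With this toy data the naive selection picks `reward-hack` and leaves a gap of 8, while the routed selection picks `concise` and leaves a gap of 1. The gap shrinks but does not reach zero, because the proxy is still imperfect on the trusted candidates.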

Evidence and Scope

The framework is presented as a conceptual response to the formal exponential barrier result. Gaikwad argues that because the exponential barrier is fundamental to single-evaluator feedback, alignment strategies must shift from elimination to management. The four levers map to different points in the training and deployment pipeline where misspecification can be detected or contained.
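
As one illustration of a lever acting at the deployment end of the pipeline, the Shift lever could be approximated by comparing the mix of contexts seen in training with the mix seen in deployment. The sketch below is an assumption-laden example, not something specified by the source: the context labels, the choice of total variation distance, and the 0.2 threshold are all hypothetical.

```python
from collections import Counter


def distribution(contexts):
    """Return the relative frequency of each context label."""
    if not contexts:
        raise ValueError("contexts must not be empty")
    counts = Counter(contexts)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def shift_alert(train_contexts, deploy_contexts, threshold=0.2):
    """Return True when deployment contexts drift past the (hypothetical) threshold."""
    distance = total_variation(distribution(train_contexts),
                               distribution(deploy_contexts))
    return distance > threshold


if __name__ == "__main__":
    train = ["code"] * 8 + ["medical"] * 2
    deploy = ["code"] * 4 + ["medical"] * 6
    print(shift_alert(train, deploy))
```

A monitor like this only detects that the context mix has changed. Deciding how to adapt once an alert fires is left open here, as it is in the framework itself.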

However, the framework remains conceptual rather than operational—it identifies levers but does not specify how to pull them in practice. The claim that the gap is "manageable" depends on whether organizations can implement these levers effectively, which remains unproven.

Relevant Notes:

adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans — MAPS is an adaptive governance approach to alignment
the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — MAPS Shift lever directly addresses this problem
safe AI development requires building alignment mechanisms before scaling capability — MAPS provides a framework for those mechanisms

Topics:

domains/ai-alignment/_map
