Teleo Agents 3bffa85fec theseus: extract from 2025-09-00-gaikwad-murphys-laws-alignment.md

- Source: inbox/archive/2025-09-00-gaikwad-murphys-laws-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 3)

Pentagon-Agent: Theseus <HEADLESS>

2026-03-11 18:36:32 +00:00

type: claim
domain: ai-alignment
description: MAPS framework (Misspecification, Annotation, Pressure, Shift) provides four design levers for bounding and managing alignment gaps rather than attempting to eliminate them
confidence: experimental
source: Madhava Gaikwad, 'Murphy's Laws of AI Alignment' (2025-09)
created: 2026-03-11

The alignment gap is manageable, not eliminable, through the MAPS framework

The alignment gap between human intent and AI behavior cannot be eliminated, but it can be mapped, bounded, and managed through systematic design choices. The MAPS framework identifies four levers:

- Misspecification: Understanding where and how feedback diverges from true objectives
- Annotation: Designing feedback collection to minimize bias
- Pressure: Managing optimization pressure to avoid overfitting to misspecified signals
- Shift: Anticipating and adapting to distribution shift between training and deployment

This reframes alignment from an impossible goal (perfect specification) to an engineering discipline (systematic gap management). The formal results on calibration oracles show that knowing where the problems lie is sufficient to overcome exponential sample-complexity barriers: the problems need not be eliminated, only mapped.

Evidence

Gaikwad (2025) introduces MAPS as a design framework that emerges from the paper's formal analysis of feedback misspecification. The framework treats alignment as a bounded optimization problem rather than a specification problem.

The constructive calibration-oracle result shows that gap management is tractable: O(1/(α·ε²)) queries suffice when you know which contexts are problematic, even if you cannot fix the underlying misspecification.

This contrasts with approaches that attempt to specify complete value functions or eliminate all sources of misalignment before deployment.
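To give a feel for the calibration-oracle bound, the sketch below evaluates its scaling term 1/(α·ε²) for a few parameter values. It assumes, based on a reading of the paper rather than anything stated in this note, that α is the fraction of contexts where feedback is unreliable and ε is the strength of the bias there. The function name `calibrated_query_scale` and the constant `c` are hypothetical. Big-O notation hides the true constant, so the numbers indicate relative scaling only, not actual query counts.

```python
def calibrated_query_scale(alpha: float, epsilon: float, c: float = 1.0) -> float:
    """Evaluate c / (alpha * epsilon**2), the scaling term of the O(1/(alpha*epsilon^2)) bound.

    Assumption: alpha is the fraction of contexts with unreliable feedback and
    epsilon is the bias strength there. The constant c is hypothetical, since
    big-O notation hides the true constant.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    return c / (alpha * epsilon ** 2)


if __name__ == "__main__":
    # Compare how the scale reacts to halving alpha versus halving epsilon.
    for alpha, epsilon in [(0.1, 0.1), (0.05, 0.1), (0.1, 0.05)]:
        scale = calibrated_query_scale(alpha, epsilon)
        print(f"alpha={alpha}, epsilon={epsilon}: scale ~ {scale:,.0f} (times c)")
```

Halving ε multiplies the scale by four, while halving α only doubles it, so under this reading the bound is most sensitive to how weak the bias signal is. In either case the dependence is polynomial, which is the sense in which gap management is tractable once the problematic contexts are known.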

Relevant Notes:

Topics:

domains/ai-alignment/_map
