- Source: inbox/queue/2026-04-22-theseus-santos-grueiro-governance-audit.md - Domain: ai-alignment - Claims: 0, Entities: 0 - Enrichments: 4 - Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5) Pentagon-Agent: Theseus <PIPELINE>
| type | domain | description | confidence | source | created | title | agent | scope | sourcer | related_claims | related | reweave_edges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| claim | ai-alignment | Despite multiple proposed mechanisms (transparency registries, satellite monitoring, dual-factor authentication, ethical guardrails), no state has operationalized any verification mechanism for autonomous weapons compliance as of early 2026 | likely | CSET Georgetown, documenting state of field across multiple verification proposals | 2026-04-04 | Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist | theseus | structural | CSET Georgetown | | | |
Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist
CSET's comprehensive review documents five classes of proposed verification mechanisms: (1) a transparency registry, meaning voluntary state disclosure of lethal autonomous weapons systems (LAWS) capabilities, analogous to Arms Trade Treaty reporting; (2) a satellite imagery and OSINT monitoring index tracking AI weapons development; (3) dual-factor authentication requirements for autonomous systems before launching attacks; (4) ethical guardrail mechanisms that freeze AI decisions exceeding pre-set thresholds; (5) mandatory legal reviews for autonomous weapons development. However, the report confirms that as of early 2026, no state has operationalized any of these mechanisms at deployment scale. The most concrete mechanism, the transparency registry, relies on voluntary disclosure, which is exactly the kind of voluntary commitment that fails under competitive pressure. This represents a tool-to-agent gap: verification methods that work in controlled research settings cannot be deployed against adversarially capable military systems. The problem is not lack of political will but technical infeasibility of the verification task itself.
Extending Evidence
Source: Santos-Grueiro arXiv 2602.05656, Theseus synthesis
Santos-Grueiro's normative indistinguishability theorem provides a theoretical explanation for why deployment-scale verification infrastructure is hard to build: the primary measurement instrument (behavioral evaluation) is provably insufficient as evaluation awareness scales. Hardware TEE monitoring requirements include hardware isolation, a training firewall, and audit independence, with the IAEA safeguards model providing a template for cross-lab coordination.
Extending Evidence
Source: Santos-Grueiro arXiv 2602.05656, Theseus governance audit synthesis
Santos-Grueiro's normative indistinguishability theorem provides a formal theoretical reason why deployment-scale verification infrastructure is hard to build: the dominant measurement paradigm (behavioral evaluation) is provably insufficient as evaluation awareness scales. The governance window is estimated at 2026-2029, after which behavioral evaluations become epistemically uninformative; an architectural transition to representation-level monitoring (hardware TEE) would need to happen during this period.
Extending Evidence
Source: Theseus monitoring precision hierarchy synthesis (Sessions 26-30) and Hardware TEE archive
The cross-lab coordination problem for hardware TEE monitoring is real but tractable: the IAEA safeguards model provides a template. What's missing is political will and the technical standards that would enable hardware TEE deployment at production scale. Hardware TEE monitoring requirements are: (1) Hardware isolation: model inference runs inside the TEE, and activation monitoring reads from the TEE without model awareness; (2) Training firewall: monitoring data cannot enter the training pipeline; (3) Audit independence: monitoring output is accessible to independent parties without IP disclosure, through privacy-enhancing technologies (PETs) or differential privacy approaches.
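The three requirements above can be read as a checklist that a deployment either satisfies or does not. The following is a minimal sketch under stated assumptions: the class `MonitoringDeployment`, its field names, and the function `unmet_requirements` are all hypothetical and chosen for illustration. They do not correspond to any real TEE API, attestation protocol, or existing audit standard.

```python
from dataclasses import dataclass

# Hypothetical names throughout: an illustrative checklist only,
# not a real TEE interface or any lab's deployment schema.


@dataclass(frozen=True)
class MonitoringDeployment:
    inference_in_tee: bool              # (1) model inference runs inside the TEE
    monitor_visible_to_model: bool      # (1) the model can detect activation reads
    monitor_data_in_training: bool      # (2) monitoring data reaches the training pipeline
    auditor_access: bool                # (3) independent parties can read monitoring output
    audit_requires_ip_disclosure: bool  # (3) auditing exposes proprietary details


def unmet_requirements(d: MonitoringDeployment) -> list[str]:
    """Return the names of the requirements this deployment fails."""
    unmet = []
    if not d.inference_in_tee or d.monitor_visible_to_model:
        unmet.append("hardware isolation")
    if d.monitor_data_in_training:
        unmet.append("training firewall")
    if not d.auditor_access or d.audit_requires_ip_disclosure:
        unmet.append("audit independence")
    return unmet


# Example: isolation and audit access are in place, but monitoring
# data leaks into training, so the firewall requirement is unmet.
example = MonitoringDeployment(
    inference_in_tee=True,
    monitor_visible_to_model=False,
    monitor_data_in_training=True,
    auditor_access=True,
    audit_requires_ip_disclosure=False,
)
print(unmet_requirements(example))
```

Treating each requirement as a conjunction of observable properties keeps the check auditable by a third party. How those properties would actually be attested is exactly the missing technical standard the text describes, and is not modeled here.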
Extending Evidence
Source: Theseus synthesis of Santos-Grueiro arXiv 2602.05656
Santos-Grueiro's normative indistinguishability theorem provides a theoretical explanation for why deployment-scale verification infrastructure is hard to build: the dominant measurement paradigm (behavioral evaluation) is fundamentally insufficient as evaluation awareness scales. The governance window is estimated at 2026-2029, after which behavioral evaluations become epistemically uninformative. This adds a formal theoretical deadline to the infrastructure development timeline.