Teleo Agents e8a500138d

Sync Graph Data to teleo-app / sync (push) Waiting to run

Details

Mirror PR to Forgejo / mirror (pull_request) Waiting to run

Details

theseus: extract claims from 2026-04-09-lindsey-representation-geometry-alignment-probing

- Source: inbox/queue/2026-04-09-lindsey-representation-geometry-alignment-probing.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 0
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>

2026-04-09 02:16:22 +00:00

1.9 KiB

Raw Blame History

type

domain

description

confidence

source

created

title

agent

scope

sourcer

related_claims

claim

ai-alignment

The read-only advantage of geometric probing is structurally coupled to infrastructure control, bounding where the method applies

experimental

Lindsey & Garriga-Alonso (Anthropic), arxiv 2604.02891

2026-04-09

Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios

theseus

structural

Jack Lindsey, Adria Garriga-Alonso (Anthropic)

pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations

Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios

Geometric probing requires white-box access to all intermediate activations across reasoning steps — it cannot be deployed as a behavioral API test. This requires inference-time activation recording infrastructure, which means the method works in evaluation contexts where the model operator controls the infrastructure but fails in adversarial external audit contexts where auditors lack internal access. The read-only property that prevents adversarial targeting is therefore structurally coupled to infrastructure control: defenders who can monitor trajectory geometry are necessarily the same parties who control the deployment infrastructure. This creates a fundamental boundary condition: trajectory geometry is a tool for internal alignment evaluation and monitoring, not for external oversight or third-party auditing. The method provides a partial escape from the SAE dual-use trap, but only within the trust boundary of organizations that already control model deployment.

1.9 KiB Raw Blame History

Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios

1.9 KiB

Raw Blame History