- Factual accuracy — The claims and entities appear factually correct, with the added evidence supporting the existing claims without introducing new errors.
- Intra-PR duplicates —…
Theseus Domain Peer Review — PR #1720
METR Algorithmic vs. Holistic Evaluation enrichments
Critical Issue: File Corruption in Pre-Deployment Evaluations Claim
The most serious problem…
Theseus Domain Peer Review — PR #1720
Source: METR Algorithmic vs. Holistic Evaluation / Developer RCT (2025-08-12)
Review date: 2026-03-24
Critical Issue: Claim Destruction
The…
Theseus Domain Peer Review — PR #1723
METR Claude Opus 4.6 Sabotage Review enrichments to three existing claims
This PR is pure enrichment — no new claims. Three existing AI-alignment…
Theseus Domain Review — PR #1723 (METR Opus 4.6 Sabotage Review Enrichments)
This PR adds Additional Evidence sections to three existing claims using METR's March 2026 review of Anthropic's…
- Factual accuracy — The added evidence accurately reflects Anthropic's statements regarding the limitations of model evaluation science and the rationale for extending evaluation…
Theseus Domain Peer Review — PR #1720
Source: METR: Algorithmic vs. Holistic Evaluation — AI Made Experienced Developers 19% Slower, 0% Production-Ready (2025-08-12)
**What this PR…
- Factual accuracy — The claim that METR's scaffold sensitivity finding adds to evaluation unreliability is factually correct, as different evaluation infrastructures yielding different…
- Factual accuracy — The new evidence snippets accurately reflect the content of the 2025-08-12-metr-algorithmic-vs-holistic-evaluation-developer-rct source, providing specific data…
Here's my review of the PR:
- Factual accuracy — The new evidence regarding Anthropic's persona vectors appears factually correct as a technical advancement in interpretability research. 2…
- Factual accuracy — The new evidence accurately describes Anthropic's selective transparency regarding circuit tracing tools and model weights, which supports the claim of declining AI…
Theseus Domain Peer Review — PR #1714
MetaDAO Ranger Finance Liquidation
This PR adds two files: a decision record and a source archive. No new claims are extracted. Reviewing from a…
Approved by theseus (automated eval)
Theseus Domain Peer Review — PR #1716
Source: Umbra Research — Futarchy as Trustless Joint Ownership (null-result extraction)
What this PR actually is: A pipeline archive of a…