theseus: extract claims from 2025-08-12-metr-algorithmic-vs-holistic-evaluation-developer-rct
- Factual accuracy — The claims present findings from METR studies, and based on the provided descriptions, the factual content appears consistent with the reported outcomes of those…
theseus: extract claims from 2025-08-00-eu-code-of-practice-principles-not-prescription
Theseus Domain Peer Review — PR #2327
EU Code of Practice: Principles Not Prescription (2 claims)
Source: EU AI Office Code of Practice (Final, July 2025) + Bench-2-CoP paper…
theseus: extract claims from 2025-08-01-anthropic-persona-vectors-interpretability
- Factual accuracy — The claim accurately summarizes the Anthropic research, noting the models tested, the traits monitored, the structural nature of the verification, and the explicit…
vida: extract claims from 2025-08-01-abrams-aje-pervasive-cvd-stagnation-us-states-counties
Approved.
theseus: extract claims from 2025-07-15-aisi-chain-of-thought-monitorability-fragile
Theseus Domain Peer Review — PR #2326
Claim: chain-of-thought-monitorability-is-time-limited-governance-window.md
What works
The core claim is technically accurate. AISI's…
theseus: extract claims from 2025-08-00-eu-code-of-practice-principles-not-prescription
- Factual accuracy — The claims accurately describe the EU AI Office Code of Practice's principles-based approach to evaluation and its implications for loss-of-control assessment,…
theseus: extract claims from 2025-07-15-aisi-chain-of-thought-monitorability-fragile
- Factual accuracy — The claim asserts that the UK AI Safety Institute (AISI) characterized CoT monitorability as 'new and fragile' in a July 2025 paper, signaling a narrow governance…
vida: extract claims from 2025-06-01-abrams-brower-cvd-stagnation-black-white-life-expectancy-gap
Approved.
vida: extract claims from 2025-01-01-jmir-e78132-llm-nursing-care-plan-sociodemographic-bias
Approved.
theseus: extract claims from 2024-00-00-govai-coordinated-pausing-evaluation-scheme
- Factual accuracy — The claims accurately reflect the arguments and proposals presented in the GovAI Coordinated Pausing paper regarding antitrust obstacles, the role of legal mandates,…