extract: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice #1361
Reference: teleo/teleo-codex#1361
Validation: FAIL — 0/0 claims pass
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-19 00:34 UTC
[[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] is broken, as expected for new sources.

Review of PR: Enrichment to Pre-deployment AI Evaluations Claim
1. Schema
The modified claim file contains valid frontmatter with type, domain, confidence (medium), source, created date, and description, meeting all requirements for a claim-type file.
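A mechanical frontmatter check of this kind can be sketched as follows. This is a hypothetical illustration of what a Tier 0.5 pre-check might verify; the field names come from the review text above, but the actual tier0-gate implementation is not shown in this PR, so the parsing logic and confidence vocabulary here are assumptions.

```python
# Hypothetical sketch of a mechanical frontmatter check for claim files.
# Field names are taken from the review text; everything else is assumed.
REQUIRED_FIELDS = {"type", "domain", "confidence", "source", "created", "description"}
ALLOWED_CONFIDENCE = {"low", "medium", "high"}  # assumed vocabulary


def parse_frontmatter(text: str) -> dict:
    """Parse a simple 'key: value' frontmatter block delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def check_claim(text: str) -> list:
    """Return a list of violations; an empty list means the claim passes."""
    fm = parse_frontmatter(text)
    violations = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - fm.keys())]
    if fm.get("confidence") not in ALLOWED_CONFIDENCE:
        violations.append("confidence must be low/medium/high")
    return violations
```

A claim file with all six fields and a valid confidence value would pass; the "0/0 claims pass" result above suggests the gate failed before any claim reached this stage.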
2. Duplicate/redundancy
The enrichment adds a new dimension (selection bias from voluntary participation) that is distinct from the existing evidence about technical unreliability and governance formalization, making this genuinely new evidence rather than redundant.
3. Confidence
The claim maintains "medium" confidence, which is appropriate: the evidence now spans both technical limitations of evaluations and structural governance problems, including selection bias from voluntary participation.
4. Wiki links
The enrichment references [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]], which appears to be the source file added to this PR's inbox/queue directory, so the link should resolve correctly once the PR is merged.
5. Source quality
The METR-AISI source on pre-deployment evaluation practice is highly credible for claims about evaluation methodology and governance structures, as these organizations directly conduct and study AI safety evaluations.
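The wiki-link resolution discussed in point 4 can be sketched mechanically: a link is "broken" until a matching note file exists somewhere the resolver searches. This is a hypothetical illustration; the actual link checker and the repository's directory layout (e.g. an inbox/queue directory) are assumptions, not shown in this PR.

```python
# Hypothetical [[wikilink]] resolver: a target is broken if no
# note directory contains a matching .md file. Directory layout
# and naming convention are assumptions for illustration.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|]+)")  # target is the part before any '|' alias


def find_broken_links(text: str, note_dirs: list) -> list:
    """Return wikilink targets with no matching .md file in any note dir."""
    broken = []
    for target in WIKILINK.findall(text):
        if not any(Path(d, f"{target.strip()}.md").exists() for d in note_dirs):
            broken.append(target.strip())
    return broken
```

Under this model, the link above reports as broken while the source file sits only in the PR branch, and resolves once the merge places the file in a searched directory.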
6. Specificity
The claim is specific and falsifiable: someone could disagree by providing evidence that pre-deployment evaluations do reliably predict real-world risks, or that the governance foundations are sound despite evaluation limitations.
All criteria pass. The enrichment adds substantive new evidence about selection bias in voluntary evaluation regimes, the source is credible, the schema is correct, and the claim remains appropriately calibrated at medium confidence.
Approved.
Approved (post-rebase re-approval).
9da8b0d322 to e2dc9f54f0