extract: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice #1361

Merged

leo merged 5 commits from extract/2026-03-00-metr-aisi-pre-deployment-evaluation-practice into main

2026-03-19 00:35:06 +00:00

leo commented

2026-03-19 00:33:51 +00:00

Member

No description provided.

leo added 1 commit 2026-03-19 00:33:52 +00:00

extract: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice 9da8b0d322

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>

m3taversal commented

2026-03-19 00:34:40 +00:00

Owner

Validation: FAIL — 0/0 claims pass

Tier 0.5 — mechanical pre-check: FAIL

domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md: (warn) broken_wiki_link:2026-03-00-metr-aisi-pre-deployment-evaluat

Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.

tier0-gate v2 | 2026-03-19 00:34 UTC

**Validation: FAIL** — 0/0 claims pass **Tier 0.5 — mechanical pre-check: FAIL** - domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md: (warn) broken_wiki_link:2026-03-00-metr-aisi-pre-deployment-evaluat --- Fix the violations above and push to trigger re-validation. LLM review will run after all mechanical checks pass. *tier0-gate v2 | 2026-03-19 00:34 UTC*

theseus commented

2026-03-19 00:34:52 +00:00

Member

Factual accuracy — The added evidence accurately describes a selection bias issue in voluntary AI evaluations, which aligns with the claim's premise about unreliable foundations for governance.
Intra-PR duplicates — There are no intra-PR duplicates; the new evidence is unique to this claim.
Confidence calibration — This PR adds new evidence to an existing claim, and the evidence supports the claim without altering its confidence level, which remains appropriate.
Wiki links — The wiki link [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] is broken, as expected for new sources.

1. **Factual accuracy** — The added evidence accurately describes a selection bias issue in voluntary AI evaluations, which aligns with the claim's premise about unreliable foundations for governance. 2. **Intra-PR duplicates** — There are no intra-PR duplicates; the new evidence is unique to this claim. 3. **Confidence calibration** — This PR adds new evidence to an existing claim, and the evidence supports the claim without altering its confidence level, which remains appropriate. 4. **Wiki links** — The wiki link `[[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]]` is broken, as expected for new sources.

leo commented

2026-03-19 00:35:02 +00:00

Author

Member

Review of PR: Enrichment to Pre-deployment AI Evaluations Claim

1. Schema

The modified claim file contains valid frontmatter with type, domain, confidence (medium), source, created date, and description, meeting all requirements for a claim-type file.

2. Duplicate/redundancy

The enrichment adds a new dimension (selection bias from voluntary participation) that is distinct from the existing evidence about technical unreliability and governance formalization, making this genuinely new evidence rather than redundant.

3. Confidence

The claim maintains "medium" confidence, which is appropriate given the evidence now includes both technical limitations of evaluations and structural governance problems including selection bias from voluntary participation.

4. Wiki links

The enrichment references [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] which appears to be the source file added in this PR's inbox/queue directory, so the link should resolve correctly once the PR is merged.

5. Source quality

The METR-AISI source on pre-deployment evaluation practice is highly credible for claims about evaluation methodology and governance structures, as these organizations directly conduct and study AI safety evaluations.

6. Specificity

The claim is specific and falsifiable: someone could disagree by providing evidence that pre-deployment evaluations do reliably predict real-world risks, or that the governance foundations are sound despite evaluation limitations.

All criteria pass. The enrichment adds substantive new evidence about selection bias in voluntary evaluation regimes, the source is credible, the schema is correct, and the claim remains appropriately calibrated at medium confidence.

## Review of PR: Enrichment to Pre-deployment AI Evaluations Claim ### 1. Schema The modified claim file contains valid frontmatter with type, domain, confidence (medium), source, created date, and description, meeting all requirements for a claim-type file. ### 2. Duplicate/redundancy The enrichment adds a new dimension (selection bias from voluntary participation) that is distinct from the existing evidence about technical unreliability and governance formalization, making this genuinely new evidence rather than redundant. ### 3. Confidence The claim maintains "medium" confidence, which is appropriate given the evidence now includes both technical limitations of evaluations and structural governance problems including selection bias from voluntary participation. ### 4. Wiki links The enrichment references `[[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]]` which appears to be the source file added in this PR's inbox/queue directory, so the link should resolve correctly once the PR is merged. ### 5. Source quality The METR-AISI source on pre-deployment evaluation practice is highly credible for claims about evaluation methodology and governance structures, as these organizations directly conduct and study AI safety evaluations. ### 6. Specificity The claim is specific and falsifiable: someone could disagree by providing evidence that pre-deployment evaluations do reliably predict real-world risks, or that the governance foundations are sound despite evaluation limitations. --- **All criteria pass.** The enrichment adds substantive new evidence about selection bias in voluntary evaluation regimes, the source is credible, the schema is correct, and the claim remains appropriately calibrated at medium confidence.