extract: 2025-08-00-mccaslin-stream-chembio-evaluation-reporting
Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
parent
6be17a893b
commit
df155b7fff
4 changed files with 29 additions and 3 deletions
@@ -33,6 +33,12 @@ The International AI Safety Report 2026 (multi-government committee, February 20
 STREAM framework proposes standardized ChemBio evaluation reporting with 23-expert consensus on disclosure requirements. The focus on ChemBio as the initial domain for standardized dangerous capability reporting signals that this is recognized across government, civil society, academia, and frontier labs as the highest-priority risk domain requiring transparency infrastructure.
+
+### Additional Evidence (extend)
+
+*Source: [[2025-08-00-mccaslin-stream-chembio-evaluation-reporting]] | Added: 2026-03-19*
+
+STREAM's focus on ChemBio dangerous capability evaluation reporting (August 2025) reflects recognition that biological weapons capability is the priority domain for standardized evaluation disclosure, with 23 experts including government representatives developing reporting templates specifically for this risk
 
 ---
 
 Relevant Notes:
@@ -41,6 +41,12 @@ Expert consensus identifies 'external scrutiny, proactive evaluation and transpa
 STREAM proposal identifies that current model reports lack 'sufficient detail to enable meaningful independent assessment' of dangerous capability evaluations. The need for a standardized reporting framework confirms that transparency problems extend beyond general disclosure (FMTI scores) to the specific domain of dangerous capability evaluation where external verification is currently impossible.
+
+### Additional Evidence (extend)
+
+*Source: [[2025-08-00-mccaslin-stream-chembio-evaluation-reporting]] | Added: 2026-03-19*
+
+STREAM proposal (August 2025) by 23-expert multi-stakeholder group identifies that current AI model reports lack sufficient detail about dangerous capability evaluations for third parties to assess rigor, confirming the transparency decline extends to evaluation disclosure not just organizational commitments
 
 ---
 
 Relevant Notes:
@@ -10,10 +10,12 @@
   "validation_stats": {
     "total": 1,
     "kept": 0,
-    "fixed": 1,
+    "fixed": 3,
     "rejected": 1,
     "fixes_applied": [
-      "ai-model-reports-lack-standardized-dangerous-capability-disclosure-preventing-independent-assessment.md:set_created:2026-03-19"
+      "ai-model-reports-lack-standardized-dangerous-capability-disclosure-preventing-independent-assessment.md:set_created:2026-03-19",
+      "ai-model-reports-lack-standardized-dangerous-capability-disclosure-preventing-independent-assessment.md:stripped_wiki_link:AI transparency is declining not improving because Stanford ",
+      "ai-model-reports-lack-standardized-dangerous-capability-disclosure-preventing-independent-assessment.md:stripped_wiki_link:AI lowers the expertise barrier for engineering biological w"
     ],
     "rejections": [
       "ai-model-reports-lack-standardized-dangerous-capability-disclosure-preventing-independent-assessment.md:missing_attribution_extractor"
@@ -7,13 +7,17 @@ date: 2025-08-01
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: unprocessed
+status: enrichment
 priority: medium
 tags: [evaluation-infrastructure, dangerous-capabilities, standardized-reporting, ChemBio, transparency, STREAM]
 processed_by: theseus
 processed_date: 2026-03-19
 enrichments_applied: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
+processed_by: theseus
+processed_date: 2026-03-19
+enrichments_applied: ["AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -65,3 +69,11 @@ EXTRACTION HINT: Focus on the disclosure gap: labs currently report their own da
 - STREAM includes 3-page reporting template and gold standard examples
 - Initial STREAM focus is chemical and biological (ChemBio) dangerous capability evaluations
 - STREAM has two stated purposes: practical guidance for AI developers and enabling third-party assessment of evaluation rigor
+
+## Key Facts
+
+- STREAM (Standard for Transparently Reporting Evaluations in AI Model Reports) proposed August 2025
+- STREAM developed by 23 experts from government, civil society, academia, and frontier AI companies
+- STREAM includes 3-page reporting template and gold standard examples
+- STREAM initial focus is chemical and biological (ChemBio) dangerous capability evaluations
+- STREAM has two stated purposes: practical guidance for AI developers and enabling third-party assessment of evaluation rigor