extract: 2025-08-00-mccaslin-stream-chembio-evaluation-reporting

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
Teleo Agents 2026-03-19 00:31:46 +00:00
parent 2a9f39a6f6
commit a816020995
4 changed files with 50 additions and 1 deletions

@@ -27,6 +27,12 @@ The structural point is about threat proximity. AI takeover requires autonomy, r
The International AI Safety Report 2026 (multi-government committee, February 2026) confirms that 'biological/chemical weapons information accessible through AI systems' is a documented malicious use risk. While the report does not specify the expertise level required (PhD vs amateur), it categorizes bio/chem weapons information access alongside AI-generated persuasion and cyberattack capabilities as confirmed malicious use risks, giving multi-government institutional validation to the bioterrorism concern.
### Additional Evidence (extend)
*Source: [[2025-08-00-mccaslin-stream-chembio-evaluation-reporting]] | Added: 2026-03-19*
STREAM framework specifically targets ChemBio dangerous capability evaluations as its initial focus domain, with a 23-expert multi-stakeholder process (including government representatives) developing standardized reporting templates. This suggests ChemBio capabilities are recognized as the priority dangerous capability domain requiring transparent evaluation disclosure.
---
Relevant Notes:

@@ -29,6 +29,12 @@ This evidence directly challenges the theory that governance pressure (declarati
The alignment implication: transparency is a prerequisite for external oversight. If [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]], declining transparency makes even the unreliable evaluations harder to conduct. The governance mechanisms that could provide oversight (safety institutes, third-party auditors) depend on lab cooperation that is actively eroding.
### Additional Evidence (extend)
*Source: [[2025-08-00-mccaslin-stream-chembio-evaluation-reporting]] | Added: 2026-03-19*
The need for STREAM (a standardized dangerous capability evaluation reporting framework) demonstrates that even when labs conduct evaluations, the lack of disclosure standards prevents external assessment. The 23-expert group explicitly states current model reports lack 'sufficient detail' for third parties to assess evaluation rigor, suggesting transparency problems extend beyond what FMTI scores capture.
---
Relevant Notes:

@@ -0,0 +1,24 @@
{
"rejected_claims": [
{
"filename": "ai-model-reports-lack-standardized-dangerous-capability-evaluation-disclosure-preventing-independent-assessment.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 1,
"kept": 0,
"fixed": 1,
"rejected": 1,
"fixes_applied": [
"ai-model-reports-lack-standardized-dangerous-capability-evaluation-disclosure-preventing-independent-assessment.md:set_created:2026-03-19"
],
"rejections": [
"ai-model-reports-lack-standardized-dangerous-capability-evaluation-disclosure-preventing-independent-assessment.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-19"
}
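The validation record above implies a simple pipeline: each extracted claim is checked for required fields, auto-fixable gaps (like a missing creation date, per the `set_created` fix) are patched, and unfixable issues (like the missing attribution extractor) cause rejection. Note that a single claim can be counted as both fixed and rejected, which is how `total: 1` yields `fixed: 1` and `rejected: 1`. A minimal sketch of that logic, where the function name, the `REQUIRED_FIELDS` tuple, and the claim-dict shape are all assumptions inferred from the JSON, not the actual validator:

```python
# Assumed required fields: "attribution_extractor" (whose absence caused the
# rejection above) and "created" (whose absence the set_created fix repaired).
REQUIRED_FIELDS = ("attribution_extractor", "created")

def validate_claims(claims, today):
    """Validate claim dicts, apply auto-fixes, and return a record shaped
    like the rejected_claims / validation_stats JSON above."""
    stats = {"total": len(claims), "kept": 0, "fixed": 0, "rejected": 0,
             "fixes_applied": [], "rejections": []}
    rejected = []
    for claim in claims:
        issues = [f for f in REQUIRED_FIELDS if f not in claim]
        fixed = False
        if "created" in issues:
            # A missing creation date is auto-fixable: stamp today's date.
            claim["created"] = today
            stats["fixes_applied"].append(f"{claim['filename']}:set_created:{today}")
            issues.remove("created")
            fixed = True
        if fixed:
            stats["fixed"] += 1
        if issues:
            # Remaining issues (e.g. no attribution extractor) force rejection,
            # even if some fixes were applied to the same claim.
            stats["rejected"] += 1
            stats["rejections"].extend(f"{claim['filename']}:missing_{f}" for f in issues)
            rejected.append({"filename": claim["filename"],
                             "issues": [f"missing_{f}" for f in issues]})
        elif not fixed:
            stats["kept"] += 1
    return {"rejected_claims": rejected, "validation_stats": stats}
```

Running this over a single claim missing both fields reproduces the counts in the committed JSON.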

@@ -7,9 +7,13 @@ date: 2025-08-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
status: enrichment
priority: medium
tags: [evaluation-infrastructure, dangerous-capabilities, standardized-reporting, ChemBio, transparency, STREAM]
processed_by: theseus
processed_date: 2026-03-19
enrichments_applied: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -53,3 +57,12 @@ PRIMARY CONNECTION: [[AI lowers the expertise barrier for engineering biological
WHY ARCHIVED: Provides evidence of emerging standardization for dangerous capability evaluation reporting. The multi-stakeholder process (government, academia, AI companies) signals potential for eventual adoption.
EXTRACTION HINT: Focus on the disclosure gap: labs currently report their own dangerous capability evaluations without standardized format, preventing independent assessment of rigor.
## Key Facts
- STREAM stands for 'Standard for Transparently Reporting Evaluations in AI Model Reports'
- STREAM was developed by 23 experts from government, civil society, academia, and frontier AI companies
- STREAM's initial focus is on chemical and biological (ChemBio) dangerous capability evaluations
- STREAM includes a 3-page reporting template and 'gold standard' examples
- STREAM was proposed in August 2025
- No evidence of adoption by major labs in current model reports as of publication
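The Key Facts describe STREAM as a 3-page reporting template for ChemBio evaluations whose purpose is letting third parties assess evaluation rigor. A hypothetical sketch of what such a structured report might look like as data; the field names here are illustrative guesses, not taken from the actual STREAM template:

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationEntry:
    """One dangerous-capability evaluation in a model report (fields assumed)."""
    capability: str                # e.g. a ChemBio task family being probed
    methodology: str               # how the evaluation was run
    elicitation_effort: str        # prompting / fine-tuning effort applied
    results_summary: str           # headline result
    threshold_interpretation: str  # how results map to the lab's risk thresholds

@dataclass
class ModelReport:
    model_name: str
    report_date: str
    evaluations: list[EvaluationEntry] = field(default_factory=list)

    def is_assessable(self) -> bool:
        # Third parties can judge rigor only if every entry documents
        # both its methodology and its elicitation effort.
        return all(e.methodology and e.elicitation_effort
                   for e in self.evaluations)
```

The point the source makes is captured by `is_assessable`: current model reports fail this kind of check because evaluations are disclosed without standardized detail.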