extract: 2024-12-00-uuk-mitigations-gpai-systemic-risks-76-experts
Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
parent fa65d8ca3c
commit 948389d92e
6 changed files with 46 additions and 5 deletions

@@ -47,6 +47,12 @@ STREAM proposal identifies that current model reports lack 'sufficient detail to
Stanford FMTI 2024→2025 data: mean transparency score declined 17 points. Meta -29 points, Mistral -37 points, OpenAI -14 points. OpenAI removed 'safely' from mission statement (Nov 2025), dissolved Superalignment team (May 2024) and Mission Alignment team (Feb 2026). Google accused by 60 UK lawmakers of violating Seoul commitments with Gemini 2.5 Pro (Apr 2025).

### Additional Evidence (extend)

*Source: [[2024-12-00-uuk-mitigations-gpai-systemic-risks-76-experts]] | Added: 2026-03-19*

Expert consensus identifies 'external scrutiny, proactive evaluation and transparency' as the guiding principles for effective AI systemic-risk mitigation, with third-party audits as a top-3 priority. Transparency appears as a core principle in the expert consensus even as Stanford FMTI scores declined 17 points, so the gap is widening: experts know what is needed, but practice is moving in the opposite direction.

---

Relevant Notes:

@@ -48,6 +48,12 @@ The EU AI Act's enforcement mechanisms (penalties up to €35 million or 7% of g
Third-party pre-deployment audits are the top expert-consensus priority (>60% agreement across the AI safety, CBRN, critical infrastructure, democratic processes, and discrimination domains), yet no major lab implements them. This is the strongest available evidence that voluntary commitments cannot deliver what safety requires: the expert community broadly agrees on the priority, and it still does not happen.

### Additional Evidence (confirm)

*Source: [[2024-12-00-uuk-mitigations-gpai-systemic-risks-76-experts]] | Added: 2026-03-19*

Third-party pre-deployment audits achieved >60% expert agreement across five risk domains as a top-3 priority mitigation, yet remain absent from major AI labs' practices. This provides another data point confirming that even measures with strong expert-consensus backing do not get implemented voluntarily, supporting the claim that only binding regulation with enforcement changes behavior.

---

Relevant Notes:

@@ -11,6 +11,12 @@ source: "AI Safety Grant Application (LivingIP)"
Expert consensus from 76 specialists across 5 risk domains defines what 'building alignment mechanisms' should include: the top-3 priorities, with >60% cross-domain agreement, are third-party pre-deployment audits, safety incident reporting with information sharing, and pre-deployment risk assessments. The convergence of biosecurity experts, AI safety researchers, critical infrastructure specialists, democracy defenders, and discrimination researchers on the same top-3 list provides an empirical specification of which mechanisms matter most.

### Additional Evidence (extend)

*Source: [[2024-12-00-uuk-mitigations-gpai-systemic-risks-76-experts]] | Added: 2026-03-19*

76 cross-domain experts (AI safety, critical infrastructure, CBRN, democratic processes, discrimination/bias) converged on third-party pre-deployment audits as a top-3 priority mitigation with >60% agreement across all domains. This defines what 'building alignment mechanisms' concretely means: external scrutiny, proactive evaluation, and transparency as guiding principles. The expert consensus provides a specific, empirically grounded definition of the evaluation infrastructure that should exist before scaling.

---

# safe AI development requires building alignment mechanisms before scaling capability

@@ -45,6 +45,12 @@ The gap between expert consensus (76 specialists identify third-party audits as
Comprehensive evidence across governance mechanisms: ALL international declarations (Bletchley, Seoul, Paris, Hiroshima, OECD, UN) produced zero verified behavioral change. Frontier Model Forum produced no binding commitments. White House voluntary commitments eroded. 450+ organizations lobbied on AI in 2025 ($92M in fees), California SB 1047 vetoed after industry pressure. Only binding regulation (EU AI Act, China enforcement, US export controls) changed behavior.

### Additional Evidence (confirm)

*Source: [[2024-12-00-uuk-mitigations-gpai-systemic-risks-76-experts]] | Added: 2026-03-19*

Expert consensus identifies third-party pre-deployment audits as a top-3 priority, yet no mandatory audit requirement applies to any major lab. The gap between expert-identified necessity and actual implementation confirms that voluntary adoption of even consensus-backed safety measures fails under competitive pressure. If 76 cross-domain experts agree on the priority and it still doesn't happen, voluntary mechanisms are insufficient.

---

Relevant Notes:

@@ -1,7 +1,7 @@
{
"rejected_claims": [
{
- "filename": "expert-consensus-identifies-third-party-audits-as-top-priority-but-no-mandatory-implementation-exists.md",
+ "filename": "expert-consensus-identifies-third-party-pre-deployment-audits-as-top-priority-for-AI-systemic-risk-mitigation.md",
"issues": [
"missing_attribution_extractor"
]

@@ -10,13 +10,16 @@
"validation_stats": {
"total": 1,
"kept": 0,
- "fixed": 1,
+ "fixed": 4,
"rejected": 1,
"fixes_applied": [
- "expert-consensus-identifies-third-party-audits-as-top-priority-but-no-mandatory-implementation-exists.md:set_created:2026-03-19"
+ "expert-consensus-identifies-third-party-pre-deployment-audits-as-top-priority-for-AI-systemic-risk-mitigation.md:set_created:2026-03-19",
+ "expert-consensus-identifies-third-party-pre-deployment-audits-as-top-priority-for-AI-systemic-risk-mitigation.md:stripped_wiki_link:safe AI development requires building alignment mechanisms b",
+ "expert-consensus-identifies-third-party-pre-deployment-audits-as-top-priority-for-AI-systemic-risk-mitigation.md:stripped_wiki_link:voluntary safety pledges cannot survive competitive pressure",
+ "expert-consensus-identifies-third-party-pre-deployment-audits-as-top-priority-for-AI-systemic-risk-mitigation.md:stripped_wiki_link:only binding regulation with enforcement teeth changes front"
],
"rejections": [
- "expert-consensus-identifies-third-party-audits-as-top-priority-but-no-mandatory-implementation-exists.md:missing_attribution_extractor"
+ "expert-consensus-identifies-third-party-pre-deployment-audits-as-top-priority-for-AI-systemic-risk-mitigation.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
|
||||
|
|
|
|||
|
|

@@ -7,13 +7,17 @@ date: 2024-12-01
domain: ai-alignment
secondary_domains: []
format: paper
- status: unprocessed
+ status: enrichment
priority: high
tags: [evaluation-infrastructure, third-party-audit, expert-consensus, systemic-risk, mitigation-prioritization]
+ processed_by: theseus
+ processed_date: 2026-03-19
+ enrichments_applied: ["safe AI development requires building alignment mechanisms before scaling capability.md", "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
+ extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

@@ -63,3 +67,13 @@ EXTRACTION HINT: Focus on the top-3 mitigation list and the "external scrutiny,
- Top-3 mitigations had >60% agreement across all risk domains
- Top-3 mitigations appeared in >40% of experts' preferred combinations
- Paper is 78 pages and published December 2024

+ ## Key Facts
+ - Survey included 76 specialists across five domains: AI safety, critical infrastructure, democratic processes, CBRN risks, and discrimination/bias
+ - 27 mitigation measures were evaluated through literature review
+ - Top-3 mitigations had >60% agreement across all risk domains
+ - Top-3 mitigations appeared in >40% of experts' preferred combinations
+ - Top three mitigations: (1) safety incident reports and security information sharing, (2) third-party pre-deployment model audits, (3) pre-deployment risk assessments
+ - Guiding principles identified: 'External scrutiny, proactive evaluation and transparency'
+ - Paper published December 2024, 78 pages
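
The agreement statistics above lend themselves to a small worked example. A toy sketch of how per-domain top-3 agreement could be computed (the paper's exact aggregation method is not reproduced here; the domain names follow the survey list, while the picks and function names are invented for illustration):

```python
# Toy data: each expert's top-3 mitigation picks, grouped by survey domain.
# The real survey had 76 experts ranking 27 measures; these picks are invented.
picks_by_domain = {
    "ai_safety": [{"audits", "incident_reports", "risk_assessments"},
                  {"audits", "incident_reports", "red_teaming"},
                  {"audits", "risk_assessments", "staged_release"}],
    "cbrn": [{"audits", "risk_assessments", "incident_reports"},
             {"audits", "incident_reports", "watermarking"}],
}

def top3_agreement(picks_by_domain, measure):
    """Share of experts in each domain who put `measure` in their top 3."""
    return {domain: sum(measure in picks for picks in experts) / len(experts)
            for domain, experts in picks_by_domain.items()}

rates = top3_agreement(picks_by_domain, "audits")
print(rates)                                 # {'ai_safety': 1.0, 'cbrn': 1.0}
# The paper's bar: a mitigation clears it only if every domain exceeds 60%.
print(all(rate > 0.6 for rate in rates.values()))  # True
```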