reciprocal edges: 7 edges from 1 new claim
parent 3b8221f855
commit 5b4a6f35ba
7 changed files with 77 additions and 13 deletions
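The diff that follows rewrites each file's flow-style edge lists into block style and appends one reciprocal `related` edge pointing back at the new claim, `independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect`. As a rough illustration of that edit pattern only (not the repository's actual reweave tooling), a minimal Python sketch might look like the following; the `add_reciprocal_edge` helper and the use of PyYAML are assumptions.

```python
# Minimal sketch (hypothetical): append a reciprocal "related" edge to a claim
# file's YAML frontmatter. Illustrates the edit pattern in the diff below; it
# is not the repository's actual tooling.
import yaml  # PyYAML

NEW_CLAIM = "independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect"

def add_reciprocal_edge(path: str, slug: str = NEW_CLAIM) -> None:
    text = open(path, encoding="utf-8").read()
    # Frontmatter sits between the first two "---" markers.
    _, front, body = text.split("---", 2)
    meta = yaml.safe_load(front) or {}
    related = meta.setdefault("related", [])
    if slug not in related:  # keep the edge list idempotent
        related.append(slug)
    # default_flow_style=False emits block-style lists, matching the new format.
    new_front = yaml.safe_dump(meta, default_flow_style=False, sort_keys=False)
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"---\n{new_front}---{body}")
```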
@@ -17,6 +17,7 @@ related:
 - AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns
 - pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations
 - multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 reweave_edges:
 - Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17
 supports:

@@ -15,6 +15,7 @@ related:
 - cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
 - cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics
 - AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 reweave_edges:
 - AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics|related|2026-04-06
 supports:

@@ -10,9 +10,16 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
 scope: causal
 sourcer: UK AI Security Institute
-supports: ["three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture", "voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives"]
-challenges: ["cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics"]
-related: ["cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements"]
+supports:
+- three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture
+- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
+challenges:
+- cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics
+related:
+- cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
+- ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable
+- benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change

@@ -10,7 +10,10 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
 scope: functional
 sourcer: UK AI Security Institute
-related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation"]
+related:
+- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
+- cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction

@@ -10,8 +10,19 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-theseus-santos-grueiro-governance-audit.md
 scope: structural
 sourcer: Theseus
-supports: ["multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale", "evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient"]
-related: ["behavioral-evaluation-is-structurally-insufficient-for-latent-alignment-verification-under-evaluation-awareness-due-to-normative-indistinguishability", "multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale", "voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions", "scheming-safety-cases-require-interpretability-evidence-because-observer-effects-make-behavioral-evaluation-insufficient", "frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation"]
+supports:
+- multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale
+- evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient
+related:
+- behavioral-evaluation-is-structurally-insufficient-for-latent-alignment-verification-under-evaluation-awareness-due-to-normative-indistinguishability
+- multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale
+- voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance
+- evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions
+- scheming-safety-cases-require-interpretability-evidence-because-observer-effects-make-behavioral-evaluation-insufficient
+- frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable
+- AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns
+- major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # Major AI safety governance frameworks are architecturally dependent on behavioral evaluation that Santos-Grueiro's normative indistinguishability theorem establishes is structurally insufficient for latent alignment verification as evaluation awareness scales

@@ -7,10 +7,41 @@ source: International AI Safety Report 2026 (multi-government committee, Februar
 created: 2026-03-11
 secondary_domains: ["grand-strategy"]
 last_evaluated: 2026-03-11
-depends_on: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"]
-related: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations", "evidence-dilemma-rapid-ai-development-structurally-prevents-adequate-pre-deployment-safety-evidence-accumulation", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions", "benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability", "meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "activation-based-persona-monitoring-detects-behavioral-trait-shifts-in-small-models-without-behavioral-testing", "current-safety-evaluation-datasets-vary-37-to-100-percent-in-model-detectability-rendering-highly-detectable-evaluations-uninformative", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements", "provider-level-behavioral-biases-persist-across-model-versions-requiring-psychometric-auditing-beyond-standard-benchmarks", "trajectory-geometry-probing-requires-white-box-access-limiting-deployment-to-controlled-evaluation-contexts", "external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection", "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "frontier-ai-safety-verdicts-rely-on-deployment-track-record-not-evaluation-confidence", "precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty", "making-research-evaluations-into-compliance-triggers-closes-the-translation-gap-by-design", "white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure"]
-reweave_edges: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06", "The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17"]
-supports: ["The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation"]
+depends_on:
+- voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
+related:
+- Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability
+- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured
+- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks
+- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith
+- pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations
+- evidence-dilemma-rapid-ai-development-structurally-prevents-adequate-pre-deployment-safety-evidence-accumulation
+- AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns
+- evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions
+- benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability
+- meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence
+- ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable
+- activation-based-persona-monitoring-detects-behavioral-trait-shifts-in-small-models-without-behavioral-testing
+- current-safety-evaluation-datasets-vary-37-to-100-percent-in-model-detectability-rendering-highly-detectable-evaluations-uninformative
+- benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements
+- provider-level-behavioral-biases-persist-across-model-versions-requiring-psychometric-auditing-beyond-standard-benchmarks
+- trajectory-geometry-probing-requires-white-box-access-limiting-deployment-to-controlled-evaluation-contexts
+- external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection
+- bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability
+- cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
+- frontier-ai-safety-verdicts-rely-on-deployment-track-record-not-evaluation-confidence
+- precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty
+- making-research-evaluations-into-compliance-triggers-closes-the-translation-gap-by-design
+- white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
+reweave_edges:
+- Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06
+- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17
+- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17
+- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17
+- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17
+supports:
+- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation
 sourced_from:
 - inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
 ---

@@ -10,9 +10,19 @@ agent: leo
 scope: structural
 sourcer: Leo
 related_claims: ["[[technology-governance-coordination-gaps-close-when-four-enabling-conditions-are-present-visible-triggering-events-commercial-network-effects-low-competitive-stakes-at-inception-or-physical-manifestation]]"]
-supports: ["Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility"]
-reweave_edges: ["Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility|supports|2026-04-07"]
-related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "judicial-oversight-of-ai-governance-through-constitutional-grounds-not-statutory-safety-law", "voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance", "voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "judicial-oversight-checks-executive-ai-retaliation-but-cannot-create-positive-safety-obligations", "judicial-framing-of-voluntary-ai-safety-constraints-as-financial-harm-removes-constitutional-floor-enabling-administrative-dismantling", "split-jurisdiction-injunction-pattern-maps-boundary-of-judicial-protection-for-voluntary-ai-safety-policies-civil-protected-military-not"]
+supports:
+- Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility
+reweave_edges:
+- Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility|supports|2026-04-07
+related:
+- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
+- judicial-oversight-of-ai-governance-through-constitutional-grounds-not-statutory-safety-law
+- voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance
+- voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance
+- judicial-oversight-checks-executive-ai-retaliation-but-cannot-create-positive-safety-obligations
+- judicial-framing-of-voluntary-ai-safety-constraints-as-financial-harm-removes-constitutional-floor-enabling-administrative-dismantling
+- split-jurisdiction-injunction-pattern-maps-boundary-of-judicial-protection-for-voluntary-ai-safety-policies-civil-protected-military-not
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # Voluntary AI safety constraints are protected as corporate speech but unenforceable as safety requirements, creating legal mechanism gap when primary demand-side actor seeks safety-unconstrained providers