From 5b4a6f35ba7fa77f2a3c883e4ff36e9a711a7882 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 27 Apr 2026 00:17:37 +0000
Subject: [PATCH] reciprocal edges: 7 edges from 1 new claim

---
 ...is-for-mandatory-third-party-evaluation.md |  1 +
 ...vidence-exceeding-benchmark-predictions.md |  1 +
 ...pability-uplift-to-operational-autonomy.md | 13 +++++--
 ...al-negotiation-is-governance-instrument.md |  5 ++-
 ...on-behaviorally-insufficient-evaluation.md | 15 ++++++-
 ...ernance-built-on-unreliable-foundations.md | 39 +++++++++++++++++--
 ...mands-safety-unconstrained-alternatives.md | 16 ++++++--
 7 files changed, 77 insertions(+), 13 deletions(-)

diff --git a/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md b/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md
index 691592136..f92a93714 100644
--- a/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md
+++ b/domains/ai-alignment/cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation.md
@@ -17,6 +17,7 @@ related:
 - AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns
 - pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations
 - multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 reweave_edges:
 - Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17
 supports:
diff --git a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
index 363307387..302275332 100644
--- a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
+++ b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
@@ -15,6 +15,7 @@ related:
 - cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
 - cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics
 - AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 reweave_edges:
 - AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics|related|2026-04-06
 supports:
diff --git a/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md b/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md
index 5f700c374..8b4d8b378 100644
--- a/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md
+++ b/domains/ai-alignment/first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy.md
@@ -10,9 +10,16 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
 scope: causal
 sourcer: UK AI Security Institute
-supports: ["three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture", "voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives"]
-challenges: ["cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics"]
-related: ["cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements"]
+supports:
+- three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture
+- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
+challenges:
+- cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics
+related:
+- cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
+- ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable
+- benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change
diff --git a/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md b/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md
index 359aa6cbe..a8a4cc501 100644
--- a/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md
+++ b/domains/ai-alignment/independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument.md
@@ -10,7 +10,10 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
 scope: functional
 sourcer: UK AI Security Institute
-related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation"]
+related:
+- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
+- cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction
diff --git a/domains/ai-alignment/major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation.md b/domains/ai-alignment/major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation.md
index 6dbd3472e..8606cfc61 100644
--- a/domains/ai-alignment/major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation.md
+++ b/domains/ai-alignment/major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation.md
@@ -10,8 +10,19 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-theseus-santos-grueiro-governance-audit.md
 scope: structural
 sourcer: Theseus
-supports: ["multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale", "evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient"]
-related: ["behavioral-evaluation-is-structurally-insufficient-for-latent-alignment-verification-under-evaluation-awareness-due-to-normative-indistinguishability", "multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale", "voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions", "scheming-safety-cases-require-interpretability-evidence-because-observer-effects-make-behavioral-evaluation-insufficient", "frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation"]
+supports:
+- multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale
+- evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient
+related:
+- behavioral-evaluation-is-structurally-insufficient-for-latent-alignment-verification-under-evaluation-awareness-due-to-normative-indistinguishability
+- multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale
+- voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance
+- evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions
+- scheming-safety-cases-require-interpretability-evidence-because-observer-effects-make-behavioral-evaluation-insufficient
+- frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable
+- AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns
+- major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # Major AI safety governance frameworks are architecturally dependent on behavioral evaluation that Santos-Grueiro's normative indistinguishability theorem establishes is structurally insufficient for latent alignment verification as evaluation awareness scales
diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 99b158bc8..4a21eadf3 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -7,10 +7,41 @@ source: International AI Safety Report 2026 (multi-government committee, Februar
 created: 2026-03-11
 secondary_domains: ["grand-strategy"]
 last_evaluated: 2026-03-11
-depends_on: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"]
-related: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations", "evidence-dilemma-rapid-ai-development-structurally-prevents-adequate-pre-deployment-safety-evidence-accumulation", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions", "benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability", "meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "activation-based-persona-monitoring-detects-behavioral-trait-shifts-in-small-models-without-behavioral-testing", "current-safety-evaluation-datasets-vary-37-to-100-percent-in-model-detectability-rendering-highly-detectable-evaluations-uninformative", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements", "provider-level-behavioral-biases-persist-across-model-versions-requiring-psychometric-auditing-beyond-standard-benchmarks", "trajectory-geometry-probing-requires-white-box-access-limiting-deployment-to-controlled-evaluation-contexts", "external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection", "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "frontier-ai-safety-verdicts-rely-on-deployment-track-record-not-evaluation-confidence", "precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty", "making-research-evaluations-into-compliance-triggers-closes-the-translation-gap-by-design", "white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure"]
-reweave_edges: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06", "The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17"]
-supports: ["The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation"]
+depends_on:
+- voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
+related:
+- Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability
+- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured
+- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks
+- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith
+- pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations
+- evidence-dilemma-rapid-ai-development-structurally-prevents-adequate-pre-deployment-safety-evidence-accumulation
+- AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns
+- evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions
+- benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability
+- meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence
+- ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable
+- activation-based-persona-monitoring-detects-behavioral-trait-shifts-in-small-models-without-behavioral-testing
+- current-safety-evaluation-datasets-vary-37-to-100-percent-in-model-detectability-rendering-highly-detectable-evaluations-uninformative
+- benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements
+- provider-level-behavioral-biases-persist-across-model-versions-requiring-psychometric-auditing-beyond-standard-benchmarks
+- trajectory-geometry-probing-requires-white-box-access-limiting-deployment-to-controlled-evaluation-contexts
+- external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection
+- bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability
+- cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
+- frontier-ai-safety-verdicts-rely-on-deployment-track-record-not-evaluation-confidence
+- precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty
+- making-research-evaluations-into-compliance-triggers-closes-the-translation-gap-by-design
+- white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
+reweave_edges:
+- Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06
+- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17
+- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17
+- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17
+- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17
+supports:
+- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation
 sourced_from:
 - inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
 ---
diff --git a/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md b/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md
index 136eaa5b2..37185687f 100644
--- a/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md
+++ b/domains/grand-strategy/voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives.md
@@ -10,9 +10,19 @@ agent: leo
 scope: structural
 sourcer: Leo
 related_claims: ["[[technology-governance-coordination-gaps-close-when-four-enabling-conditions-are-present-visible-triggering-events-commercial-network-effects-low-competitive-stakes-at-inception-or-physical-manifestation]]"]
-supports: ["Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility"]
-reweave_edges: ["Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility|supports|2026-04-07"]
-related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "judicial-oversight-of-ai-governance-through-constitutional-grounds-not-statutory-safety-law", "voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance", "voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "judicial-oversight-checks-executive-ai-retaliation-but-cannot-create-positive-safety-obligations", "judicial-framing-of-voluntary-ai-safety-constraints-as-financial-harm-removes-constitutional-floor-enabling-administrative-dismantling", "split-jurisdiction-injunction-pattern-maps-boundary-of-judicial-protection-for-voluntary-ai-safety-policies-civil-protected-military-not"]
+supports:
+- Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility
+reweave_edges:
+- Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while preserving operational flexibility|supports|2026-04-07
+related:
+- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
+- judicial-oversight-of-ai-governance-through-constitutional-grounds-not-statutory-safety-law
+- voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance
+- voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance
+- judicial-oversight-checks-executive-ai-retaliation-but-cannot-create-positive-safety-obligations
+- judicial-framing-of-voluntary-ai-safety-constraints-as-financial-harm-removes-constitutional-floor-enabling-administrative-dismantling
+- split-jurisdiction-injunction-pattern-maps-boundary-of-judicial-protection-for-voluntary-ai-safety-policies-civil-protected-military-not
+- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
 ---
 
 # Voluntary AI safety constraints are protected as corporate speech but unenforceable as safety requirements, creating legal mechanism gap when primary demand-side actor seeks safety-unconstrained providers