diff --git a/agents/theseus/research-journal.md b/agents/theseus/research-journal.md
index 196c43f3..b29f9736 100644
--- a/agents/theseus/research-journal.md
+++ b/agents/theseus/research-journal.md
@@ -189,7 +189,7 @@ NEW PATTERN:
 STRENGTHENED:
 - B1 (alignment not being treated as such) — holds. Mechanisms exist but are mismatched in scale to the severity of the problem. The DoD/Anthropic confrontation is a concrete case of government functioning as coordination-BREAKER.
 - B2 (alignment is a coordination problem) — automation overshoot correction is also a coordination failure. The four mechanisms require coordination across firms/regulators to function; firms acting individually cannot correct for competitive pressure.
-- "Government as coordination-breaker" — updated with DoD/Anthropic episode. This is a stronger confirmation of the [[government designation of safety-conscious AI labs as supply chain risks]] claim.
+- "Government as coordination-breaker" — updated with DoD/Anthropic episode. This is a stronger confirmation of the government designation of safety-conscious AI labs as supply chain risks claim.
 
 COMPLICATED:
 - The measurement dependency insight complicates all constructive alternatives. Even if we build collective intelligence infrastructure (B5), it needs accurate performance signals to self-correct. The perception gap at the organizational level is a precursor problem that the constructive case hasn't addressed.
diff --git a/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md b/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
index 030d0fc8..6fc8d5cd 100644
--- a/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
+++ b/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
@@ -44,7 +44,8 @@ Anthropic released **Responsible Scaling Policy v3.0** on February 24, 2026 —
 - [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — RSP v3.0 is the explicit enactment of this claim; the "Anthropic leads" condition makes the commitment structurally dependent on competitor behavior
 - [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — the $30B/$380B context makes visible why the alignment tax is real: at these valuations, any pause has enormous financial cost
 
-**Extraction hints:** This source enriches the existing claim [[voluntary safety pledges cannot survive competitive pressure]] with the specific mechanism: the "Anthropic leads" condition transforms a safety commitment into a competitive strategy, not a safety floor. New claim candidate: "Anthropic RSP v3.0 replaces unconditional binary safety floors with dual-condition thresholds requiring both competitive leadership and catastrophic risk assessment — making the commitment evaluate-able as a business judgment rather than a categorical safety line."
+**Extraction hints:** This source enriches the existing claim voluntary safety pledges cannot survive competitive pressure with the specific mechanism: the "Anthropic leads" condition transforms a safety commitment into a competitive strategy, not a safety floor.
+New claim candidate: "Anthropic RSP v3.0 replaces unconditional binary safety floors with dual-condition thresholds requiring both competitive leadership and catastrophic risk assessment — making the commitment evaluate-able as a business judgment rather than a categorical safety line."
 
 **Context:** RSP v1.0 was created in 2023 as a model for voluntary lab safety commitments. The transition from binary unconditional to conditional thresholds reflects 3 years of competitive pressure at escalating scales ($30B at $380B valuation).
 
diff --git a/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md b/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
index 87e269af..783a4815 100644
--- a/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
+++ b/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
@@ -41,8 +41,8 @@ The paper examines whether current AI benchmarks are adequate for EU AI Act regu
 
 **KB connections:**
 - [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — this paper shows the problem isn't just oversight at deployment, it's that the evaluation tools for oversight don't even measure the right things
-- [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match]] — formal verification works for mathematical domains; this paper shows behavioral compliance benchmarking fails even more completely
-- [[AI capability and reliability are independent dimensions]] — benchmarks measure one dimension (behavioral propensities) and miss another (alignment-critical failure modes)
+- formal verification of AI-generated proofs provides scalable oversight that human review cannot match — formal verification works for mathematical domains; this paper shows behavioral compliance benchmarking fails even more completely
+- AI capability and reliability are independent dimensions — benchmarks measure one dimension (behavioral propensities) and miss another (alignment-critical failure modes)
 
 **Extraction hints:** Strong claim candidate: "Current AI benchmarks provide zero coverage of capabilities central to loss-of-control scenarios — oversight evasion, self-replication, autonomous AI development — making them structurally insufficient for EU AI Act Article 55 compliance despite being the primary compliance evidence labs provide." This is specific, falsifiable, empirically grounded.
 
diff --git a/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md b/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md
index 5712a161..d3164a71 100644
--- a/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md
+++ b/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md
@@ -43,6 +43,6 @@ Article 43 establishes conformity assessment procedures for **high-risk AI syste
 **Context:** Article 43 applies to high-risk AI systems (Annex III list: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, justice). GPAI models face a separate and in some ways more stringent regime under Articles 51-56 when they meet the systemic risk threshold.
 
 ## Curator Notes (structured handoff for extractor)
-PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure]] — self-certification under Article 43 has the same structural weakness as voluntary commitments; labs certify their own compliance
+PRIMARY CONNECTION: voluntary safety pledges cannot survive competitive pressure — self-certification under Article 43 has the same structural weakness as voluntary commitments; labs certify their own compliance
 WHY ARCHIVED: Corrects common misreading of EU AI Act as creating FDA-equivalent independent evaluation via Article 43; clarifies that independent evaluation runs through Article 92 (reactive) not Article 43 (conformity)
 EXTRACTION HINT: This is primarily a clarifying/corrective source; extractor should check whether any existing KB claims overstate Article 43's independence requirements and note the Article 43 / Article 92 distinction
diff --git a/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md b/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md
index 731ec399..8a1c1d1a 100644
--- a/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md
+++ b/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md
@@ -34,7 +34,7 @@ On **November 19, 2025**, the European Commission proposed "targeted amendments"
 
 **KB connections:**
 - [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — if simplification amendments weaken enforcement, the gap widens further
-- [[voluntary safety pledges cannot survive competitive pressure]] — EU legislative amendments under competitive pressure may follow the same structural logic as voluntary pledge weakening
+- voluntary safety pledges cannot survive competitive pressure — EU legislative amendments under competitive pressure may follow the same structural logic as voluntary pledge weakening
 
 **Extraction hints:** This source is primarily a flag rather than a substantive claim source. The claim candidate: "EU AI Act enforcement faced simplification pressure within 3.5 months of GPAI obligations taking effect — suggesting the regulatory implementation cycle for AI governance may itself be subject to competitive erosion dynamics similar to voluntary commitment collapse." But this needs confirmation of what the amendments actually propose.
 
diff --git a/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md b/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md
index 4090b09d..0836606c 100644
--- a/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md
+++ b/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md
@@ -47,9 +47,9 @@ Maximum fine: **3% of annual worldwide turnover or EUR 15 million, whichever is
 **What I expected but didn't find:** A proactive pre-deployment evaluation requirement. The EU AI Act creates mandatory obligations (Article 55) with binding enforcement (Articles 92, 101) but the evaluation is triggered by problems, not required as a condition of deployment. The FDA analogy fails specifically here — drugs cannot be deployed without pre-market approval; GPAI models under EU AI Act can be deployed while the AI Office monitors and intervenes reactively.
 
 **KB connections:**
-- [[voluntary safety pledges cannot survive competitive pressure]] — Article 55 creates mandatory obligations that don't depend on voluntary commitment, but the flexible compliance pathways preserve lab discretion in HOW they comply
-- [[scalable oversight degrades rapidly as capability gaps grow]] — Article 92's compulsory evaluation powers don't solve the AAL-3/4 infeasibility problem; even with source code access, deception-resilient evaluation is technically infeasible
-- [[technology advances exponentially but coordination mechanisms evolve linearly]] — the 10^25 FLOP threshold will require updating as compute efficiency improves
+- voluntary safety pledges cannot survive competitive pressure — Article 55 creates mandatory obligations that don't depend on voluntary commitment, but the flexible compliance pathways preserve lab discretion in HOW they comply
+- scalable oversight degrades rapidly as capability gaps grow — Article 92's compulsory evaluation powers don't solve the AAL-3/4 infeasibility problem; even with source code access, deception-resilient evaluation is technically infeasible
+- technology advances exponentially but coordination mechanisms evolve linearly — the 10^25 FLOP threshold will require updating as compute efficiency improves
 
 **Extraction hints:** Primary claim: "EU AI Act Article 92 creates the first binding compulsory evaluation powers for frontier AI models globally — AI Office can compel API/source code access and appoint independent experts — but enforcement is reactive not proactive, falling structurally short of FDA pre-market approval." Secondary claim: "EU AI Act flexible compliance pathways for Article 55 allow GPAI systemic risk models to self-certify compliance through codes of practice rather than mandatory independent third-party audit."
 
diff --git a/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md b/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md
index 0c90289b..c6036198 100644
--- a/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md
+++ b/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md
@@ -37,7 +37,7 @@ Evaluates **twelve frontier AI safety frameworks** published following the 2024
 **What I expected but didn't find:** Any framework achieving above 50% — suggesting the entire field hasn't developed the risk management maturity that safety-critical industries (aviation, nuclear, pharmaceutical) have. The 35% top score is specifically compared to established safety management principles, not to some aspirational ideal.
 
 **KB connections:**
-- [[voluntary safety pledges cannot survive competitive pressure]] — this paper shows the problem is deeper: even companies that ARE publishing safety frameworks are doing so at 8-35% of safety-critical industry standards
+- voluntary safety pledges cannot survive competitive pressure — this paper shows the problem is deeper: even companies that ARE publishing safety frameworks are doing so at 8-35% of safety-critical industry standards
 - [[safe AI development requires building alignment mechanisms before scaling capability]] — these frameworks are supposed to be the alignment mechanisms, and they're at 8-35% completion
 - Brundage et al. AAL framework (previous session): AAL-1 is "peak of current voluntary practice." This paper quantifies what AAL-1 actually looks like: 8-35% of safety-critical industry standards.
 
diff --git a/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md b/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md
index 1fe4b82c..7e67f107 100644
--- a/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md
+++ b/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md
@@ -31,8 +31,8 @@ tags: [GPAI, Code-of-Practice, industry-practices, EU-AI-Act, safety-measures, O
 **What I expected but didn't find:** Evidence that the Code of Practice requires genuinely independent third-party verification of the safety measures it lists. From the structure, it appears labs self-certify compliance through code adherence, with the AI Office potentially auditing retrospectively.
 
 **KB connections:**
-- [[voluntary safety pledges cannot survive competitive pressure]] — the Code of Practice may formalize existing voluntary commitments without adding enforcement mechanisms that survive competitive pressure
-- [[an aligned-seeming AI may be strategically deceptive]] — the gap between published safety commitments and actual model behavior is precisely what deception-resilient evaluation (AAL-3/4) is designed to detect
+- voluntary safety pledges cannot survive competitive pressure — the Code of Practice may formalize existing voluntary commitments without adding enforcement mechanisms that survive competitive pressure
+- an aligned-seeming AI may be strategically deceptive — the gap between published safety commitments and actual model behavior is precisely what deception-resilient evaluation (AAL-3/4) is designed to detect
 
 **Extraction hints:** Supporting claim: "GPAI Code of Practice safety measures map to existing commitments from major AI labs — but the mapping is of stated policies, not verified behaviors, leaving the deception-resilient gap unaddressed." Use cautiously — authors explicitly say this is not compliance evidence.
 