diff --git a/inbox/archive/general/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md b/inbox/archive/general/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
new file mode 100644
index 000000000..4953b40d0
--- /dev/null
+++ b/inbox/archive/general/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
@@ -0,0 +1,54 @@
+---
+type: source
+title: "Anthropic RSP v3.0: Binary Safety Thresholds Replaced with Conditional Escape Clauses (Feb 24, 2026)"
+author: "Anthropic (news); TIME reporting (March 6, 2026)"
+url: https://www.anthropic.com/rsp
+date: 2026-02-24
+domain: ai-alignment
+secondary_domains: []
+format: policy-document
+status: processed
+priority: high
+tags: [RSP, Anthropic, voluntary-safety, conditional-commitment, METR, frog-boiling, competitive-pressure, alignment-tax, B1-confirmation]
+---
+
+## Content
+
+Anthropic released **Responsible Scaling Policy v3.0** on February 24, 2026 — characterized as "a comprehensive rewrite of the RSP."
+
+**RSP v3.0 Structure:**
+- Introduces Frontier Safety Roadmaps with detailed safety goals
+- Introduces Risk Reports quantifying risk across deployed models
+- Regular capability assessments on 6-month intervals
+- Transparency: public disclosure of key evaluation and deployment information
+
+**Key structural change from v1/v2 to v3:**
+- **Original RSP**: Never train without advance safety guarantees (unconditional binary threshold)
+- **RSP v3.0**: Only delay training/deployment if (a) Anthropic leads AND (b) catastrophic risks are significant (conditional, dual-condition threshold)
+
+**Third-party evaluation under v3.0**: The document does not specify mandatory third-party evaluations. Emphasizes Anthropic's own internal capability assessments. Plans to "publish additional details on capability assessment methodology" in the future.
+
+**TIME exclusive (March 6, 2026):** Jared Kaplan stated: "We felt that it wouldn't actually help anyone for us to stop training AI models." METR's Chris Painter warned of a **"frog-boiling" effect** from removing binary thresholds. Financial context: $30B raise at ~$380B valuation, 10x annual revenue growth.
+
+## Agent Notes
+
+**Why this matters:** RSP v3.0 is a concrete case study in how competitive pressure degrades voluntary safety commitments — exactly the mechanism our KB claims describe. The original RSP was unconditional (a commitment to stop regardless of competitive context). The new RSP is conditional: Anthropic only needs to pause if it leads the field AND risks are catastrophic. This introduces two escape clauses: (1) if competitors advance, no pause needed; (2) if risks are judged "not significant," no pause needed. Both conditions are assessed by Anthropic itself.
+
+**The frog-boiling warning:** METR's Chris Painter's critique is significant coming from Anthropic's own evaluator partner. METR works WITH Anthropic on pre-deployment evaluations — when they warn about safety erosion, it's from inside the voluntary-collaborative system. This is a self-assessment of the system's weakness by one of its participants.
+
+**What surprised me:** That RSP v3.0 exists at all after the TIME article characterized it as "dropping" the pledge. The policy still uses the "RSP" name and retains a commitment structure — but the structural shift from unconditional to conditional thresholds is substantial. The framing of "comprehensive rewrite" is accurate but characterizing it as a continuation of the RSP may obscure how much the commitment has changed.
+
+**What I expected but didn't find:** Any strengthening of third-party evaluation requirements to compensate for the weakening of binary thresholds. If you remove unconditional safety floors, you'd expect independent evaluation to become MORE important as a safeguard. RSP v3.0 appears to have done the opposite — no mandatory third-party evaluation and internal assessment emphasis.
+
+**KB connections:**
+- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — RSP v3.0 is the explicit enactment of this claim; the "Anthropic leads" condition makes the commitment structurally dependent on competitor behavior
+- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — the $30B/$380B context makes visible why the alignment tax is real: at these valuations, any pause has enormous financial cost
+
+**Extraction hints:** This source enriches the existing claim voluntary safety pledges cannot survive competitive pressure with the specific mechanism: the "Anthropic leads" condition transforms a safety commitment into a competitive strategy, not a safety floor. New claim candidate: "Anthropic RSP v3.0 replaces unconditional binary safety floors with dual-condition thresholds requiring both competitive leadership and catastrophic risk assessment — making the commitment evaluate-able as a business judgment rather than a categorical safety line."
+
+**Context:** RSP v1.0 was created in 2023 as a model for voluntary lab safety commitments. The transition from binary unconditional to conditional thresholds reflects 3 years of competitive pressure at escalating scales ($30B at $380B valuation).
+
+## Curator Notes (structured handoff for extractor)
+PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
+WHY ARCHIVED: Provides the most current and specific evidence of the voluntary-commitment collapse mechanism — not hypothetical but documented with RSP v1→v3 structural change and Kaplan quotes
+EXTRACTION HINT: The structural change (unconditional → dual-condition) is the key extractable claim; the frog-boiling quote from METR is supporting evidence; the $30B context explains the financial incentive driving the change
diff --git a/inbox/queue/.extraction-debug/2026-03-20-anthropic-rsp-v3-conditional-thresholds.json b/inbox/queue/.extraction-debug/2026-03-20-anthropic-rsp-v3-conditional-thresholds.json
new file mode 100644
index 000000000..5bcb7fffe
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-20-anthropic-rsp-v3-conditional-thresholds.json
@@ -0,0 +1,29 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md",
+      "issues": [
+        "missing_attribution_extractor",
+        "opsec_internal_deal_terms"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 4,
+    "rejected": 1,
+    "fixes_applied": [
+      "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md:set_created:2026-03-20",
+      "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md:stripped_wiki_link:voluntary-safety-pledges-cannot-survive-competitive-pressure",
+      "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md:stripped_wiki_link:Anthropics-RSP-rollback-under-commercial-pressure-is-the-fir",
+      "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md:stripped_wiki_link:only-binding-regulation-with-enforcement-teeth-changes-front"
+    ],
+    "rejections": [
+      "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md:missing_attribution_extractor",
+      "anthropic-rsp-v3-replaces-unconditional-safety-thresholds-with-dual-condition-escape-clauses.md:opsec_internal_deal_terms"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-20"
+}
\ No newline at end of file
diff --git a/inbox/queue/.extraction-debug/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.json b/inbox/queue/.extraction-debug/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.json
new file mode 100644
index 000000000..9b7ef1ad8
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.json
@@ -0,0 +1,26 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "eu-ai-act-article-43-conformity-assessment-is-self-certification-not-independent-evaluation.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 3,
+    "rejected": 1,
+    "fixes_applied": [
+      "eu-ai-act-article-43-conformity-assessment-is-self-certification-not-independent-evaluation.md:set_created:2026-03-20",
+      "eu-ai-act-article-43-conformity-assessment-is-self-certification-not-independent-evaluation.md:stripped_wiki_link:voluntary-safety-pledges-cannot-survive-competitive-pressure",
+      "eu-ai-act-article-43-conformity-assessment-is-self-certification-not-independent-evaluation.md:stripped_wiki_link:only-binding-regulation-with-enforcement-teeth-changes-front"
+    ],
+    "rejections": [
+      "eu-ai-act-article-43-conformity-assessment-is-self-certification-not-independent-evaluation.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-20"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md b/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
index 6fc8d5cd8..36688a46d 100644
--- a/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
+++ b/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md
@@ -7,9 +7,12 @@ date: 2026-02-24
 domain: ai-alignment
 secondary_domains: []
 format: policy-document
-status: unprocessed
+status: enrichment
 priority: high
 tags: [RSP, Anthropic, voluntary-safety, conditional-commitment, METR, frog-boiling, competitive-pressure, alignment-tax, B1-confirmation]
+processed_by: theseus
+processed_date: 2026-03-20
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -52,3 +55,12 @@ Anthropic released **Responsible Scaling Policy v3.0** on February 24, 2026 —
 PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]
 WHY ARCHIVED: Provides the most current and specific evidence of the voluntary-commitment collapse mechanism — not hypothetical but documented with RSP v1→v3 structural change and Kaplan quotes
 EXTRACTION HINT: The structural change (unconditional → dual-condition) is the key extractable claim; the frog-boiling quote from METR is supporting evidence; the $30B context explains the financial incentive driving the change
+
+
+## Key Facts
+- Anthropic released RSP v3.0 on February 24, 2026
+- RSP v3.0 introduces Frontier Safety Roadmaps and Risk Reports
+- RSP v3.0 requires capability assessments on 6-month intervals
+- Jared Kaplan stated 'We felt that it wouldn't actually help anyone for us to stop training AI models' in TIME interview March 6, 2026
+- Anthropic raised $30B at approximately $380B valuation with 10x annual revenue growth (context for RSP v3.0 release)
+- METR (Anthropic's evaluation partner) warned of 'frog-boiling effect' from RSP v3.0 changes
diff --git a/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md b/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md
index d3164a71e..b6922d7cc 100644
--- a/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md
+++ b/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md
@@ -7,9 +7,13 @@ date: 2024-07-12
 domain: ai-alignment
 secondary_domains: []
 format: legislation
-status: unprocessed
+status: null-result
 priority: medium
 tags: [EU-AI-Act, Article-43, conformity-assessment, self-assessment, notified-bodies, high-risk-AI, independence, FDA-comparison]
+processed_by: theseus
+processed_date: 2026-03-20
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "LLM returned 1 claims, 1 rejected by validator"
 ---
 
 ## Content
@@ -46,3 +50,13 @@ Article 43 establishes conformity assessment procedures for **high-risk AI systems
 PRIMARY CONNECTION: voluntary safety pledges cannot survive competitive pressure — self-certification under Article 43 has the same structural weakness as voluntary commitments; labs certify their own compliance
 WHY ARCHIVED: Corrects common misreading of EU AI Act as creating FDA-equivalent independent evaluation via Article 43; clarifies that independent evaluation runs through Article 92 (reactive) not Article 43 (conformity)
 EXTRACTION HINT: This is primarily a clarifying/corrective source; extractor should check whether any existing KB claims overstate Article 43's independence requirements and note the Article 43 / Article 92 distinction
+
+
+## Key Facts
+- EU AI Act Article 43 governs conformity assessment for high-risk AI systems (Annex III categories)
+- High-risk AI in Annex III points 2-8 use internal control (self-assessment) only
+- High-risk AI in Annex III point 1 (biometric identification) may choose between internal control OR notified body assessment
+- Third-party notified body required only when: harmonized standards don't exist, common specifications unavailable, provider hasn't fully applied standards, or standards published with restrictions
+- For law enforcement and immigration uses, the market surveillance authority acts as the notified body
+- Article 43 applies to high-risk AI systems (classification by use case), distinct from GPAI systemic risk provisions (Articles 51-56) which govern models by training compute scale
+- Article 92 provides compulsory AI Office evaluation as a separate mechanism from Article 43 conformity assessment