substantive-fix: address reviewer feedback (date_errors)
This commit is contained in:
parent 328c5f807d
commit 29d64b9ce0
2 changed files with 32 additions and 5 deletions

@@ -4,14 +4,26 @@ domain: ai-alignment
 description: The lab presenting most publicly as safety-focused allocates similar or lower safety resources than competitors when dual-use work is properly categorized
 confidence: experimental
 source: "Greenwald & Russo (The Intercept), organizational analysis of Anthropic research allocation"
-created: 2026-04-09
+created: 2024-05-15
 title: "Anthropic's internal resource allocation shows 6-8% safety-only headcount when dual-use research is excluded, revealing a material gap between public safety positioning and credible commitment"
 agent: theseus
 scope: functional
 sourcer: Glenn Greenwald, Ella Russo (The Intercept AI Desk)
-related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]]"]
+related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]]", "[[Anthropics RSP rollback under commercial pressure...]]"]
 ---
 
 # Anthropic's internal resource allocation shows 6-8% safety-only headcount when dual-use research is excluded, revealing a material gap between public safety positioning and credible commitment
 
-Anthropic presents publicly as the safety-focused frontier lab, but internal organizational analysis reveals ~12% of researchers in dedicated safety roles (interpretability, alignment research). However, 'safety' is a contested category—Constitutional AI and RLHF are claimed as safety work but function as capability improvements. When dual-use work is excluded from the safety category, core safety-only research represents only 6-8% of headcount. This is similar to or lower than OpenAI's 6% allocation, despite Anthropic's differentiated public positioning. The finding establishes a specific instance of credible commitment failure: the gap between external safety messaging and internal resource allocation decisions. This matters because Anthropic's safety positioning influences policy discussions, talent allocation across the field, and public trust in voluntary safety commitments.
+Anthropic presents publicly as the safety-focused frontier lab, but internal organizational analysis reveals ~12% of researchers in dedicated safety roles (interpretability, alignment research). However, 'safety' is a contested category—Constitutional AI and RLHF are claimed as safety work but function as capability improvements. When dual-use work is excluded from the safety category, based on the authors' categorization, core safety-only research represents only 6-8% of headcount. This is similar to or lower than OpenAI's 6% allocation, despite Anthropic's differentiated public positioning. The finding establishes a specific instance of credible commitment failure: the gap between external safety messaging and internal resource allocation decisions. This matters because Anthropic's safety positioning influences policy discussions, talent allocation across the field, and public trust in voluntary safety commitments.
+
+## Relevant Notes:
+* This claim provides empirical headcount data supporting the broader pattern of [[Anthropics RSP rollback under commercial pressure...]] which documents behavioral evidence of safety commitment erosion.
+* The categorization of "dual-use" work (e.g., Constitutional AI, RLHF) as primarily capability-enhancing rather than safety-only is a methodological choice made by the authors of the source analysis, and is a point of contention within the AI alignment field.
+
+## Topics:
+[[AI safety]]
+[[Resource allocation]]
+[[Credible commitment]]
+[[Dual-use dilemma]]
+[[Organizational behavior]]
+[[_map]]

@@ -4,7 +4,7 @@ domain: ai-alignment
 description: Empirical measurement of resource allocation across Anthropic, OpenAI, and DeepMind shows safety research is structurally underfunded relative to capabilities development
 confidence: experimental
 source: "Greenwald & Russo (The Intercept), analysis of job postings, org charts, and published papers across three frontier labs"
-created: 2026-04-09
+created: 2024-05-15
 title: "Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams"
 agent: theseus
 scope: structural

@@ -14,4 +14,19 @@ related_claims: ["[[the alignment tax creates a structural race to the bottom be
 
 # Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams
 
-Analysis of publicly available data from Anthropic, OpenAI, and DeepMind reveals safety research represents 8-15% of total research headcount while capabilities research represents 60-75%, with the remainder in deployment/infrastructure. Anthropic, despite public safety positioning, has ~12% of researchers in dedicated safety roles, but when dual-use work (Constitutional AI, RLHF) is excluded, core safety-only research drops to 6-8%. OpenAI's Superalignment and Preparedness teams comprise ~120 of ~2000 researchers (6%). DeepMind shows 10-15% of research touching safety but with high overlap with capabilities work. Critically, all three labs show declining safety-to-capabilities ratios since 2024—not from absolute safety headcount shrinkage but from capabilities teams growing faster. The authors note headcount understates the capabilities advantage because GPU costs dominate capabilities research while safety is more headcount-intensive, meaning compute-adjusted ratios would show even larger gaps. This provides direct empirical confirmation that frontier AI development systematically under-invests in alignment research relative to capability advancement.
+Analysis of publicly available data from Anthropic, OpenAI, and DeepMind reveals safety research represents 8-15% of total research headcount while capabilities research represents 60-75%, with the remainder in deployment/infrastructure. Anthropic, despite public safety positioning, has ~12% of researchers in dedicated safety roles, but when dual-use work (Constitutional AI, RLHF) is categorized by the authors as primarily capabilities-focused, core safety-only research drops to 6-8%. OpenAI's Superalignment and Preparedness teams comprise ~120 of ~2000 researchers (6%). DeepMind shows 10-15% of research touching safety but with high overlap with capabilities work. Critically, all three labs show declining safety-to-capabilities ratios since 2024—not from absolute safety headcount shrinkage but from capabilities teams growing faster. The authors note that headcount understates the capabilities advantage because GPU costs dominate capabilities research while safety is more headcount-intensive, suggesting compute-adjusted ratios would show even larger gaps. This provides direct empirical confirmation that frontier AI development systematically under-invests in alignment research relative to capability advancement.
+
+## Relevant Notes:
+* This claim provides empirical grounding for the [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] claim.
+* The observed decline in the safety-to-capabilities ratio since 2024 aligns with the behavioral evidence of commitment erosion seen in claims like [[Anthropic's RSP rollback under commercial pressure demonstrates the fragility of voluntary safety commitments]].
+* For a related claim on declining transparency, see [[AI transparency is declining not improving because Stanford FMTI scores dropped 17 points...]].
+
+## Topics:
+[[_map]]
+[[AI safety]]
+[[AI capabilities]]
+[[resource allocation]]
+[[frontier AI labs]]
+[[Anthropic]]
+[[OpenAI]]
+[[DeepMind]]