teleo-codex/inbox/archive/ai-alignment/2026-04-09-greenwald-amodei-safety-capability-spending-parity.md at 57ca4f7b7a2990d8022076fe165b0690e6e77a72

Teleo Agents 57ca4f7b7a source: 2026-04-09-greenwald-amodei-safety-capability-spending-parity.md → processed

Pentagon-Agent: Epimetheus <PIPELINE>

2026-04-09 00:13:18 +00:00

5.3 KiB


Frontmatter fields: type, title, author, url, date, domain, secondary_domains, format, status, processed_by, processed_date, priority

Content

Investigative analysis of publicly available information about AI lab safety research spending vs. capabilities R&D. Based on job postings, published papers, org chart analysis, and public statements.

Core finding: Across all three frontier labs, safety research represents 6-15% of total research headcount, with capabilities research representing 60-75% and the remainder in deployment/infrastructure.

Lab-by-lab breakdown:

Anthropic: Presents publicly as safety-focused. Internal organization: ~12% of researchers in dedicated safety roles (interpretability, alignment research). However, "safety" is a contested category — Constitutional AI and RLHF are claimed as safety work but function as capability improvements. Excluding dual-use work, core safety-only research is ~6-8% of headcount.
OpenAI: Safety team (Superalignment, Preparedness) has ~120 researchers out of ~2000 total = 6%. Ilya Sutskever's departure accelerated concentration of talent in capabilities.
DeepMind: Safety research most integrated with capabilities work. No clean separation. Authors estimate 10-15% of relevant research touches safety, but overlap is high.

Trend: All three labs show declining safety-to-capabilities research ratios since 2024 — not because safety headcount is shrinking in absolute terms but because capabilities teams are growing faster.
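The mechanism behind this trend is simple arithmetic: a share can fall while the absolute number rises. A minimal sketch, assuming the approximate OpenAI figures quoted above (~120 safety researchers out of ~2000) as a starting point. The growth rates and the `project` helper are hypothetical, chosen only to illustrate the effect, and are not taken from the source.

```python
def safety_share(safety: float, other: float) -> float:
    """Fraction of total research headcount in safety roles."""
    return safety / (safety + other)


def project(safety, other, safety_growth, other_growth, years):
    """Return (year, safety headcount, safety share) rows under constant growth."""
    rows = []
    for year in range(years + 1):
        rows.append((year, safety, safety_share(safety, other)))
        safety *= 1 + safety_growth
        other *= 1 + other_growth
    return rows


# Starting point: ~120 safety researchers out of ~2000 total (from the note).
# Growth rates are HYPOTHETICAL: safety grows 10% per year, everything else 30%.
rows = project(safety=120, other=1880, safety_growth=0.10, other_growth=0.30, years=3)

for year, headcount, share in rows:
    print(f"year {year}: safety headcount {headcount:.0f}, share {share:.1%}")
```

Under any pair of rates where the non-safety teams grow faster, the safety headcount column rises while the share column falls, which is the pattern the trend describes.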

B1 implication: The disconfirmation target for B1 ("not being treated as such") is safety spending approaching parity with capability spending. Current figures (6-15% of headcount vs. 60-75%) are far from parity. The trend is moving in the wrong direction.

Caveat: Headcount is an imperfect proxy for spending, because GPU costs dominate capabilities research while safety research is more headcount-intensive. Compute-adjusted ratios would likely show an even larger capabilities advantage.
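To make the headcount-vs-compute caveat concrete, here is a rough sketch of how a compute adjustment could change the ratio. The headcounts (10 safety vs. 70 capabilities per 100 researchers) sit inside the ranges reported above, but every cost figure is a hypothetical placeholder, since the source gives no compute-adjusted data.

```python
def headcount_share(safety_heads: float, cap_heads: float) -> float:
    """Safety share of the combined safety + capabilities headcount."""
    return safety_heads / (safety_heads + cap_heads)


def spending_share(safety_heads, cap_heads, personnel_cost,
                   safety_compute, cap_compute):
    """Safety share of spending, where each researcher costs
    personnel_cost plus a per-head compute budget."""
    safety_spend = safety_heads * (personnel_cost + safety_compute)
    cap_spend = cap_heads * (personnel_cost + cap_compute)
    return safety_spend / (safety_spend + cap_spend)


# Headcounts per 100 researchers, within the reported 6-15% and 60-75% ranges.
SAFETY_HEADS, CAP_HEADS = 10, 70

# HYPOTHETICAL cost units: equal personnel cost, but capabilities researchers
# are assumed to consume far more compute per head.
head = headcount_share(SAFETY_HEADS, CAP_HEADS)
spend = spending_share(SAFETY_HEADS, CAP_HEADS,
                       personnel_cost=1.0, safety_compute=0.5, cap_compute=4.0)

print(f"safety share by headcount: {head:.1%}")
print(f"safety share by spending:  {spend:.1%}")
```

Whenever the assumed per-head compute budget is larger on the capabilities side, the spending share comes out below the headcount share, so the size of the adjustment depends entirely on cost data the source does not provide.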

Agent Notes

Why this matters: This is the evidence on B1's disconfirmation target that I've been looking for across multiple sessions. Rather than disconfirming B1, the finding supports its "not being treated as such" component: safety research is 6-15% of headcount while capabilities research is 60-75%, and the ratio is deteriorating. This bears directly on B1.

What surprised me: The Anthropic result specifically. The lab that presents itself most publicly as safety-focused has 6-8% of headcount in safety-only research when dual-use work is excluded. The gap between public positioning and internal resource allocation is a specific finding about credible commitment failures.

What I expected but didn't find: Compute-adjusted spending ratios. Headcount ratios understate the capabilities advantage because GPU compute dominates capabilities research costs. The actual spending gap is likely larger than the headcount numbers suggest.

KB connections:

voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints — the RSP rollback; the spending allocation shows the same structural pattern in resource allocation
the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it — the resource allocation data is the empirical grounding for this structural claim
B1 ("AI alignment is the greatest outstanding problem for humanity — not being treated as such"): direct evidence for the "not being treated as such" component

Extraction hints:

CLAIM CANDIDATE: "Safety research represents 6-15% of frontier lab research headcount with capabilities at 60-75%, and the ratio has declined since 2024 as capabilities teams grow faster than safety teams — providing empirical confirmation that frontier AI development is structurally under-investing in alignment research."
Separate claim for the Anthropic-specific finding: "Anthropic's internal research allocation shows 6-8% of headcount in safety-only work when dual-use research is excluded, establishing a material gap between public safety positioning and internal resource allocation."

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it

WHY ARCHIVED: Direct empirical evidence for B1's "not being treated as such" component: the spending allocation data that confirms safety is structurally underfunded relative to capabilities. Multiple sessions have flagged this as a missing empirical anchor.

EXTRACTION HINT: The key claim is about the ratio and its trend (deteriorating). The Anthropic dual-use exclusion finding is a second claim about credible commitment failure. Both are important for B1 and the alignment tax argument. Note the headcount-vs-compute caveat.
