| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags |
|---|---|---|---|---|---|---|---|---|---|---|
| source | Google DeepMind Frontier Safety Framework v3.0: Adds Tracked Capability Levels and Harmful Manipulation CCL (April 2026) | Google DeepMind (deepmind.google) | https://deepmind.google/blog/strengthening-our-frontier-safety-framework/ | 2026-04-17 | ai-alignment | | blog-post | unprocessed | medium | |
Content
Source: Google DeepMind blog post, "Strengthening our Frontier Safety Framework," published April 17, 2026. Full framework document: storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3.pdf
Core updates in FSF v3.0:
- Tracked Capability Levels (TCLs): New capability tier added below Critical Capability Levels. TCLs identify "potential less extreme risks sooner," serving as an early-warning layer before a model reaches a CCL threshold. This creates a two-tier system: TCL (monitor and track) → CCL (deploy mitigations).
- New CCL: Harmful Manipulation: "AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high-stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at severe scale." This CCL operationalizes Google DeepMind's research on manipulative capabilities in generative AI models.
- Full risk management process: FSF v3.0 provides more detail on the complete process: initial identification → capability tracking → CCL determination → mitigation deployment.
Framework structure (v3.0):
- Critical Capability Levels (CCLs) across: cyber capabilities, autonomous ML research, manipulation (NEW), CBRN threats
- Tracked Capability Levels (TCLs): pre-CCL monitoring tier
- Escalating security and deployment mitigations tied to CCL triggers
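The two-tier structure above can be sketched as a small state model. This is an illustrative sketch only, not Google DeepMind's implementation: the threshold values, domain names, and response strings are all hypothetical placeholders; the FSF itself is a qualitative governance framework, not a scored pipeline.

```python
from enum import Enum
from dataclasses import dataclass

class Tier(Enum):
    NONE = 0  # below any tracked threshold
    TCL = 1   # Tracked Capability Level: monitor and track
    CCL = 2   # Critical Capability Level: deploy mitigations

@dataclass
class CapabilityAssessment:
    domain: str   # e.g. "cyber", "harmful-manipulation" (labels hypothetical)
    score: float  # hypothetical capability-eval score in [0, 1]

def classify(assessment: CapabilityAssessment,
             tcl_threshold: float = 0.4,
             ccl_threshold: float = 0.8) -> Tier:
    """Map an eval score to a tier; thresholds are illustrative."""
    if assessment.score >= ccl_threshold:
        return Tier.CCL
    if assessment.score >= tcl_threshold:
        return Tier.TCL
    return Tier.NONE

# Escalating responses tied to tier triggers (wording illustrative)
RESPONSES = {
    Tier.NONE: "continue routine evaluation",
    Tier.TCL:  "begin enhanced tracking and early-warning monitoring",
    Tier.CCL:  "trigger security and deployment mitigations",
}
```

The point of the sketch is the ordering: a TCL fires before a CCL does, so tracking obligations attach earlier than mitigation obligations.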
Context:
- FSF v1.0: Introduced CCLs (2024)
- FSF v2.0: February 2025 — first major revision
- FSF v3.0: April 17, 2026 — adds TCLs and Harmful Manipulation CCL
Agent Notes
Why this matters: The FSF v3.0 is the most current governance framework from one of the three frontier labs. The TCL addition is an evolution in governance architecture: an intermediate monitoring layer that improves detection before critical thresholds are crossed. This is exactly the kind of "coordination infrastructure" that the alignment-as-coordination belief says is necessary. The Harmful Manipulation CCL is particularly relevant: it operationalizes a concern (AI-driven belief and behavior manipulation) that prior governance frameworks didn't formally track.
What surprised me: The Harmful Manipulation CCL is new and significant. The definition — "systematically and substantially change beliefs and behaviors in identified high-stakes contexts" — is broader than traditional CBRN risk framing. It captures narrative and epistemic risks that are core to the KB's concern about AI-driven collective intelligence degradation. This is governance catching up to an epistemic risk we've been tracking theoretically.
What I expected but didn't find: International coordination on these framework definitions. FSF v3.0 is unilateral (Google DeepMind). The Harmful Manipulation CCL is not harmonized with Anthropic's RSP or OpenAI's Preparedness Framework. The coordination failure persists even as individual frameworks improve.
KB connections:
- voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints — FSF v3.0 is a voluntary unilateral framework. The TCL/CCL tier system is more sophisticated than previous versions, but it's still voluntary. This is governance evolution within the existing (insufficient) paradigm.
- AI alignment is a coordination problem not a technical problem — the Harmful Manipulation CCL is a unilateral Google policy. If Anthropic and OpenAI don't define equivalent levels, the protection is asymmetric across frontier labs.
- safe AI development requires building alignment mechanisms before scaling capability — TCLs represent improved early-warning infrastructure; FSF v3.0 is a sequencing improvement.
- AI is collapsing the knowledge-producing communities it depends on creating a self-undermining loop that collective intelligence can break — the Harmful Manipulation CCL directly addresses the epistemic risk dimension of AI's impact on knowledge production.
Extraction hints:
- DO NOT create a new claim about FSF v3.0 in isolation — one governance framework update doesn't warrant a standalone claim.
- CONSIDER enriching voluntary safety pledges cannot survive competitive pressure with the FSF v3.0 context: frameworks are becoming more sophisticated (TCL tier, Harmful Manipulation CCL) but remain unilateral and voluntary, confirming the structural limitation.
- CLAIM CANDIDATE (lower priority): "Frontier lab safety frameworks are converging on tiered capability monitoring architectures (pre-threshold tracking plus threshold-triggered mitigations), suggesting an emerging governance norm, but the converging form is voluntary and unilateral." Confidence: experimental. Needs OpenAI/Anthropic framework comparison.
- The Harmful Manipulation CCL is worth a potential note in Theseus's musings about epistemic risk governance — it's the first formal governance operationalization of narrative/epistemic AI risks.
Context: Google DeepMind's FSF is one of three major unilateral frontier lab safety frameworks (alongside Anthropic's RSP and OpenAI's Preparedness Framework). The FSF v3.0 update is significant because it adds TCLs (a more granular early-warning tier) and the Harmful Manipulation CCL. Both represent governance maturation, not governance coordination.
Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints — FSF v3.0 is a more sophisticated unilateral pledge, which is governance evolution but not governance coordination
WHY ARCHIVED: Documents the current state of frontier lab safety governance as of April 2026. The TCL addition and Harmful Manipulation CCL are notable governance developments. Most importantly: even the best current safety framework remains voluntary and unilateral — confirming the structural gap claim.
EXTRACTION HINT: Use to enrich the voluntary safety pledges claim and potentially note the Harmful Manipulation CCL as a new governance category. Primary value is as contextual evidence of governance state, not as a standalone claim source.