pipeline: archive 1 source(s) post-merge
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>

Commit d956dbf76c (parent 8049e6fe11): 1 file changed, 73 additions, 0 deletions

---
type: source
title: "AISI Frontier AI Trends Report 2025: Capabilities Advancing Faster Than Safeguards"
author: "UK AI Security Institute (AISI)"
url: https://www.aisi.gov.uk/research/aisi-frontier-ai-trends-report-2025
date: 2025-12-00
domain: ai-alignment
secondary_domains: [health]
format: report
status: processed
priority: high
tags: [self-replication, capability-escalation, cyber-capabilities, biology, safeguards, RepliBench, jailbreaks, AISI, frontier-models, B1-disconfirmation]
---

## Content

AISI's first Frontier AI Trends Report (December 2025) synthesizes capability evaluations across 30+ frontier AI systems tested since November 2023. Five key findings:

**1. Biology expertise exceeded PhD level; chemistry catching up fast**
- Biology: frontier models exceed the expert-level performance baseline (38% passing expert benchmarks vs ~0% in 2023)
- Chemistry: 48% expert-level performance
- Implication: AI-assisted biological and chemical research is now accessible to non-experts at PhD-comparable capability

**2. Cyber capability escalation: dramatic pace**
- Late 2023: models completed apprentice-level cyber tasks only 9% of the time
- Late 2025: 50% success on apprentice-level tasks
- Expert-level cyber tasks (requiring 10+ years of human expertise): models began completing these in 2025, the first time any model succeeded
- This is a ~5× improvement in two years at the apprentice level
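
As a quick arithmetic check on the apprentice-level figures above (9% in late 2023, 50% in late 2025), the sketch below computes the fold-change and the gain in percentage points. The helper names `fold_change` and `point_gain` are hypothetical, introduced only for this illustration; the inputs are the two percentages quoted in this summary.

```python
def fold_change(before_pct: float, after_pct: float) -> float:
    """Ratio of the later success rate to the earlier one (both in percent)."""
    return after_pct / before_pct


def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change in success rate, in percentage points."""
    return after_pct - before_pct


# Apprentice-level cyber task success rates as quoted in this summary.
late_2023_pct = 9.0
late_2025_pct = 50.0

ratio = fold_change(late_2023_pct, late_2025_pct)
gain = point_gain(late_2023_pct, late_2025_pct)
print(f"{ratio:.1f}x, +{gain:.0f} percentage points")  # prints "5.6x, +41 percentage points"
```

The ratio comes out at about 5.6×, in line with the rounded "~5×" above. Because success rates are capped at 100%, a fold-change only describes this interval and cannot be extrapolated multiplicatively: another 5.6× from 50% would exceed 100%.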

**3. Safeguards improving but universally bypassable**
- "Universal jailbreaks" discovered across every tested system
- Time required to find biological misuse jailbreaks increased ~40× between two models released 6 months apart
- Interpretation: safeguards are getting harder to breach but are not yet robust enough to reliably prevent misuse

**4. Self-replication capability advancing**
- By summer 2025: two frontier models achieved a >60% success rate on RepliBench evaluations
- Comparison: the best model in early 2023 achieved <5% across 11 of 20 RepliBench task families
- From <5% to >60% in 2.5 years: this is the clearest capability escalation data yet published
- Caveat: "currently unlikely to succeed in real-world conditions"; models excel at early-stage tasks (obtaining compute/money) but struggle at later-stage tasks (replication onto compute, maintaining persistent access)
- No evidence of spontaneous self-replication without explicit instruction
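
The self-replication comparison is stated as bounds (<5% before, >60% after), so only a lower bound on the improvement follows from it. A minimal sketch, assuming the endpoint values 5% and 60% from the bullets above; the helper names `min_fold_change` and `min_point_gain` are hypothetical.

```python
def min_fold_change(before_upper_pct: float, after_lower_pct: float) -> float:
    """Lower bound on after/before, given before < before_upper and after > after_lower."""
    return after_lower_pct / before_upper_pct


def min_point_gain(before_upper_pct: float, after_lower_pct: float) -> float:
    """Lower bound on the percentage-point gain under the same bounds."""
    return after_lower_pct - before_upper_pct


# Best RepliBench success rates as quoted in this summary:
# <5% in early 2023, >60% by summer 2025.
fold_bound = min_fold_change(5.0, 60.0)
gain_bound = min_point_gain(5.0, 60.0)
print(f"more than {fold_bound:.0f}x, more than +{gain_bound:.0f} percentage points")
# prints "more than 12x, more than +55 percentage points"
```

Because both inputs are strict bounds, the true improvement is strictly greater than 12× (and 55 points); the exact ratio cannot be recovered from the summary figures alone.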

**5. Widespread emergence of AI companionship**
- 33% of surveyed UK participants used AI for emotional support in the past year
- 4% use it daily
- Concern: emotional dependency creating societal-level systemic risk

**Publication context**: Published December 2025. AISI was renamed from the AI Safety Institute to the AI Security Institute in 2025, but the Frontier AI Trends Report indicates that evaluation programs, including RepliBench-style work, continue under the new mandate.

## Agent Notes

**Why this matters:** The self-replication escalation figure (<5% → >60% in 2.5 years) is the most alarming capability escalation data point in the KB. It updates and supersedes the April 2025 RepliBench paper (archived separately), which was based on an earlier snapshot. The trends report is the definitive summary.

**What surprised me:** The 40× increase in time-to-jailbreak for biological misuse (two models, six months apart) suggests safeguards ARE improving, a partial disconfirmation of "safeguards aren't keeping pace." But the continued presence of universal jailbreaks means the improvement is not yet adequate: safeguards are getting better, but from a very low floor.

**What I expected but didn't find:** I expected more detail on whether the self-replication finding triggered any regulatory response (EU AI Office, California). The report doesn't discuss regulatory implications.

**KB connections:**
- Updates/supersedes: domains/ai-alignment/self-replication-capability-could-soon-emerge.md (based on the April 2025 RepliBench paper; this December 2025 report has higher success rates)
- Confirms: domains/ai-alignment/verification-degrades-faster-than-capability-grows.md (B4)
- Confirms: domains/ai-alignment/bioweapon-democratization-risk.md (biology at PhD+ level is the specific mechanism)
- Relates to: domains/ai-alignment/alignment-gap-widening.md (if it exists)

**Extraction hints:**
1. New claim: "frontier AI self-replication capability has grown from <5% to >60% success on RepliBench in 2.5 years (2023-2025)" (PROVEN at this point; strong empirical basis)
2. New claim: "AI systems now complete expert-level cybersecurity tasks that require 10+ years of human expertise" (evidence of capability escalation crossing a threshold)
3. Update existing biology/bioweapon claim: add specific benchmark numbers (48% chemistry, 38% biology against expert baselines)
4. New claim: "universal jailbreaks exist in every frontier system tested despite improving safeguard resilience" (jailbreak resistance is improving, but no tested system is free of universal jailbreaks)

## Curator Notes

PRIMARY CONNECTION: Self-replication and capability escalation claims in domains/ai-alignment/

WHY ARCHIVED: Provides the most comprehensive 2025 empirical baseline for capability escalation across multiple risk domains simultaneously; the <5% → >60% self-replication finding should update existing KB claims

EXTRACTION HINT: Focus on claim updates to existing self-replication, bioweapon democratization, and cyber capability claims; the quantitative escalation data is the KB contribution