diff --git a/entities/ai-alignment/anthropic.md b/entities/ai-alignment/anthropic.md index 7eb5a25a..6a92fb12 100644 --- a/entities/ai-alignment/anthropic.md +++ b/entities/ai-alignment/anthropic.md @@ -60,6 +60,7 @@ Frontier AI safety laboratory founded by former OpenAI VP of Research Dario Amod - **2026-02-24** — Published Responsible Scaling Policy v3.0, removing hard capability-threshold pause triggers and replacing them with non-binding 'public goals' and external expert review. Cited evaluation science insufficiency and slow government action as primary reasons. External media characterized this as 'dropping hard safety limits.' - **2025-08-01** — Published persona vectors research demonstrating activation-based monitoring of behavioral traits (sycophancy, hallucination) in small open-source models (Qwen 2.5-7B, Llama-3.1-8B), with 'preventative steering' capability that reduces harmful trait acquisition during training without capability degradation. Not validated on Claude or for safety-critical behaviors. - **2026-02-24** — Published RSP v3.0, replacing hard capability-threshold pause triggers with Frontier Safety Roadmap containing dated commitments through July 2027; extended evaluation interval from 3 to 6 months; published redacted February 2026 Risk Report +- **2026-02-24** — Published RSP v3.0, replacing hard capability-threshold pause triggers with Frontier Safety Roadmap containing dated milestones through July 2027; extended evaluation interval from 3 to 6 months; disaggregated AI R&D threshold into two distinct capability levels ## Competitive Position Strongest position in enterprise AI and coding. Revenue growth (10x YoY) outpaces all competitors. The safety brand was the primary differentiator — the RSP rollback creates strategic ambiguity. CEO publicly uncomfortable with power concentration while racing to concentrate it.