entity-batch: update 1 entities
- Applied 1 entity operations from queue - Files: entities/ai-alignment/anthropic.md Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
This commit is contained in:
parent
290a0160ae
commit
4e0c6589c9
1 changed files with 1 additions and 0 deletions
|
|
@ -61,6 +61,7 @@ Frontier AI safety laboratory founded by former OpenAI VP of Research Dario Amod
|
||||||
- **2025-08-01** — Published persona vectors research demonstrating activation-based monitoring of behavioral traits (sycophancy, hallucination) in small open-source models (Qwen 2.5-7B, Llama-3.1-8B), with 'preventative steering' capability that reduces harmful trait acquisition during training without capability degradation. Not validated on Claude or for safety-critical behaviors.
|
- **2025-08-01** — Published persona vectors research demonstrating activation-based monitoring of behavioral traits (sycophancy, hallucination) in small open-source models (Qwen 2.5-7B, Llama-3.1-8B), with 'preventative steering' capability that reduces harmful trait acquisition during training without capability degradation. Not validated on Claude or for safety-critical behaviors.
|
||||||
- **2026-02-24** — Published RSP v3.0, replacing hard capability-threshold pause triggers with Frontier Safety Roadmap containing dated commitments through July 2027; extended evaluation interval from 3 to 6 months; published redacted February 2026 Risk Report
|
- **2026-02-24** — Published RSP v3.0, replacing hard capability-threshold pause triggers with Frontier Safety Roadmap containing dated commitments through July 2027; extended evaluation interval from 3 to 6 months; published redacted February 2026 Risk Report
|
||||||
- **2026-02-24** — Published RSP v3.0, replacing hard capability-threshold pause triggers with Frontier Safety Roadmap containing dated milestones through July 2027; extended evaluation interval from 3 to 6 months; disaggregated AI R&D threshold into two distinct capability levels
|
- **2026-02-24** — Published RSP v3.0, replacing hard capability-threshold pause triggers with Frontier Safety Roadmap containing dated milestones through July 2027; extended evaluation interval from 3 to 6 months; disaggregated AI R&D threshold into two distinct capability levels
|
||||||
|
- **2025-05-01** — Activated ASL-3 protections for Claude Opus 4 as precautionary measure without confirmed threshold crossing, citing evaluation unreliability and upward trend in CBRN capability assessments
|
||||||
## Competitive Position
|
## Competitive Position
|
||||||
Strongest position in enterprise AI and coding. Revenue growth (10x YoY) outpaces all competitors. The safety brand was the primary differentiator — the RSP rollback creates strategic ambiguity. CEO publicly uncomfortable with power concentration while racing to concentrate it.
|
Strongest position in enterprise AI and coding. Revenue growth (10x YoY) outpaces all competitors. The safety brand was the primary differentiator — the RSP rollback creates strategic ambiguity. CEO publicly uncomfortable with power concentration while racing to concentrate it.
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Reference in a new issue