From 2bac07b4ab1946fa3b0e6e40ae86d8279fdce797 Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Wed, 11 Mar 2026 04:30:10 +0000 Subject: [PATCH] theseus: extract claims from 2025-00-00-mats-ai-agent-index-2025.md - Source: inbox/archive/2025-00-00-mats-ai-agent-index-2025.md - Domain: ai-alignment - Extracted by: headless extraction cron (worker 5) Pentagon-Agent: Theseus --- ...m-not-exception-across-deployed-systems.md | 29 +++++++++++++++++ ...ity-advancement-creating-deployment-gap.md | 31 +++++++++++++++++++ ... systems regardless of agent capability.md | 6 ++++ ...cost that competitive markets eliminate.md | 6 ++++ ...ity limits determines real-world impact.md | 6 ++++ ... advance without equivalent constraints.md | 6 ++++ .../2025-00-00-mats-ai-agent-index-2025.md | 14 ++++++++- 7 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 domains/ai-alignment/agent-safety-documentation-gap-is-norm-not-exception-across-deployed-systems.md create mode 100644 domains/ai-alignment/ai-agent-safety-documentation-lags-capability-advancement-creating-deployment-gap.md diff --git a/domains/ai-alignment/agent-safety-documentation-gap-is-norm-not-exception-across-deployed-systems.md b/domains/ai-alignment/agent-safety-documentation-gap-is-norm-not-exception-across-deployed-systems.md new file mode 100644 index 000000000..e01ffec3b --- /dev/null +++ b/domains/ai-alignment/agent-safety-documentation-gap-is-norm-not-exception-across-deployed-systems.md @@ -0,0 +1,29 @@ +--- +type: claim +domain: ai-alignment +description: "30-agent survey shows minimal safety documentation is standard practice not outlier behavior among agent developers" +confidence: experimental +source: "MATS Research, The 2025 AI Agent Index (2025-01-01)" +created: 2025-01-15 +depends_on: ["ai-agent-safety-documentation-lags-capability-advancement-creating-deployment-gap.md"] +challenged_by: [] +--- + +# Minimal safety documentation is standard practice across deployed AI agent systems + +The MATS 2025 AI Agent Index surveyed 30 state-of-the-art AI agents and found that "most developers share little information about safety, evaluations, and societal impacts." This is not a fringe problem — the survey covered state-of-the-art agents, and the documentation gap was pervasive. + +The breadth of the gap is the key finding: across 30 agents representing the current deployment frontier, minimal safety documentation is standard practice. Different transparency levels exist among developers, but the overall pattern is one of "inconsistently documented" systems where safety information is the exception rather than the rule. + +This normalization of the documentation gap has structural implications. When minimal safety disclosure is standard practice, competitive pressure reinforces rather than corrects the pattern. Developers who invest in comprehensive safety documentation bear costs that competitors skip, creating a race-to-the-bottom dynamic. + +The survey methodology — publicly available information plus developer correspondence — means the gap persists even when researchers actively seek safety information. This is not information that exists but is hard to find; it is information that largely does not exist in accessible form. + +--- + +**Relevant connections:** +- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — when 30 agents show minimal documentation, unilateral safety investment is structurally punished +- [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — the agent documentation gap exemplifies the capability-governance mismatch + +**Topics:** +- [[domains/ai-alignment/_map]] diff --git a/domains/ai-alignment/ai-agent-safety-documentation-lags-capability-advancement-creating-deployment-gap.md b/domains/ai-alignment/ai-agent-safety-documentation-lags-capability-advancement-creating-deployment-gap.md new file mode 100644 index 000000000..eb0c9a4af --- /dev/null +++ b/domains/ai-alignment/ai-agent-safety-documentation-lags-capability-advancement-creating-deployment-gap.md @@ -0,0 +1,31 @@ +--- +type: claim +domain: ai-alignment +description: "Survey of 30 deployed agents shows most developers share minimal safety information despite autonomous action capabilities" +confidence: experimental +source: "MATS Research, The 2025 AI Agent Index (2025-01-01)" +created: 2025-01-15 +depends_on: [] +challenged_by: [] +--- + +# AI agent safety documentation lags significantly behind agent capability advancement in deployed autonomous systems + +The MATS 2025 AI Agent Index surveyed 30 state-of-the-art AI agents through publicly available information and developer correspondence, finding that "most developers share little information about safety, evaluations, and societal impacts." This documentation gap is particularly concerning because these are deployed systems with "growing deployment of agents for professional and personal tasks with limited human involvement" — agents that take autonomous actions in the world without standardized safety assessments. + +The transparency problem is systemic: different developers show inconsistent disclosure practices across the ecosystem. The index characterizes the AI agent ecosystem as "complex, rapidly evolving, and inconsistently documented, posing obstacles to both researchers and policymakers." + +This represents an extension of the alignment gap from base models to agentic systems. While models generate outputs for human review, agents take autonomous actions with higher real-world stakes. The safety documentation deficit is worse for agents than for the underlying models, creating a compounding risk as deployment accelerates. + +The gap reflects the absence of standardized frameworks for what agent safety documentation should contain. The index documents the problem but does not propose standards, suggesting the field lacks consensus on safety requirements for autonomous systems. + +--- + +**Relevant connections:** +- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — the documentation gap reflects competitive pressure to deploy without safety overhead +- [[coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability]] — the institutional version of the accountability gap +- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]] — agents with "limited human involvement" are the deployment manifestation +- [[the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact]] — for agents, the gap extends to safety practices + +**Topics:** +- [[domains/ai-alignment/_map]] diff --git a/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md b/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md index b95965bb9..6038e938f 100644 --- a/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md +++ b/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md @@ -19,6 +19,12 @@ His practical reframing helps: "At this point maybe we treat coding agents like This connects directly to [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]]. The accountability gap creates a structural tension: markets incentivize removing humans from the loop (because human review slows deployment), but removing humans from security-critical decisions transfers unmanageable risk. The resolution requires accountability mechanisms that don't depend on human speed — which points toward [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]]. + +### Additional Evidence (extend) +*Source: [[2025-00-00-mats-ai-agent-index-2025]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5* + +The MATS 2025 AI Agent Index documents 'growing deployment of agents for professional and personal tasks with limited human involvement' without standardized safety assessments. This is the institutional manifestation of the accountability gap: agents are being deployed to take autonomous actions in professional contexts (where accountability matters) while 'most developers share little information about safety, evaluations, and societal impacts.' The gap between autonomous action capability and safety documentation creates a systemic accountability void — agents act, but the safety frameworks that would enable accountability are largely absent. + --- Relevant Notes: diff --git a/domains/ai-alignment/economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate.md b/domains/ai-alignment/economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate.md index e30d7c4bd..02cd0978a 100644 --- a/domains/ai-alignment/economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate.md +++ b/domains/ai-alignment/economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate.md @@ -19,6 +19,12 @@ The alignment implications are severe. Human-in-the-loop is the default safety a This creates a structural inversion: the market preserves human-in-the-loop exactly where it's least useful (unverifiable domains where humans can't easily evaluate AI output either) and removes it exactly where it's most useful (verifiable domains where bad outputs are detectable but only if someone is looking). + +### Additional Evidence (confirm) +*Source: [[2025-00-00-mats-ai-agent-index-2025]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5* + +The MATS 2025 AI Agent Index documents 'growing deployment of agents for professional and personal tasks with limited human involvement' across 30 state-of-the-art systems. This is the deployment manifestation of the economic force pushing humans out of cognitive loops: agents are being deployed with 'limited human involvement' precisely because human oversight is a cost. The survey found this pattern across professional and personal task domains, suggesting the economic pressure is domain-general. The fact that this is happening despite 'most developers shar[ing] little information about safety, evaluations, and societal impacts' confirms that competitive pressure dominates safety considerations in deployment decisions. + --- Relevant Notes: diff --git a/domains/ai-alignment/the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact.md b/domains/ai-alignment/the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact.md index 44ff4b607..15a485226 100644 --- a/domains/ai-alignment/the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact.md +++ b/domains/ai-alignment/the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact.md @@ -27,6 +27,12 @@ The gap is not about what AI can't do — it's about what organizations haven't This reframes the alignment timeline question. The capability for massive labor market disruption already exists. The question isn't "when will AI be capable enough?" but "when will adoption catch up to capability?" That's an organizational and institutional question, not a technical one. + +### Additional Evidence (extend) +*Source: [[2025-00-00-mats-ai-agent-index-2025]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5* + +The MATS 2025 AI Agent Index shows that for AI agents, the capability-deployment gap extends to safety practices, not just adoption. The survey found 'safety documentation lags significantly behind capability advancement in deployed agent systems' — these are deployed systems (not theoretical), but the safety infrastructure lags the capability. This suggests a multi-dimensional gap: not only does deployment lag capability, but safety documentation lags both. The index characterizes the ecosystem as 'complex, rapidly evolving, and inconsistently documented, posing obstacles to both researchers and policymakers,' indicating that the documentation gap creates its own adoption friction for researchers and regulators trying to understand deployed systems. + --- Relevant Notes: diff --git a/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md b/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md index 2182d95ed..565dbcbaa 100644 --- a/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md +++ b/domains/ai-alignment/voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md @@ -27,6 +27,12 @@ The timing is revealing: Anthropic dropped its safety pledge the same week the P Anthropic, widely considered the most safety-focused frontier AI lab, rolled back its Responsible Scaling Policy (RSP) in February 2026. The original 2023 RSP committed to never training an AI system unless the company could guarantee in advance that safety measures were adequate. The new RSP explicitly acknowledges the structural dynamic: safety work 'requires collaboration (and in some cases sacrifices) from multiple parts of the company and can be at cross-purposes with immediate competitive and commercial priorities.' This represents the highest-profile case of a voluntary AI safety commitment collapsing under competitive pressure. Anthropic's own language confirms the mechanism: safety is a competitive cost ('sacrifices') that conflicts with commercial imperatives ('at cross-purposes'). Notably, no alternative coordination mechanism was proposed—they weakened the commitment without proposing what would make it sustainable (industry-wide agreements, regulatory requirements, market mechanisms). This is particularly significant because Anthropic is the organization most publicly committed to safety governance, making their rollback empirical validation that even safety-prioritizing institutions cannot sustain unilateral commitments under competitive pressure. + +### Additional Evidence (extend) +*Source: [[2025-00-00-mats-ai-agent-index-2025]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5* + +The MATS 2025 AI Agent Index surveyed 30 state-of-the-art deployed AI agents and found that 'most developers share little information about safety, evaluations, and societal impacts.' The documentation gap is pervasive across the ecosystem, not limited to outliers. This extends the voluntary safety pledge problem from capability commitments to documentation practices: when minimal safety disclosure is the norm across 30 deployed systems, developers who invest in comprehensive documentation bear costs that competitors skip. The survey found 'inconsistently documented' systems across the board, suggesting that competitive pressure has normalized minimal safety transparency as standard practice. + --- Relevant Notes: diff --git a/inbox/archive/2025-00-00-mats-ai-agent-index-2025.md b/inbox/archive/2025-00-00-mats-ai-agent-index-2025.md index a6df28ea9..0778b2628 100644 --- a/inbox/archive/2025-00-00-mats-ai-agent-index-2025.md +++ b/inbox/archive/2025-00-00-mats-ai-agent-index-2025.md @@ -7,9 +7,15 @@ date: 2025-01-01 domain: ai-alignment secondary_domains: [] format: report -status: unprocessed +status: processed priority: medium tags: [AI-agents, safety-documentation, transparency, deployment, agentic-AI] +processed_by: theseus +processed_date: 2025-01-01 +claims_extracted: ["ai-agent-safety-documentation-lags-capability-advancement-creating-deployment-gap.md", "agent-safety-documentation-gap-is-norm-not-exception-across-deployed-systems.md"] +enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md", "economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate.md", "the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact.md"] +extraction_model: "anthropic/claude-sonnet-4.5" +extraction_notes: "Extracted two claims documenting the agent-specific safety documentation gap. The key insight is that minimal safety documentation is the NORM across 30 deployed agents, not an exception. This extends the alignment gap from models to agents and connects to competitive pressure dynamics. Four enrichments added to existing claims about voluntary safety commitments, accountability gaps, economic forces removing humans from loops, and capability-deployment gaps. The source does not propose standards for what agent safety documentation should contain, only documents the gap." --- ## Content @@ -43,3 +49,9 @@ Key findings: PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] WHY ARCHIVED: Documents the agent-specific safety gap — agents act autonomously but have even less safety documentation than base models EXTRACTION HINT: The key finding is the NORM of minimal safety documentation across 30 deployed agents. This extends the alignment gap from models to agents. + + +## Key Facts +- MATS 2025 AI Agent Index surveyed 30 state-of-the-art AI agents +- Survey methodology: publicly available information + developer correspondence +- MATS (ML Alignment Theory Scholars) is a leading alignment research training program -- 2.45.2