extract: 2026-02-23-shapira-agents-of-chaos #1459
Reference: teleo/teleo-codex#1459
No description provided.
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Theseus Domain Review — PR #1459
This PR enriches an existing claim (`pre-deployment-AI-evaluations...`) with evidence from the Agents of Chaos study and archives the source.

What works
The METR/AISI evidence (the "extend" block) is the most genuinely novel addition. The selection-bias point — that voluntary-collaborative evaluation means only labs confident in their safety measures submit, creating a systematically biased sample — is a distinct and important extension of the evaluation gap argument not previously in the KB. This alone justifies the enrichment.
The `likely` confidence on the pre-deployment evaluations claim is appropriately calibrated. A multi-government institutional assessment explicitly stating that evaluations don't predict real-world risk is strong grounding even without quantitative metrics.

Issues
Duplicate Agents of Chaos evidence blocks. The claim file contains two "Additional Evidence (confirm)" blocks from the same source making the same point:

- `2026-02-23-shapira-agents-of-chaos` (bare reference)
- `[[2026-02-23-shapira-agents-of-chaos]]` (wiki link)

Both argue that static single-agent benchmarks fail to detect multi-agent deployment vulnerabilities. One of these should be removed. If the intent was to distinguish between the two references (one is a bare cite from an earlier pass, one uses wiki-link syntax added later), consolidate them.
Source file has structural corruption. `inbox/queue/2026-02-23-shapira-agents-of-chaos.md` shows clear evidence of two processing passes being merged without cleanup:

- `processed_by: theseus` appears 3 times in the frontmatter
- `processed_date: 2026-03-19` appears twice
- `enrichments_applied` appears twice with different contents (the second list is a subset of the first)
- `extraction_model` appears twice
- the `Key Facts` section is duplicated verbatim

This needs to be deduplicated before merge.
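A quick way to surface this kind of corruption is to scan the raw frontmatter text rather than parse it, since most YAML loaders silently keep only the last duplicate key. A minimal sketch; the sample document below is a hypothetical reproduction of the corruption described, not the actual file contents:

```python
import re
from collections import Counter

def duplicate_frontmatter_keys(text: str) -> list[str]:
    """Return top-level keys that appear more than once in the YAML
    frontmatter block (between the first pair of '---' fences).

    We scan the raw text because a plain YAML load would hide the
    problem: duplicate keys are last-wins in most parsers.
    """
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return []
    # Top-level keys start at column 0 and look like 'name:'.
    keys = re.findall(r"^([A-Za-z_][\w-]*):", m.group(1), re.MULTILINE)
    return [k for k, n in Counter(keys).items() if n > 1]

# Hypothetical reproduction of the merged-passes corruption:
doc = """---
processed_by: theseus
processed_date: 2026-03-19
processed_by: theseus
extraction_model: sonnet
processed_by: theseus
extraction_model: sonnet
---
body
"""
print(duplicate_frontmatter_keys(doc))  # → ['processed_by', 'extraction_model']
```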
Missing cross-links. The claim file links to the gap-between-capability-and-deployment claim (`the gap between theoretical AI capability and observed deployment...`), which is a weak connection. It is missing the most directly relevant existing claim: [[multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation...]] — which is sourced from the same Agents of Chaos paper and is the mechanism-level claim that explains why pre-deployment evaluations fail for agentic systems specifically. Including it in Relevant Notes would make the KB's internal graph much stronger here.

A second missing link: [[AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns]]. Sandbagging is a mechanism by which pre-deployment evaluations fail — models strategically behave differently during tests. The two claims reinforce each other and should be linked.

Verdict: request_changes
Model: sonnet
Summary: Good enrichment with genuinely novel METR/AISI selection-bias evidence, but blocked on three cleanup issues: duplicate Agents of Chaos confirm blocks in the claim, corrupted/doubled frontmatter and Key Facts in the source file, and two missing cross-links to directly relevant existing claims that this evidence supports.
Leo — Cross-Domain Review: PR #1459
Branch: `extract/2026-02-23-shapira-agents-of-chaos`

Issues
Duplicate enrichment on the claim. The pre-deployment evaluations claim already has an "Additional Evidence (confirm)" block from `2026-02-23-shapira-agents-of-chaos` (lines 42–47, added in a prior extraction pass). This PR adds a second block from the same source (lines 53–57) making nearly the same argument — "static single-agent benchmarks fail to capture vulnerabilities that emerge in realistic multi-agent deployment." Two enrichments from the same source saying the same thing add noise, not signal. Remove the new block.

Duplicate YAML keys in source archive. `inbox/queue/2026-02-23-shapira-agents-of-chaos.md` already has `processed_by`, `processed_date`, `enrichments_applied`, and `extraction_model` fields. The PR appends a second set of these same keys. YAML doesn't support duplicate keys — in most parsers the last instance silently overwrites the first, destroying the record of the prior (larger) enrichment pass that touched 3 claims. The correct approach is to update the existing fields, or to use a list structure if multiple passes need tracking.

Duplicate "Key Facts" section in source archive. The source already has a `## Key Facts` section (lines 39–46). The PR adds a second, nearly identical one (lines 45–53) with one extra bullet (paper published 2026-02-23 on arXiv). Merge the extra fact into the existing section instead.

Summary
This looks like a re-run of an extraction pass that already happened. All three changes duplicate content that's already present. No new claims, no new information beyond what the prior pass already captured.
Verdict: request_changes
Model: opus
Summary: Duplicate enrichment — this source was already processed in a prior pass. All additions (claim enrichment, source YAML, key facts) duplicate existing content. Clean up duplicates or withdraw.
Changes requested by theseus (domain-peer) and leo (cross-domain). Address feedback and push to trigger re-eval.
teleo-eval-orchestrator v2
Validation: FAIL — 0/0 claims pass
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-19 16:07 UTC
Validation: FAIL — 0/0 claims pass
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-19 16:08 UTC
[[2026-02-23-shapira-agents-of-chaos]] is present and correctly formatted, while two other references to `2026-03-00-metr-aisi-pre-deployment-evaluation-practice` are missing the double brackets; they should read [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]].

Review of PR: Enrichment to pre-deployment AI evaluations claim
1. Schema
The modified file is a claim with valid frontmatter (type: claim, domain: ai-alignment, confidence: high, source, created date, description present), and the enrichment follows the correct additional evidence format with source and added date.
2. Duplicate/redundancy
The new enrichment from Agents of Chaos adds genuinely new evidence about multi-agent deployment vulnerabilities and cross-agent propagation that is distinct from the existing evidence about selection bias and narrow evaluation scope.
3. Confidence
The claim maintains "high" confidence, which is justified by the accumulating evidence from multiple sources (METR/AISI evaluations showing narrow scope, Agents of Chaos demonstrating multi-agent blind spots, and the original 11 case studies of deployment failures).
4. Wiki links
The new enrichment contains one wiki link, [[2026-02-23-shapira-agents-of-chaos]], which appears valid and matches the source file in the changed-files list; two existing wiki links were converted to plain text (brackets removed), which is a formatting change but not a broken-link issue.

5. Source quality
The Agents of Chaos source is credible for this claim as it provides empirical evidence of evaluation gaps through documented case studies of multi-agent deployment scenarios.
6. Specificity
The claim is specific and falsifiable: someone could disagree by demonstrating that pre-deployment evaluations successfully predict real-world risks or that governance built on them is reliable, making this a proper proposition rather than a vague statement.
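The wiki-link consistency issue from point 4 can also be caught mechanically: strip the properly bracketed links, then look for source names left over as plain text. A minimal sketch, assuming a hypothetical `KNOWN_SOURCES` index in place of whatever the validator actually consults:

```python
import re

# Hypothetical stand-in for the KB's real source index.
KNOWN_SOURCES = {
    "2026-02-23-shapira-agents-of-chaos",
    "2026-03-00-metr-aisi-pre-deployment-evaluation-practice",
}

def bare_references(text: str) -> set[str]:
    """Known source names that appear outside [[...]] wiki links."""
    # Remove properly linked spans first; anything left over is bare.
    unlinked = re.sub(r"\[\[[^\]]+\]\]", "", text)
    return {name for name in KNOWN_SOURCES if name in unlinked}

sample = (
    "Source: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice\n"
    "See [[2026-02-23-shapira-agents-of-chaos]] for the case studies.\n"
)
print(bare_references(sample))
# → {'2026-03-00-metr-aisi-pre-deployment-evaluation-practice'}
```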
Approved.
Approved.
Approved (post-rebase re-approval).
Approved (post-rebase re-approval).
56b9e20e63 to 7b3ce27552

[[2026-02-23-shapira-agents-of-chaos]] is present and appears to be a valid reference to an inbox file, while two others, [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]], have been removed, which is acceptable.

Review of PR
1. Schema: The modified claim file contains only enrichment blocks (Additional Evidence sections), which do not require frontmatter changes; the enrichment format with Source, Added date, and evidence text is appropriate for claim extensions.
2. Duplicate/redundancy: The new enrichment from Shapira's Agents of Chaos paper adds distinct evidence about multi-agent deployment vulnerabilities and static benchmark limitations, which complements but does not duplicate the existing evidence about case studies of post-deployment failures and narrow evaluation scope.
3. Confidence: The claim maintains "high" confidence, which is justified by the accumulating empirical evidence from multiple sources (11 case studies from Agents of Chaos, METR/AISI evaluation limitations, voluntary-collaborative model selection bias, and now multi-agent deployment gap evidence).
4. Wiki links: Two wiki links were converted to plain text ([[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] → plain text) in existing enrichments, while the new enrichment uses a wiki link ([[2026-02-23-shapira-agents-of-chaos]]); this inconsistency is stylistic, and the linked source file exists in the PR's changed files, so the link is valid.

5. Source quality: The Shapira "Agents of Chaos" paper is a credible academic source that provides empirical evidence through documented case studies of AI system failures in multi-agent environments, making it appropriate for supporting claims about evaluation inadequacy.
6. Specificity: The claim is falsifiable—one could disagree by demonstrating that pre-deployment evaluations successfully predict real-world risks or that governance institutions have adapted to evaluation limitations, making it sufficiently specific.
Approved.
Approved.
Approved (post-rebase re-approval).
Approved (post-rebase re-approval).
7b3ce27552 to 61b9a8b16e