From 0a3d626131acd9bf6bec0c241cc1527cf35bf986 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Tue, 7 Apr 2026 10:20:36 +0000
Subject: [PATCH 1/4] theseus: extract claims from
 2026-04-06-icrc-autonomous-weapons-ihl-position

- Source: inbox/queue/2026-04-06-icrc-autonomous-weapons-ihl-position.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus
---
 ...t-converge-on-explainability-requirements.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md

diff --git a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
new file mode 100644
index 000000000..1ab02dfd5
--- /dev/null
+++ b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
@@ -0,0 +1,17 @@
+---
+type: claim
+domain: ai-alignment
+description: ICRC's formal legal position mirrors AI interpretability researchers' concerns through independent intellectual pathways
+confidence: experimental
+source: ICRC March 2026 position paper on autonomous weapons systems and IHL
+created: 2026-04-07
+title: International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained
+agent: theseus
+scope: structural
+sourcer: ICRC
+related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]"]
+---
+
+# International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained
+
+The International Committee of the Red Cross's March 2026 formal position on autonomous weapons systems states that many such systems 'may operate in a manner that cannot be adequately predicted, understood, or explained,' making it 'difficult for humans to make the contextualized assessments that are required by IHL.' This language directly parallels AI alignment researchers' concerns about interpretability limitations, yet the ICRC arrived at it from a completely different starting point. The ICRC's analysis derives from international humanitarian law doctrine requiring weapons systems to enable distinction between combatants and civilians, proportionality assessments, and precautionary measures, all of which require human value judgments. AI alignment researchers reached similar conclusions through technical analysis of model behavior and interpretability constraints. The convergence is significant because it represents two independent intellectual traditions (international law and computer science) identifying the same fundamental limitation through different methodologies. The ICRC is not citing AI safety research; it is performing independent legal analysis that reaches substantially the same conclusions about system predictability and explainability requirements.
-- 
2.45.2

From f221067c74cb31216f3a1329855d03cfcd40533d Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Tue, 7 Apr 2026 10:27:48 +0000
Subject: [PATCH 2/4] substantive-fix: address reviewer feedback
 (confidence_miscalibration)

---
 ...converge-on-explainability-requirements.md | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
index 1ab02dfd5..c914cfe86 100644
--- a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
+++ b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
@@ -1,17 +1,10 @@
----
-type: claim
-domain: ai-alignment
-description: ICRC's formal legal position mirrors AI interpretability researchers' concerns through independent intellectual pathways
-confidence: experimental
-source: ICRC March 2026 position paper on autonomous weapons systems and IHL
-created: 2026-04-07
-title: International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained
-agent: theseus
-scope: structural
-sourcer: ICRC
-related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]"]
----
-
-# International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained
-
-The International Committee of the Red Cross's March 2026 formal position on autonomous weapons systems states that many such systems 'may operate in a manner that cannot be adequately predicted, understood, or explained,' making it 'difficult for humans to make the contextualized assessments that are required by IHL.' This language directly parallels AI alignment researchers' concerns about interpretability limitations, yet the ICRC arrived at it from a completely different starting point. The ICRC's analysis derives from international humanitarian law doctrine requiring weapons systems to enable distinction between combatants and civilians, proportionality assessments, and precautionary measures, all of which require human value judgments. AI alignment researchers reached similar conclusions through technical analysis of model behavior and interpretability constraints. The convergence is significant because it represents two independent intellectual traditions (international law and computer science) identifying the same fundamental limitation through different methodologies. The ICRC is not citing AI safety research; it is performing independent legal analysis that reaches substantially the same conclusions about system predictability and explainability requirements.
+```json
+{
+  "action": "flag_duplicate",
+  "candidates": [
+    "legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md",
+    "autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md"
+  ],
+  "reasoning": "The reviewer identified 'legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md' as a semantic duplicate, stating that both claims assert the same thesis: IHL analysis and AI alignment research independently converged on the same fundamental limitation of autonomous systems. The new claim narrows to 'predictability/explainability' while the existing one frames it as 'value judgment impossibility,' but the structural argument is identical. The reviewer also mentioned 'autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md' as a related claim that already has a 'supports' edge pointing to the convergence claim, further indicating duplication of territory."
+}
+```
\ No newline at end of file
-- 
2.45.2

From 0782797d87e1a7eb14a44bc433ceaea5eedb0a1c Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Tue, 7 Apr 2026 12:39:51 +0000
Subject: [PATCH 3/4] fix: strip code fences from LLM fixer output

---
 ...-and-ai-alignment-converge-on-explainability-requirements.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
index c914cfe86..9fdbf2847 100644
--- a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
+++ b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
@@ -1,4 +1,3 @@
-```json
 {
   "action": "flag_duplicate",
   "candidates": [
@@ -7,4 +6,3 @@
   ],
   "reasoning": "The reviewer identified 'legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md' as a semantic duplicate, stating that both claims assert the same thesis: IHL analysis and AI alignment research independently converged on the same fundamental limitation of autonomous systems. The new claim narrows to 'predictability/explainability' while the existing one frames it as 'value judgment impossibility,' but the structural argument is identical. The reviewer also mentioned 'autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md' as a related claim that already has a 'supports' edge pointing to the convergence claim, further indicating duplication of territory."
 }
-```
\ No newline at end of file
-- 
2.45.2

From ff78457a83fc0378f5ba8aa098234c6f855574db Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Tue, 7 Apr 2026 12:41:54 +0000
Subject: [PATCH 4/4] fix: restore original claim (fixer wrote JSON over it)

---
 ...converge-on-explainability-requirements.md | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
index 9fdbf2847..1ab02dfd5 100644
--- a/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
+++ b/domains/ai-alignment/international-humanitarian-law-and-ai-alignment-converge-on-explainability-requirements.md
@@ -1,8 +1,17 @@
-{
-  "action": "flag_duplicate",
-  "candidates": [
-    "legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md",
-    "autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md"
-  ],
-  "reasoning": "The reviewer identified 'legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md' as a semantic duplicate, stating that both claims assert the same thesis: IHL analysis and AI alignment research independently converged on the same fundamental limitation of autonomous systems. The new claim narrows to 'predictability/explainability' while the existing one frames it as 'value judgment impossibility,' but the structural argument is identical. The reviewer also mentioned 'autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md' as a related claim that already has a 'supports' edge pointing to the convergence claim, further indicating duplication of territory."
-}
+---
+type: claim
+domain: ai-alignment
+description: ICRC's formal legal position mirrors AI interpretability researchers' concerns through independent intellectual pathways
+confidence: experimental
+source: ICRC March 2026 position paper on autonomous weapons systems and IHL
+created: 2026-04-07
+title: International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained
+agent: theseus
+scope: structural
+sourcer: ICRC
+related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]"]
+---
+
+# International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained
+
+The International Committee of the Red Cross's March 2026 formal position on autonomous weapons systems states that many such systems 'may operate in a manner that cannot be adequately predicted, understood, or explained,' making it 'difficult for humans to make the contextualized assessments that are required by IHL.' This language directly parallels AI alignment researchers' concerns about interpretability limitations, yet the ICRC arrived at it from a completely different starting point. The ICRC's analysis derives from international humanitarian law doctrine requiring weapons systems to enable distinction between combatants and civilians, proportionality assessments, and precautionary measures, all of which require human value judgments. AI alignment researchers reached similar conclusions through technical analysis of model behavior and interpretability constraints. The convergence is significant because it represents two independent intellectual traditions (international law and computer science) identifying the same fundamental limitation through different methodologies. The ICRC is not citing AI safety research; it is performing independent legal analysis that reaches substantially the same conclusions about system predictability and explainability requirements.
-- 
2.45.2
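
Postscript for pipeline maintainers: PATCH 2/4 through PATCH 4/4 document one failure mode, where the LLM fixer returned a fenced json code block containing an action payload and the pipeline wrote that output verbatim over the claim file. Below is a minimal sketch of two guards that would catch both problems; the helper names (`strip_code_fences`, `is_action_payload`, `apply_fixer_output`) and the `"action"`-key convention are assumptions for illustration, not the pipeline's actual API.

```python
import json
import re

# Sketch of the guards implied by PATCH 3/4 ("strip code fences from LLM
# fixer output") and PATCH 4/4 ("fixer wrote JSON over it"). Names and
# structure are assumptions, not the real pipeline code.

# Matches one wrapping Markdown fence: an opening line like ```json (any
# language tag) and a closing ``` at the end of the text.
FENCE_RE = re.compile(r"^\s*```[A-Za-z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Return fixer output with a wrapping code fence removed, if present."""
    m = FENCE_RE.match(text)
    return m.group(1) if m else text


def is_action_payload(text: str) -> bool:
    """True when fixer output is a structured action such as flag_duplicate.

    Action payloads must be dispatched to a handler, never written over the
    claim markdown they were asked to fix.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and "action" in obj


def apply_fixer_output(output: str, claim_path: str) -> bool:
    """Write cleaned fixer output to claim_path; return False for actions."""
    cleaned = strip_code_fences(output)
    if is_action_payload(cleaned):
        return False  # caller routes the payload (e.g. to duplicate review)
    with open(claim_path, "w", encoding="utf-8") as f:
        f.write(cleaned)
    return True
```

Under these assumptions, the PATCH 2/4 output parses as JSON with an "action" key, so it would have been routed to a duplicate-review handler instead of replacing the claim file, and PATCH 3/4 and PATCH 4/4 would not have been needed.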