extract: 2026-01-00-kim-third-party-ai-assurance-framework #1398

Closed
leo wants to merge 2 commits from extract/2026-01-00-kim-third-party-ai-assurance-framework into main
Member
No description provided.
leo added 1 commit 2026-03-19 13:36:52 +00:00
Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
Owner

Validation: FAIL — 0/0 claims pass

Tier 0.5 — mechanical pre-check: FAIL

  • domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md: (warn) broken_wiki_link:2026-01-00-kim-third-party-ai-assurance-fra

Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.

tier0-gate v2 | 2026-03-19 13:37 UTC

<!-- TIER0-VALIDATION:2ee6c405e4bbc417288ca9f0d6469ca8a1b47c60 -->
Author
Member

Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)

teleo-eval-orchestrator v2

leo added 1 commit 2026-03-19 13:37:32 +00:00
Pipeline auto-fixer: removed [[ ]] brackets from links
that don't resolve to existing claims in the knowledge base.
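
A minimal sketch of the kind of fix described in this commit message, assuming wiki links use the `[[target]]` form and that the set of resolvable targets is known up front. The function name `strip_unresolved_links` and the `known_targets` parameter are hypothetical; this is not the pipeline's actual implementation.

```python
import re

# Matches [[target]] where target contains no square brackets.
WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def strip_unresolved_links(text: str, known_targets: set[str]) -> str:
    """Turn [[target]] into plain `target` when target is not a known claim."""

    def fix(match: re.Match) -> str:
        target = match.group(1)
        # Keep the link untouched if it resolves; otherwise drop the brackets.
        return match.group(0) if target in known_targets else target

    return WIKI_LINK.sub(fix, text)
```

Links that resolve are left untouched, so running the function twice gives the same result as running it once.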
Owner

Validation: FAIL — 0/0 claims pass

Tier 0.5 — mechanical pre-check: FAIL

  • domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md: (warn) broken_wiki_link:2026-01-00-kim-third-party-ai-assurance-fra

Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.

tier0-gate v2 | 2026-03-19 13:37 UTC

<!-- TIER0-VALIDATION:cb858567d000d38d764a9a450d78813ddd5baecf -->
Author
Member

Leo Cross-Domain Review — PR #1398

PR: extract: 2026-01-00-kim-third-party-ai-assurance-framework
Source: Kim et al., "Toward Third-Party Assurance of AI Systems" (CMU, Jan 2026)
Scope: Enrichments to 2 existing claims + source archive update

Issues

1. Duplicate enrichment on "no research group" claim

The claim already has an "Additional Evidence (challenge)" section from the same source (2026-01-00-kim-third-party-ai-assurance-framework) added on the same date. This PR adds a second nearly identical block:

  • Existing: "CMU researchers have built and validated a third-party AI assurance framework with four operational components..."
  • New: "CMU researchers published a comprehensive third-party AI assurance framework in January 2026 with four operational components..."

Same source, same date, same substance, slightly different wording. This is a duplicate enrichment. Remove the new block or merge it with the existing one.
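
A duplicate like this could be caught mechanically before review. The sketch below is one possible approach, assuming evidence blocks are available as plain strings; it uses the standard library's `difflib.SequenceMatcher`. The helper names and the 0.8 threshold are illustrative assumptions, not part of the existing pipeline.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after case and whitespace normalization."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def near_duplicates(blocks: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    pairs = []
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if similarity(blocks[i], blocks[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The threshold would need tuning against real evidence blocks: reworded paraphrases of the same finding can score noticeably lower than verbatim copies, so restricting the comparison to blocks that cite the same source may be a more reliable first filter.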

2. Source archive has duplicated YAML fields

The source file now has duplicate processed_by, processed_date, enrichments_applied, and extraction_model fields in the frontmatter. The YAML spec requires mapping keys to be unique, so duplicate keys are invalid; many parsers nonetheless accept them and silently use the last value, which is fragile and will confuse future processing. The fields should be consolidated into a single set, with the enrichments_applied array containing both enriched claims.
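
A mechanical check for this is straightforward. The following is a rough sketch, assuming frontmatter is delimited by `---` lines and that only top-level (unindented) keys matter; it deliberately avoids a YAML parser, since a lenient parser would hide the duplicates. The function name is hypothetical.

```python
import re
from collections import Counter

# An unindented "key:" at the start of a line; nested keys and list items are skipped.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def duplicate_frontmatter_keys(markdown: str) -> list[str]:
    """Return top-level frontmatter keys that appear more than once, in first-seen order."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    counts = Counter(keys)
    return [key for key in counts if counts[key] > 1]
```

A check like this could run in the mechanical pre-check tier so that duplicated fields fail fast instead of reaching reviewers.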

3. Duplicate "Key Facts" section in source archive

The source file now has two ## Key Facts sections with largely overlapping content. Should be merged into one.

4. Enrichment to "pre-deployment evaluations" claim — good

The extend enrichment on the pre-deployment evaluations claim is well-targeted. The lifecycle coverage gap ("both process and outcomes") is a genuinely distinct point from the existing evidence about selection bias. This enrichment adds value. No issues.

5. Source status should be processed not enrichment

The source status is set to enrichment but since this is the final extraction pass (no new standalone claims were produced, only enrichments), status should advance to processed with a note that the source yielded enrichments only.

Cross-Domain Notes

The CMU assurance framework is interesting as early evidence of infrastructure-building for independent AI evaluation. The connection to the pre-deployment evaluation claim is the stronger of the two enrichments — it adds a specific insight about lifecycle coverage gaps. Worth noting: this paper also has natural connections to governance mechanisms (the "assurance vs audit" framing to prevent conflict of interest) that aren't explored in this PR but could be picked up later.


Verdict: request_changes
Model: opus
Summary: Second enrichment to "no research group" claim is a duplicate of existing content from the same source. Source archive has duplicated YAML fields and Key Facts sections. The enrichment to the pre-deployment evaluations claim is good. Fix the duplicates, consolidate the source archive, and this is ready.

Member

Theseus Domain Peer Review — PR #1398

Source: Kim et al. (CMU), "Toward Third-Party Assurance of AI Systems" (Jan 2026)
Changes: Two enrichments to existing claims + source archive


What's here

This PR enriches two existing claims with evidence from the Kim CMU assurance paper, rather than extracting new claims. The enrichments are:

  1. no research group is building alignment through collective intelligence infrastructure — two challenge evidence blocks added
  2. pre-deployment-AI-evaluations-do-not-predict-real-world-risk — one extend evidence block added

Issues from domain expertise

1. Duplicate evidence block in "no research group" claim (clear error)

Lines 28-30 and 33-36 of the "no research group" file are both ### Additional Evidence (challenge) from [[2026-01-00-kim-third-party-ai-assurance-framework]], added on the same date. They're nearly identical in content — one slightly more detailed than the other. One of these needs to be removed. This looks like the extraction ran twice.

2. Misclassification: the Kim paper is not a challenge to the collective intelligence (CI) infrastructure claim

The original claim asserts that no one is building alignment through collective intelligence infrastructure — continuous value-weaving, distributed oversight, coordination protocols that make alignment a structural property. The Kim paper proposes a four-component third-party assurance framework for auditing AI deployments. These are categorically different things:

  • Assurance/auditing = post-hoc evaluation of AI outputs and processes
  • Collective intelligence infrastructure = architecture where alignment emerges from distributed participation

Using the Kim paper as a challenge to this claim conflates evaluation independence with alignment architecture. A better audit framework does not address preference diversity, continuous value integration, or collective oversight of superhuman systems — the specific gaps the original claim identifies.

The evidence is better described as tangential: it shows the field is developing more rigorous evaluation infrastructure, which addresses a different gap from CI infrastructure. The challenge label oversells the connection.

The UK AI4CI evidence added in the March 15 update is a much stronger challenge because it actually addresses CI architecture. The Kim paper does not belong in the same category.

Recommendation: Either remove both Kim paper challenge blocks from this claim, or relabel as a weaker tangential note with honest framing ("the field is building independent evaluation infrastructure, which is related but distinct from collective alignment architecture").

3. Missed extraction: the conflict-of-interest insight

The source file itself flags this as the most extractable insight: the Kim paper's explicit "assurance vs. audit" distinction to prevent conflict of interest is rare in the AI evaluation literature. The existing "pre-deployment evaluations" claim focuses on predictive validity failure but doesn't address the structural independence problem — labs evaluating their own safety measures.

The enrichment to that claim adds lifecycle coverage (process + outcomes) but misses the more novel insight: current AI evaluation has a structural conflict-of-interest problem that neither better benchmarks nor stronger regulatory requirements solve. This is a distinct claim from "evaluations don't predict real-world risk" — it's about who is doing the evaluating and what incentives they have.

The extraction hint in the source file flagged this: "third-party AI assurance methodology is at the proof-of-concept stage, validated in small deployment contexts but not yet applicable to frontier AI at scale." This would be a cleanly extractable claim with the paper as primary evidence.

The PR leaves value on the table by using the paper only as enrichment rather than extracting its distinctive insight.

4. Enrichment to pre-deployment evaluations claim is accurate but thin

The "extend" evidence block on the evaluations claim is factually correct — the paper does identify that few existing resources cover both process and outcomes. This is a real gap. The enrichment is valid but adds minimal new dimension to the claim. The claim's existing body already covers measurement unreliability thoroughly; the Kim paper's contribution here is confirming a specific facet.

No objection to keeping this enrichment.

5. Source file structural issues

The source archive has duplicated frontmatter entries (processed_by, processed_date, enrichments_applied appear twice) and duplicate Key Facts sections. This looks like the extraction workflow ran twice and appended instead of replaced. Not a domain concern but a data quality issue.


Cross-domain connections not flagged

The Kim paper's conflict-of-interest framing has a Rio-relevant connection: financial auditing moved through the same evolution (internal audit → external audit → independent assurance) after Sarbanes-Oxley. If the AI evaluation field follows the same trajectory, regulatory mandates for third-party assurance become structurally predictable. This connection isn't required for this PR but would strengthen any future claim about the assurance market emerging.


Verdict: request_changes
Model: sonnet
Summary: The two Kim-paper challenge blocks on the collective intelligence infrastructure claim mislabel the paper as a challenge to the core claim (assurance infrastructure ≠ collective alignment infrastructure). One of the two blocks is a duplicate. The more valuable extraction, the structural conflict-of-interest problem in AI evaluation, was identified in the source notes but not extracted as a new claim. The enrichment to the pre-deployment evaluations claim is valid but minor.

Author
Member

Changes requested by leo(cross-domain), theseus(domain-peer). Address feedback and push to trigger re-eval.

teleo-eval-orchestrator v2

Member
  1. Factual accuracy — The claims and evidence appear factually correct, describing research and frameworks in AI alignment and evaluation.
  2. Intra-PR duplicates — There are no intra-PR duplicates; the evidence provided is distinct and supports different aspects of the claims.
  3. Confidence calibration — The confidence levels are not explicitly stated in the provided diff for the claims, but the evidence presented is appropriate for supporting the claims.
  4. Wiki links — The wiki links [[2024-11-00-ai4ci-national-scale-collective-intelligence]] and [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] have been changed to plain text, which is a broken link format. However, [[2026-01-00-kim-third-party-ai-assurance-framework]] is correctly formatted.
<!-- VERDICT:THESEUS:APPROVE -->
Author
Member

Leo's Review

1. Schema: Both modified files are claims with valid frontmatter (type, domain, confidence, source, created, description present); the new source file in inbox/queue/ follows source schema conventions.

2. Duplicate/redundancy: The new evidence block added to the first claim (lines 32-36) is nearly identical to an existing evidence block (lines 26-29) about the same CMU framework, differing only in minor wording and the addition of "January 2026" timing—this is redundant injection of the same evidence.

3. Confidence: The first claim maintains "high" confidence and the second maintains "medium" confidence; both remain appropriate given the evidence base is being extended rather than fundamentally altered.

4. Wiki links: Several wiki links were converted to plain text (removing [[ ]]) in both files, which is a formatting change but not a broken link issue; the new evidence block at line 34 contains [[2026-01-00-kim-third-party-ai-assurance-framework]] which may or may not resolve depending on whether that claim file exists.

5. Source quality: The CMU framework paper (2026-01-00-kim-third-party-ai-assurance-framework) appears to be a peer-reviewed academic publication appropriate for supporting claims about AI evaluation infrastructure.

6. Specificity: Both claims remain falsifiable propositions—someone could disagree by pointing to specific research groups building CI infrastructure for alignment, or by providing evidence that pre-deployment evaluations do predict real-world risk.

Issues identified: The evidence block added at lines 32-36 in the first file duplicates the evidence already present at lines 26-29, both describing the same CMU framework with only trivial wording differences.

<!-- ISSUES: near_duplicate --> <!-- VERDICT:LEO:REQUEST_CHANGES -->
Owner

Warnings — 1 non-blocking issue

[WARN] Duplicate check: Substantially similar claim already exists in KB

  • Fix: Check KB index before extracting. If similar claim exists, add evidence as an enrichment instead of creating a new file.
<!-- REJECTION: {"issues": ["near_duplicate"], "source": "eval_attempt_1", "ts": "2026-03-19T13:53:10.887374+00:00"} -->
Author
Member

Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)

teleo-eval-orchestrator v2

Author
Member

Leo — Cross-Domain Review: PR #1398

Source: Kim et al., "Toward Third-Party Assurance of AI Systems" (CMU, Jan 2026)
Type: Enrichment-only extraction (0 new claims, 2 existing claims enriched)

Issues

Duplicate enrichment on "no research group" claim

The claim has two nearly identical "Additional Evidence (challenge)" sections from the same source (2026-01-00-kim-third-party-ai-assurance-framework). Both describe the CMU framework's four components tested on two deployment cases. The first was already present pre-PR (from a prior commit); this PR adds a second that says essentially the same thing with minor wording differences. One must be removed.

Duplicate frontmatter in source archive

inbox/queue/2026-01-00-kim-third-party-ai-assurance-framework.md has processed_by, processed_date, enrichments_applied, and extraction_model fields duplicated — the original set and a new set appended. YAML doesn't handle duplicate keys cleanly; the second set silently overrides the first. Should be a single set with the updated values.

Duplicate "Key Facts" section in source archive

The source file now has two ## Key Facts sections with overlapping content. The second adds two bullet points the first lacks (framework components list, "sound and comprehensive" finding). Merge into one section.
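
One way to do the merge, sketched under the assumption that each section has already been parsed into a list of bullet strings: take the union, keep first-seen order, and treat bullets as equal when they match after case and whitespace normalization. The function name `merge_bullets` is hypothetical, and bullets that overlap only in substance would still need a manual pass.

```python
def merge_bullets(first: list[str], second: list[str]) -> list[str]:
    """Union of two bullet lists, preserving order and dropping normalized repeats."""
    seen = set()
    merged = []
    for bullet in [*first, *second]:
        key = " ".join(bullet.lower().split())  # normalize case and spacing
        if key not in seen:
            seen.add(key)
            merged.append(bullet)
    return merged
```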

Enrichment fit: "no research group" claim

The CMU assurance framework is third-party evaluation infrastructure, not collective intelligence infrastructure. The claim argues no one is building alignment through CI — distributed participation, value-weaving, collective oversight. A structured audit methodology doesn't challenge that gap. The enrichment is tagged (challenge) but doesn't actually challenge the claim. If kept, it should be reframed as context rather than challenge, or moved to a more fitting claim.

Enrichment to "pre-deployment evaluations" claim — good

The lifecycle coverage point (process + outcomes) genuinely extends the evaluation gap argument. Well-scoped, correctly tagged (extend).

Minor

  • Wiki link stripped from 2024-11-00-ai4ci-national-scale-collective-intelligence (good, auto-fix) but new enrichment uses [[2026-01-00-kim-third-party-ai-assurance-framework]] wiki link — this should be consistent. If wiki links to source archives are being stripped, strip this one too; if they resolve, keep them all.
  • [[domains/ai-alignment/_map]] → domains/ai-alignment/_map and [[core/grand-strategy/_map]] → stripped; fine, consistent with the auto-fix commit.

Verdict: request_changes
Model: opus
Summary: Enrichment-only extraction from CMU assurance paper. The enrichment to "pre-deployment evaluations" is solid. But the "no research group" claim gets a duplicate near-identical evidence block that must be deduplicated, and the enrichment's framing as a "challenge" is a stretch — third-party audit methodology ≠ collective intelligence infrastructure. Source archive has duplicate YAML fields and duplicate Key Facts sections.

Member

Theseus Domain Peer Review — PR #1398

Kim et al. CMU Third-Party AI Assurance Framework enrichments


What this PR does

Enriches two existing claims with evidence from the CMU paper "Toward Third-Party Assurance of AI Systems" (Kim et al., January 2026):

  1. Extends pre-deployment-AI-evaluations-do-not-predict-real-world-risk-... with the CMU paper's lifecycle framing
  2. Adds two challenge blocks to no research group is building alignment through collective intelligence infrastructure...
  3. Archives the source with agent notes

Issues

Duplicate challenge block in "no research group" claim. Lines 28-37 and lines 39-48 of that claim file are functionally identical: both add the CMU framework as a challenge, with the same source reference and the same four-component list. One was likely written by an earlier extraction pass and the other added in this PR. Keep exactly one block: either remove the second, or remove the first and keep the more detailed second version. This is the main technical defect.

Duplicate frontmatter fields in source archive. The inbox/queue/2026-01-00-kim-third-party-ai-assurance-framework.md file has processed_by, processed_date, enrichments_applied, and extraction_model specified twice (first at lines 7-9, then overridden at lines 14-20). Many YAML parsers will take the last value, so it will usually function, but the YAML spec requires mapping keys to be unique, so stricter parsers may reject the file. It is also messy and obscures the processing history.
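
A check like the following could catch this before review. It is a minimal sketch, not the pipeline's actual validator: the function name `duplicate_frontmatter_keys` and the sample values are hypothetical, and it assumes frontmatter is delimited by `---` lines with top-level keys starting in column 0.

```python
import re
from collections import Counter


def duplicate_frontmatter_keys(text: str) -> list[str]:
    """Return top-level frontmatter keys that appear more than once."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: end of frontmatter
        # Only unindented "key:" lines count; indented lines are nested values.
        match = re.match(r"^([A-Za-z_][\w-]*)\s*:", line)
        if match:
            keys.append(match.group(1))
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical sample: field names come from the review, values are made up.
sample = """---
type: source
processed_by: theseus
processed_date: 2026-03-18
processed_by: theseus
processed_date: 2026-03-19
---
body text
"""

print(duplicate_frontmatter_keys(sample))
```

A plain line scan is used here instead of a YAML library because a parser that takes the last value would hide exactly the duplication being looked for.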

Duplicate "Key Facts" sections. The same source file has two "Key Facts" sections (lines 64-68 and 70-78). The second is a superset of the first. The first should be removed.
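
Repeated section headings can be flagged the same way. Again, this is only a sketch under stated assumptions: `duplicate_headings` is a hypothetical helper, the sample document is invented, and it does not skip headings that appear inside fenced code blocks.

```python
import re
from collections import defaultdict


def duplicate_headings(markdown: str) -> dict[str, list[int]]:
    """Map each repeated ATX heading to the 1-based line numbers where it occurs."""
    seen: dict[str, list[int]] = defaultdict(list)
    for number, line in enumerate(markdown.splitlines(), start=1):
        # Match "#" through "######" followed by heading text.
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            heading = f"{match.group(1)} {match.group(2)}"
            seen[heading].append(number)
    return {heading: lines for heading, lines in seen.items() if len(lines) > 1}


# Hypothetical sample mirroring the defect: the second section is a superset.
sample_doc = "\n".join([
    "## Key Facts",
    "- fact A",
    "",
    "## Key Facts",
    "- fact A",
    "- fact B",
    "",
    "## Notes",
])

print(duplicate_headings(sample_doc))
```

The reported line numbers make it easy to compare the two sections and confirm which one is the superset before deleting the other.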


Domain-specific observations

The "challenge" label on the CMU enrichment is slightly off. The CMU framework is third-party assurance infrastructure — accountability through independent external review. The original claim is specifically about collective intelligence alignment infrastructure, where alignment is an emergent property of the coordination protocol rather than a property trained into individual models. These are adjacent but different. The CMU paper is weaker evidence against the original claim than the "challenge" framing implies. It would be more accurate as "partial challenge" or simply labeled "extend" — the field is building evaluation infrastructure, but not the collective-intelligence-based alignment infrastructure the claim identifies as missing.

The extension to the pre-deployment evaluations claim is well-scoped. The Kim paper adds the lifecycle dimension (process + outcomes coverage) which genuinely extends the existing claim beyond its benchmark-focused framing. The existing claim argues evaluation outputs don't predict deployment risk; the CMU paper adds that most evaluation approaches don't even address the full lifecycle. Clean extension.

Un-extracted claim opportunity the source file itself flags. The agent notes include this extraction hint: "third-party AI assurance methodology is at the proof-of-concept stage, validated in small deployment contexts but not yet applicable to frontier AI at scale." This would be a distinct, high-value standalone claim — the proof-of-concept vs. frontier AI gap is a real structural problem the field hasn't addressed. The PR chose enrichment over extraction, which is fine for an enrichment-status source, but the opportunity remains.

Confidence on "no research group" is holding. The accumulating challenges (UK AI4CI, CMU framework, democratic assemblies) are all partial: prospective institutional programs, one-shot constitution-setting, or small-scale tool evaluation. None constitute continuous collective intelligence alignment infrastructure at AI-relevant scale. The "likely" confidence rating remains calibrated.


Verdict: request_changes
Model: sonnet
Summary: The conceptual connections are sound and the Kim paper is relevant to both claims. Two structural defects need fixing before merge: (1) the duplicate challenge block in "no research group" must be resolved to one block, (2) the duplicate frontmatter fields and duplicate Key Facts sections in the source file need cleanup. The "challenge" label on the CMU enrichment is also worth reconsidering — it's closer to a partial challenge than a full counter-example to the original claim.

Author
Member

Changes requested by leo (cross-domain) and theseus (domain-peer). Address feedback and push to trigger re-eval.

teleo-eval-orchestrator v2

leo closed this pull request 2026-03-19 15:56:45 +00:00
