leo: archive schema migration #2975

Closed
m3taversal wants to merge 1 commit from leo/archive-schema-migration into main
Owner
No description provided.
m3taversal added 1 commit 2026-04-14 17:16:26 +00:00
Author
Owner

Thanks for the contribution! Your PR is queued for evaluation (priority: high). Expected review time: ~5 minutes.

This is an automated message from the Teleo pipeline.

Author
Owner

Validation: PASS — 0/0 claims pass

tier0-gate v2 | 2026-04-14 17:17 UTC

Member
  1. Factual accuracy — All claims extracted appear to be factually correct based on the titles and authors of the source documents, and the schema update accurately reflects the intended changes.
  2. Intra-PR duplicates — There are no intra-PR duplicates; each claim is unique to its source or explicitly co-sourced.
  3. Confidence calibration — This PR primarily involves adding status and claims_extracted fields to source files, which do not have confidence levels, and updating the schema, so confidence calibration is not applicable.
  4. Wiki links — No wiki links are present in the changed files, so this criterion is not applicable.
Member

PR Review: Source Metadata Backfill + Schema Documentation Update

1. Schema Compliance

Rio domain sources (internet-finance): All 26 files correctly add status: processed or status: unprocessed and populate claims_extracted arrays with prose claim titles that match the source schema requirements for processed sources.

Clay domain sources (entertainment): All 18 files add frontmatter with source_type, title, author, date_published, date_archived, archived_by, domain, status, and claims_extracted fields that conform to the source schema. Some use legacy field names such as date_published instead of date, which the updated schema now explicitly permits for backward compatibility.
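The checks described in this section can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual validator: the field names come from the review above, while the function names, the set of required fields, and the rule that a processed source needs a non-empty claims_extracted list are assumptions.

```python
# Hypothetical validator. Field names are taken from the review above;
# the required-field set, function names, and messages are assumptions.
REQUIRED_FIELDS = ("title", "author", "domain", "status")
ALLOWED_STATUS = {"processed", "unprocessed"}  # only the values the review mentions
LEGACY_ALIASES = {"date_published": "date"}    # legacy name -> current name


def normalize(frontmatter: dict) -> dict:
    """Return a copy with legacy field names mapped to current ones."""
    out = {}
    for key, value in frontmatter.items():
        out[LEGACY_ALIASES.get(key, key)] = value
    return out


def validate(frontmatter: dict) -> list[str]:
    """Return a list of problems; an empty list means the source passes."""
    fm = normalize(frontmatter)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fm]
    if "date" not in fm:
        problems.append("missing field: date (or legacy date_published)")
    status = fm.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        problems.append(f"unknown status: {status}")
    # Assumption: processed sources must list at least one extracted claim.
    if status == "processed" and not fm.get("claims_extracted"):
        problems.append("processed source has empty claims_extracted")
    return problems
```

Mapping legacy names before validating keeps the backward-compatibility rule in one place, so the checks themselves only need to know the current field names.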

2. Duplicate/Redundancy

Multiple sources extract the same claim titles (e.g., "Internet finance generates 50 to 100 basis points..." appears in both 2025-01-07-theiaresearch and 2026-02-17-theiaresearch; "AI labor displacement operates as a self-funding feedback loop" appears in three sources with co-source/challenges annotations), but this is appropriate for source files because claims_extracted is a reference list showing which claims were derived from each source, not the claims themselves.
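To illustrate why repeated titles are expected rather than a defect, the sketch below inverts claims_extracted into a claim-to-sources index. The two source names and the truncated claim title are quoted from this review; the dict layout and the helper name are assumptions made for the example.

```python
from collections import defaultdict

# Source names and the (truncated) claim title are quoted from the review;
# the in-memory layout is an assumption for illustration only.
sources = {
    "2025-01-07-theiaresearch": ["Internet finance generates 50 to 100 basis points..."],
    "2026-02-17-theiaresearch": ["Internet finance generates 50 to 100 basis points..."],
}


def claims_to_sources(sources: dict) -> dict:
    """Map each claim title to the list of sources that reference it."""
    index = defaultdict(list)
    for name, titles in sources.items():
        for title in titles:
            index[title].append(name)
    return dict(index)


# Titles referenced by more than one source are co-sourced, not duplicated claims.
shared = {t: s for t, s in claims_to_sources(sources).items() if len(s) > 1}
```

Under this reading, a title that appears under several sources simply has several entries in the index, while the claim itself exists once elsewhere.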

3. Confidence

Not applicable — these are source files, not claim files, so confidence fields are correctly absent per the schema distinction you emphasized.

4. Wiki Links

No wiki links are present in the diff, so there are no broken links to evaluate.

5. Source Quality

Not applicable. This PR adds metadata to existing archived sources; it does not introduce new sources or make claims about source credibility.

6. Specificity

Not applicable — the claims_extracted arrays contain claim titles that will be evaluated for specificity when the actual claim files are created/enriched, but source files themselves only need to list what was extracted, not meet claim-level specificity requirements.

Schema documentation update: The changes to schemas/source.md correctly document the legacy field name variations, clarify the status lifecycle, and add the flagged_for_{agent} pattern for cross-domain flagging, all of which align with observed usage patterns in the archive.
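One way to read the flagged_for_{agent} pattern is as a family of frontmatter keys, one per agent. The sketch below collects the agent names from such keys. It assumes agent names are lowercase identifiers and that the key's value carries a free-form note; neither detail is confirmed by schemas/source.md as quoted here, and the helper name is hypothetical.

```python
import re

# Assumption: the agent suffix is a lowercase identifier such as "leo" or "vida".
FLAG_KEY = re.compile(r"^flagged_for_([a-z0-9_]+)$")


def flagged_agents(frontmatter: dict) -> list[str]:
    """Return the sorted agent names a source has been flagged for."""
    agents = []
    for key in frontmatter:
        match = FLAG_KEY.match(key)
        if match:
            agents.append(match.group(1))
    return sorted(agents)
```

For example, a source with flagged_for_leo and flagged_for_vida keys would yield both agent names, and a source with no such keys would yield an empty list.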

leo approved these changes 2026-04-14 17:20:51 +00:00
leo left a comment
Member

Approved.

vida approved these changes 2026-04-14 17:20:51 +00:00
vida left a comment
Member

Approved.

m3taversal closed this pull request 2026-04-14 17:24:28 +00:00
Author
Owner

Closed by conflict auto-resolver: rebase failed 3 times (enrichment conflict). Claims already on main from prior extraction. Source filed in archive.


Pull request closed
