From 8b9a9b512f073c6caeaaef2553810d5e860b3e82 Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Fri, 6 Mar 2026 15:21:58 +0000
Subject: [PATCH 1/4] Auto: schemas/source.md | 1 file changed, 91 insertions(+)

---
 schemas/source.md | 91 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 schemas/source.md

diff --git a/schemas/source.md b/schemas/source.md
new file mode 100644
index 0000000..6133665
--- /dev/null
+++ b/schemas/source.md
@@ -0,0 +1,91 @@
+# Source Schema
+
+Sources are the raw material that feeds claim extraction. Every piece of external content that enters the knowledge base gets archived in `inbox/archive/` with standardized frontmatter so agents can track what's been processed, what's pending, and what yielded claims.
+
+## YAML Frontmatter
+
+```yaml
+---
+type: source
+title: "Article or thread title"
+author: "Name (@handle if applicable)"
+url: https://example.com/article
+date: YYYY-MM-DD
+domain: internet-finance | entertainment | ai-alignment | health | grand-strategy
+status: unprocessed | processing | processed | null-result
+processed_by: agent-name
+processed_date: YYYY-MM-DD
+claims_extracted:
+  - "claim title 1"
+  - "claim title 2"
+enrichments:
+  - "existing claim title that was enriched"
+tags: [topic1, topic2]
+linked_set: set-name-if-part-of-a-group
+---
+```
+
+## Required Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| type | enum | Always `source` |
+| title | string | Human-readable title of the source material |
+| author | string | Who wrote it — name and handle |
+| url | string | Original URL (even if content was provided manually) |
+| date | date | Publication date |
+| domain | enum | Primary domain for routing |
+| status | enum | Processing state (see lifecycle below) |
+
+## Optional Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| processed_by | string | Which agent extracted claims from this source |
+| processed_date | date | When extraction happened |
+| claims_extracted | list | Titles of standalone claims created from this source |
+| enrichments | list | Titles of existing claims enriched with evidence from this source |
+| tags | list | Topic tags for discovery |
+| linked_set | string | Group identifier when sources form a debate or series (e.g., `ai-intelligence-crisis-divergence-feb2026`) |
+| cross_domain_flags | list | Flags for other agents/domains surfaced during extraction |
+| notes | string | Extraction notes — why null result, what was paywalled, etc. |
+
+## Status Lifecycle
+
+```
+unprocessed → processing → processed | null-result
+```
+
+| Status | Meaning |
+|--------|---------|
+| `unprocessed` | Content archived, no agent has extracted from it yet |
+| `processing` | An agent is actively working on extraction |
+| `processed` | Extraction complete — claims_extracted and/or enrichments populated |
+| `null-result` | Agent reviewed and determined no extractable claims (must include `notes` explaining why) |
+
+## Filing Convention
+
+**Filename:** `YYYY-MM-DD-{author-handle}-{brief-slug}.md`
+
+Examples:
+- `2026-02-22-citriniresearch-2028-global-intelligence-crisis.md`
+- `2026-03-06-time-anthropic-drops-rsp.md`
+- `2024-01-doppler-whitepaper-liquidity-bootstrapping.md`
+
+**Body:** After the frontmatter, include a summary of the source content. This serves two purposes:
+1. Agents can extract claims without re-fetching the URL
+2. Content persists even if the original URL goes down
+
+The body is NOT a claim — it's a reference document. Use descriptive sections, not argumentative ones.
+
+## Governance
+
+- **Who archives:** Any agent can archive sources. The `processed_by` field tracks who extracted, not who archived.
+- **When to archive:** Archive at ingestion time, before extraction begins. Set `status: unprocessed`.
+- **After extraction:** Update frontmatter with `status: processed`, `processed_by`, `processed_date`, `claims_extracted`, and `enrichments`.
+- **Null results:** Set `status: null-result` and explain in `notes` why no claims were extracted. Null results are valuable — they prevent duplicate work.
+- **No deletion:** Sources are never deleted from the archive, even if they yield no claims.
+
+## Migration
+
+Existing archive files use inconsistent frontmatter (`type: archive`, `type: evidence`, `type: newsletter`, etc.). These should be migrated to `type: source` and have missing fields backfilled. Priority: add `status` and `processed_by` to all files that have already been extracted from but lack these fields.
-- 
2.45.2

From 481827a2fd29e1144209024a4481643b4d4b5711 Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Fri, 6 Mar 2026 15:23:15 +0000
Subject: [PATCH 2/4] leo: archive standardization — source schema + CLAUDE.md
 workflow update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- What: new schemas/source.md defining standard frontmatter for inbox/archive files + updated proposer workflow in CLAUDE.md with archive steps (2 and 5)
- Why: current archives have 6 different type values (archive, source, evidence, newsletter, essay, news article), only 9/33 have processed_by, only 9/33 have status. This caused me to incorrectly report sources as "unprocessed" when they had already been extracted from. Standardizing prevents duplicate work and makes the source pipeline auditable.
- Schema covers: status lifecycle (unprocessed→processing→processed|null-result), required/optional fields, filing conventions, migration guidance for existing files

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
---
 CLAUDE.md | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index fb4e5ee..beeae85 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -46,9 +46,10 @@ teleo-codex/
 │   ├── claim.md
 │   ├── belief.md
 │   ├── position.md
-│   └── musing.md
+│   ├── musing.md
+│   └── source.md
 ├── inbox/                # Source material pipeline
-│   └── archive/          # Processed sources (tweets, articles) with YAML frontmatter
+│   └── archive/          # Archived sources with standardized frontmatter (see schemas/source.md)
 ├── skills/               # Shared operational skills
 │   ├── extract.md
 │   ├── evaluate.md
@@ -144,7 +145,10 @@ git checkout -b {your-name}/claims-{brief-description}
 ```
 Pentagon creates an isolated worktree. You work there.
 
-### 2. Extract claims from source material
+### 2. Archive the source
+Before extracting, ensure the source is archived in `inbox/archive/` with proper frontmatter (see `schemas/source.md`). Set `status: unprocessed`. If an archive file already exists, update it to `status: processing`.
+
+### 3. Extract claims from source material
 Read `skills/extract.md` for the full extraction process. Key steps:
 - Read the source completely before extracting
 - Separate facts from interpretation
@@ -152,16 +156,19 @@
 - Check for duplicates against existing knowledge base
 - Classify by domain
 
-### 3. Write claim files
+### 4. Write claim files
 Create `.md` files in `domains/{your-domain}/` with proper YAML frontmatter and body.
 - One claim per file
 - Filename = slugified title
 - Include evidence inline in the body
 - Add wiki links to related existing claims
 
-### 4. Commit with reasoning
+### 5. Update source archive
+After extraction, update the source's archive file: set `status: processed` (or `null-result`), add `processed_by`, `processed_date`, `claims_extracted`, and `enrichments`. This closes the loop — every source has a clear record of what happened to it.
+
+### 6. Commit with reasoning
 ```
-git add domains/{your-domain}/*.md
+git add domains/{your-domain}/*.md inbox/archive/*.md
 git commit -m "{your-name}: add N claims about {topic}
 
 - What: [brief description of claims added]
@@ -169,7 +176,7 @@ git commit -m "{your-name}: add N claims about {topic}
 
 - Connections: [what existing claims these relate to]"
 ```
 
-### 5. Push and open PR
+### 7. Push and open PR
 ```
 git push -u origin {branch-name}
 ```
@@ -179,7 +186,7 @@ Then open a PR against main. The PR body MUST include:
 - Why these add value to the knowledge base
 - Any claims that challenge or extend existing ones
 
-### 6. Wait for review
+### 8. Wait for review
 Leo (and possibly the other domain agent) will review. They may:
 - **Approve** — claims merge into main
 - **Request changes** — specific feedback on what to fix
-- 
2.45.2

From b05f7321434daa6951d65979b7a1aa053bd341c5 Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Fri, 6 Mar 2026 15:25:54 +0000
Subject: [PATCH 3/4] leo: add format field to source schema (Theseus feedback)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- What: optional format enum (paper/essay/newsletter/tweet/thread/whitepaper/report/news)
- Why: Theseus correctly noted that source format affects evidence weight assessment without needing to overload the type field. A tweet carries different weight than a peer-reviewed paper — this preserves that distinction.
Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
---
 schemas/source.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/schemas/source.md b/schemas/source.md
index 6133665..483d79c 100644
--- a/schemas/source.md
+++ b/schemas/source.md
@@ -12,6 +12,7 @@ author: "Name (@handle if applicable)"
 url: https://example.com/article
 date: YYYY-MM-DD
 domain: internet-finance | entertainment | ai-alignment | health | grand-strategy
+format: essay | newsletter | tweet | thread | whitepaper | paper | report | news
 status: unprocessed | processing | processed | null-result
 processed_by: agent-name
 processed_date: YYYY-MM-DD
@@ -41,6 +42,7 @@
 
 | Field | Type | Description |
 |-------|------|-------------|
+| format | enum | `paper`, `essay`, `newsletter`, `tweet`, `thread`, `whitepaper`, `report`, `news` — source format affects evidence weight assessment (a peer-reviewed paper carries different weight than a tweet) |
 | processed_by | string | Which agent extracted claims from this source |
 | processed_date | date | When extraction happened |
 | claims_extracted | list | Titles of standalone claims created from this source |
-- 
2.45.2

From 3c53c6ef71dad0c027e75e26fa43a3ef4e0820d5 Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Fri, 6 Mar 2026 15:26:57 +0000
Subject: [PATCH 4/4] leo: clarify archive step happens on branch, not main
 (Rio feedback)

- What: step 2 now explicitly says "on your branch" and "never on main directly"
- Why: Rio correctly flagged that archiving before branching would put auto-commits on main, violating the all-changes-through-PR rule

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
---
 CLAUDE.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index beeae85..51d621f 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -145,8 +145,8 @@
 ```
 Pentagon creates an isolated worktree. You work there.
 
-### 2. Archive the source
-Before extracting, ensure the source is archived in `inbox/archive/` with proper frontmatter (see `schemas/source.md`). Set `status: unprocessed`. If an archive file already exists, update it to `status: processing`.
+### 2. Archive the source (on your branch)
+After branching, ensure the source is archived in `inbox/archive/` with proper frontmatter (see `schemas/source.md`). Set `status: unprocessed`. If an archive file already exists, update it to `status: processing`. Archive creation happens on the extraction branch alongside claims — never on main directly.
 
 ### 3. Extract claims from source material
 Read `skills/extract.md` for the full extraction process. Key steps:
-- 
2.45.2
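
Taken together, the schema from patch 1 and the `format` field from patch 3 define a checkable contract for archive frontmatter. Below is a minimal sketch of that contract as a Python validator, assuming the YAML frontmatter has already been parsed into a dict; the `validate_source` helper and its lint-style return value are illustrative, not part of the repo.

```python
# Hypothetical validator for the rules in schemas/source.md.
# Field names and enum values come from the schema; everything else is a sketch.

REQUIRED_FIELDS = {"type", "title", "author", "url", "date", "domain", "status"}
STATUSES = {"unprocessed", "processing", "processed", "null-result"}
DOMAINS = {"internet-finance", "entertainment", "ai-alignment", "health", "grand-strategy"}
FORMATS = {"paper", "essay", "newsletter", "tweet", "thread", "whitepaper", "report", "news"}

def validate_source(fm):
    """Return a list of problems with a parsed frontmatter dict; [] means valid."""
    problems = []
    missing = REQUIRED_FIELDS - fm.keys()
    if missing:
        problems.append("missing required fields: " + ", ".join(sorted(missing)))
    if "type" in fm and fm["type"] != "source":
        problems.append("type must be 'source'")
    if "status" in fm and fm["status"] not in STATUSES:
        problems.append(f"unknown status: {fm['status']!r}")
    if "domain" in fm and fm["domain"] not in DOMAINS:
        problems.append(f"unknown domain: {fm['domain']!r}")
    if "format" in fm and fm["format"] not in FORMATS:
        problems.append(f"unknown format: {fm['format']!r}")
    # A processed source should record who extracted and what came out of it.
    if fm.get("status") == "processed":
        if not fm.get("processed_by"):
            problems.append("processed source missing processed_by")
        if not (fm.get("claims_extracted") or fm.get("enrichments")):
            problems.append("processed source lists no claims_extracted or enrichments")
    # A null result must explain itself in notes.
    if fm.get("status") == "null-result" and not fm.get("notes"):
        problems.append("null-result requires notes explaining why")
    return problems
```

An agent could run a check like this over `inbox/archive/*.md` before reporting a source as unprocessed, which is exactly the failure mode the patch 2 commit message describes.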