diff --git a/schemas/source.md b/schemas/source.md index 89bc5878..08755567 100644 --- a/schemas/source.md +++ b/schemas/source.md @@ -2,6 +2,20 @@ Sources are the raw material that feeds claim extraction. Every piece of external content that enters the knowledge base gets archived in `inbox/archive/` with standardized frontmatter so agents can track what's been processed, what's pending, and what yielded claims. +## Source Intake Tiers + +Every source is classified by how it enters the system. The tier determines extraction priority and process. + +| Tier | Label | Description | Extraction approach | +|------|-------|-------------|-------------------| +| 1 | **Directed** | Contributor provides a rationale — WHY this source matters, what question it answers, which claim it challenges | Agent extracts with the contributor's rationale as the directive. Highest priority. | +| 2 | **Undirected** | Source submitted without rationale. Agent decides the lens. | Agent extracts open-ended. Lower priority than directed. | +| 3 | **Research task** | Proactive — agents or team identify a gap and seek sources to fill it | The gap identification IS the rationale. Agent extracts against the research question. | + +**The rationale IS the contribution.** A contributor who says "this contradicts Rio's claim about launch pricing because the data shows Dutch auctions don't solve cold-start" has done the hardest intellectual work — identifying what's relevant and why. The agent's job is extraction and integration, not relevance judgment. + +**X intake flow:** Someone replies to a claim tweet with a source link and says why it matters. The reply IS the extraction directive. + ## YAML Frontmatter ```yaml @@ -12,6 +26,9 @@ author: "Name (@handle if applicable)" url: https://example.com/article date: YYYY-MM-DD domain: internet-finance | entertainment | ai-alignment | health | grand-strategy +intake_tier: directed | undirected | research-task +rationale: "Why this source matters — what question it answers, which claim it challenges" +proposed_by: "contributor name or handle" format: essay | newsletter | tweet | thread | whitepaper | paper | report | news status: unprocessed | processing | processed | null-result processed_by: agent-name @@ -36,12 +53,15 @@ linked_set: set-name-if-part-of-a-group | url | string | Original URL (even if content was provided manually) | | date | date | Publication date | | domain | enum | Primary domain for routing | +| intake_tier | enum | `directed`, `undirected`, or `research-task` (see intake tiers above) | | status | enum | Processing state (see lifecycle below) | ## Optional Fields | Field | Type | Description | |-------|------|-------------| +| rationale | string | WHY this source matters — what question it answers, which claim it challenges. Required for `directed` tier, serves as extraction directive. | +| proposed_by | string | Who submitted this source (contributor name/handle). For attribution tracking. | | format | enum | `paper`, `essay`, `newsletter`, `tweet`, `thread`, `whitepaper`, `report`, `news` — source format affects evidence weight assessment (a peer-reviewed paper carries different weight than a tweet) | | processed_by | string | Which agent extracted claims from this source | | processed_date | date | When extraction happened |