teleo-codex/schemas/attribution.md

183 lines
8.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Attribution Schema
Attribution tracks who contributed what to the knowledge base. Every claim traces back to the people and agents who produced it. Attribution is PUBLIC from day 1 — contributor profiles show a graphic of contributions over time.
## Design Principles
1. **Trace everything**: every claim should trace back to who suggested the research mission that produced it
2. **Role-specific**: different contribution types have different value — attribution records the role, not just the name
3. **Pseudonymous-first**: contributors use handles, not legal names. Handles persist across contributions.
4. **Git-native**: the Pentagon-Agent trailer in git commits is the foundation. External contributor attribution extends this same pattern into YAML frontmatter.
5. **Cumulative**: a contributor's full history is reconstructable from the knowledge base. No contribution is invisible.
## The Five Contributor Roles
| Role | What They Do | Example |
|------|-------------|---------|
| **sourcer** | Identifies the source material or research direction that led to this claim | "Look into Kalshi's revenue model" or shares an article |
| **extractor** | Extracts the specific claim from source material — separates signal from noise, writes the prose-as-title | Agent or human who reads the source and produces the claim file |
| **challenger** | Tests the claim through counter-evidence, boundary conditions, or adversarial review | "This doesn't hold when markets are thin" |
| **synthesizer** | Connects this claim to other claims, producing cross-domain insight | "This mechanism is isomorphic to X in health domain" |
| **reviewer** | Evaluates claim quality against the KB quality gates and approves/rejects | Leo's eval role, or peer reviewers |
A single person/agent can hold multiple roles on the same claim. A claim can have multiple people in the same role.
## Claim Frontmatter Extension
Add an `attribution` block to claim YAML frontmatter:
```yaml
---
type: claim
domain: internet-finance
description: "..."
confidence: likely
source: "Theia Research 2025 annual letter, analysis by Rio"
created: 2026-03-11
# Attribution (new)
attribution:
sourcer:
- handle: "m3taversal"
context: "directed research into Theia's investment thesis"
- handle: "@theiaresearch"
context: "published the annual letter"
extractor:
- handle: "rio"
agent_id: "760F7FE7-5D50-4C2E-8B7C-9F1A8FEE8A46"
challenger: []
synthesizer: []
reviewer:
- handle: "leo"
agent_id: "294C3CA1-0205-4668-82FA-B984D54F48AD"
---
```
## Attribution Fields
### Per-role entry
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| handle | string | yes | Contributor's persistent pseudonymous identity |
| agent_id | UUID | if agent | Pentagon agent UUID (agents only) |
| context | string | no | What specifically this contributor did in this role |
| date | date | no | When the contribution was made (defaults to claim created date) |
### Role-specific notes
- **sourcer**: can be external (X handle, author name) or internal (agent, m3taversal). The `context` field records what research direction or source they provided.
- **extractor**: usually an agent. The `agent_id` field links to the Pentagon agent. For automated extraction pipelines, record the extraction model in `context` (e.g., "MiniMax M2.5 extract → Haiku 4.5 review").
- **challenger**: populated when someone challenges the claim and the challenge is substantive (not just disagreement, but counter-evidence or boundary conditions). Empty array until challenged.
- **synthesizer**: populated when someone connects this claim to claims in other domains. Cross-domain synthesis is the highest-value contribution type.
- **reviewer**: populated during PR review. Records who evaluated and approved.
## Backwards Compatibility
The existing `source` field continues to serve as a human-readable one-liner for quick reference. The `attribution` block provides the structured, queryable version. Both coexist:
- `source`: "Theia Research 2025 annual letter, analysis by Rio" (human-readable)
- `attribution`: structured role-by-role breakdown (machine-readable)
For claims created before attribution was introduced, `source` remains the only attribution data. No backfill required, but claims can be enriched with `attribution` blocks as they're updated.
## Git Trailer Integration
Agent contributions are also recorded in git commit trailers:
```
Pentagon-Agent: Rio <760F7FE7-5D50-4C2E-8B7C-9F1A8FEE8A46>
```
The git trailer records WHO committed the change. The YAML attribution records WHO contributed WHAT in WHICH ROLE. These are complementary:
- Git trailer = "who made this change to the repository"
- YAML attribution = "who produced this knowledge and in what capacity"
A single commit may create 10 claims. The trailer says Rio committed them. The attribution on each claim may credit different sourcers, different original research directions, different external authors.
## Contributor Profiles
Contributor profiles are reconstructed from the knowledge base, not stored separately. To build a profile:
1. **Query**: search all claim `attribution` blocks for a given `handle`
2. **Aggregate**: count contributions by role, domain, confidence level, date
3. **Visualize**: contribution-over-time graphic showing when and how they contributed
This means:
- No separate "contributor database" to maintain
- Profiles are always consistent with the actual KB state
- New contributions automatically appear in profiles
- Attribution disputes are resolved by editing claim frontmatter
### Person Entity Bridge
When a contributor has enough contributions to warrant tracking, their person entity (`entities/{domain}/{handle}.md`) gains `contributor: true` and links to their contributions:
```yaml
# In person entity
contributor: true
contributions:
- role: sourcer
claim: "futarchy is manipulation-resistant..."
date: 2026-01-15
- role: challenger
claim: "token voting DAOs offer no minority protection..."
date: 2026-02-20
first_contribution: 2026-01-15
attribution_handle: "@theiaresearch"
```
## Governance
- Attribution is added at extraction time (extractor + sourcer) and updated during review (reviewer) and challenge (challenger)
- Synthesizer attribution is added when cross-domain connections are made, which may happen well after initial creation
- Disputes about attribution are resolved through the normal PR process
- Removing attribution requires justification (e.g., the sourcer was misidentified)
## Contribution Weights
Role weights determine how much each contribution type counts toward a contributor's weighted score. Weights are **global policy**, not per-claim data — they live in `schemas/contribution-weights.yaml`, not in claim frontmatter.
Why weights are global, not per-claim:
1. Weights are policy (how much we value each role), not data (who did what)
2. Weights evolve as bottlenecks shift — updating one config file beats migrating 400+ claims
3. Per-claim weights create gaming incentive to inflate role on high-value claims
The build pipeline reads `contribution-weights.yaml` and multiplies role counts × weights to produce weighted scores. The frontend displays both raw counts (by role) and the weighted score.
See `schemas/contribution-weights.yaml` for current weights and rationale.
## Build Artifacts
The website build pipeline (extract-graph-data.py) produces a `contributors.json` artifact alongside graph-data.json and claims-context.json:
```json
{
"contributors": [
{
"handle": "naval",
"roles": {"sourcer": 12, "extractor": 0, "challenger": 3, "synthesizer": 1, "reviewer": 0},
"weighted_score": 5.4,
"domains": {"internet-finance": 8, "grand-strategy": 5, "ai-alignment": 3},
"first_contribution": "2026-02-15",
"latest_contribution": "2026-03-11",
"claim_count": 16,
"timeline": [
{"date": "2026-02", "count": 3, "domains": ["internet-finance"]},
{"date": "2026-03", "count": 13, "domains": ["internet-finance", "grand-strategy"]}
]
}
]
}
```
This is a static file rebuilt on every merge to main (~15 minute staleness). The frontend reads it at page load — no API or runtime queries needed.
**Timeline**: Monthly granularity. Used by the frontend for contribution heatmap or sparkline graphic (Cory requirement).
## Implementation Priority
1. **Now**: Add `attribution` block to new claims going forward. No backfill required.
2. **Soon**: Rhea adds attribution aggregation pass to extract-graph-data.py, producing contributors.json.
3. **Soon**: Frontend contributor profile pages — handle + sparkline + domain pie + top claims by role.
4. **Later**: Automated attribution from the extraction pipeline (MiniMax → Haiku → agent).