diff --git a/schemas/entity.md b/schemas/entity.md
index 97e003c9..e6cec2f6 100644
--- a/schemas/entity.md
+++ b/schemas/entity.md
@@ -13,26 +13,114 @@ Evidence → Claims (what's true about the world)
 
 Claims are static propositions with confidence levels. Entities are dynamic objects with temporal attributes. Both feed into agent reasoning.
 
-## Entity Types
+## Entity Type System
+
+The type system has two layers: **core types** shared by all agents, and **domain-specific extensions** that specialize core types for particular domains. Every entity uses exactly one type.
+
+### Core Types (all domains)
 
 | Type | What it tracks | Examples |
 |------|---------------|----------|
-| `company` | Protocol, startup, fund, DAO | MetaDAO, Aave, Solomon, Devoted Health |
-| `person` | Individual with tracked positions/influence | Stani Kulechov, Gabriel Shapiro, Proph3t |
+| `company` | Organization that operates — startup, fund, DAO, protocol | MetaDAO, Aave, Devoted Health, SpaceX |
+| `person` | Individual with tracked positions/influence | Proph3t, Stani Kulechov, Elon Musk |
+| `organization` | Government body, regulatory agency, standards body, consortium | SEC, CFTC, NASA, FLI, CMS |
+| `product` | Specific product, tool, or platform distinct from its maker | Autocrat, Starlink, Claude |
 | `market` | Industry segment or ecosystem | Futarchic markets, DeFi lending, Medicare Advantage |
-| `decision_market` | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson, MetaDAO: Burn 99.3% of META |
+
+### Domain-Specific Extensions
+
+Domain extensions are specialized subtypes that inherit from a core type. Use the most specific type available — it determines which fields are relevant.
+
+#### Internet Finance (Rio)
+
+| Type | Extends | What it tracks | Examples |
+|------|---------|---------------|----------|
+| `protocol` | company | On-chain protocol with TVL/volume metrics | Aave, Drift, Omnipair |
+| `token` | product | Fungible token distinct from its protocol | META, SOL, CLOUD |
+| `decision_market` | — | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson |
+| `exchange` | company | Trading venue (CEX or DEX) | Raydium, Meteora, Jupiter |
+| `fund` | company | Investment vehicle or DAO treasury | Solomon, Theia Research |
+
+#### Space Development (Astra)
+
+| Type | Extends | What it tracks | Examples |
+|------|---------|---------------|----------|
+| `vehicle` | product | Launch vehicle or spacecraft | Starship, New Glenn, Neutron |
+| `mission` | — | Specific spaceflight mission | Artemis III, ESCAPADE |
+| `facility` | — | Launch site, factory, or ground infrastructure | Starbase, LC-36 |
+| `program` | — | Multi-mission program or initiative | Artemis, Commercial Crew |
+
+#### Health (Vida)
+
+| Type | Extends | What it tracks | Examples |
+|------|---------|---------------|----------|
+| `therapy` | product | Treatment modality or therapeutic approach | mRNA cancer vaccines, GLP-1 agonists |
+| `drug` | product | Specific pharmaceutical product | Ozempic, Keytruda |
+| `insurer` | company | Health insurance organization | UnitedHealthcare, Devoted Health |
+| `provider` | company | Healthcare delivery organization | Kaiser Permanente, Oak Street Health |
+| `policy` | — | Legislation, regulation, or administrative rule | GENIUS Act, CMS 2027 Advance Notice |
+
+#### Entertainment (Clay)
+
+| Type | Extends | What it tracks | Examples |
+|------|---------|---------------|----------|
+| `studio` | company | Production company or media business | Beast Industries, Mediawan |
+| `creator` | person | Individual content creator or artist | MrBeast, Taylor Swift |
+| `franchise` | product | IP, franchise, or media property | Claynosaurz, Pudgy Penguins |
+| `platform` | product | Distribution or social media platform | YouTube, TikTok, Dropout |
+
+#### AI/Alignment (Theseus)
+
+| Type | Extends | What it tracks | Examples |
+|------|---------|---------------|----------|
+| `lab` | company | AI research laboratory | Anthropic, OpenAI, DeepMind |
+| `model` | product | AI model or model family | Claude, GPT-4, Gemini |
+| `framework` | product | Safety framework, governance protocol, or methodology | RSP, Constitutional AI |
+| `governance_body` | organization | AI governance or safety organization | AISI, FLI, Partnership on AI |
+
+### Choosing the Right Type
+
+```
+Is it a person? → person (or domain-specific: creator)
+Is it a government/regulatory body? → organization (or domain-specific: governance_body)
+Is it a governance proposal or market? → decision_market
+Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle, etc.)
+Is it an organization that operates? → company (or domain-specific: lab, studio, insurer, etc.)
+Is it a market segment? → market
+Is it a policy or regulation? → policy
+Is it a space mission? → mission
+Is it a physical facility? → facility
+Is it a multi-mission program? → program
+```
+
+**Rule:** Use the most specific type available. If a DeFi protocol fits `protocol`, use that instead of `company`. If an AI lab fits `lab`, use that instead of `company`. Domain-specific types carry domain-specific fields.
+
+### Adding New Types
+
+Core types require a schema PR reviewed by Leo. Domain-specific types are agent-managed — add a row to your domain's extension table via PR. No schema-wide changes needed. If a new type could apply to multiple domains, propose it as a core type instead.
+
+### Cross-Domain Entity Dedup
+
+One entity per real-world object. If Anthropic appears in both internet-finance and ai-alignment sources:
+
+1. **First creator owns the file.** Whichever agent creates the entity first files it in their domain (`entities/ai-alignment/anthropic.md`).
+2. **Other agents use `secondary_domains`.** The entity gets `secondary_domains: [internet-finance]` so it's discoverable across domains.
+3. **Both agents can update.** The `tracked_by` agent is responsible for staleness, but any agent can propose updates via PR when their sources contain new information.
+4. **Type follows primary domain.** If Theseus creates it, it's `lab`. If Rio had created it first, it would be `company`. The type reflects the primary tracking perspective.
+
+If two agents independently create the same entity, the reviewer merges them during PR review — keep the richer file, add `secondary_domains` from the other.
 
 ## YAML Frontmatter
 
 ```yaml
 ---
 type: entity
-entity_type: company | person | market | decision_market
+entity_type: company | person | organization | product | market | decision_market | protocol | token | exchange | fund | vehicle | mission | facility | program | therapy | drug | insurer | provider | policy | studio | creator | franchise | platform | lab | model | framework | governance_body
 name: "Display name"
 domain: internet-finance | entertainment | health | ai-alignment | space-development
 handles: ["@StaniKulechov", "@MetaLeX_Labs"] # social/web identities
 website: https://example.com
-status: active | inactive | acquired | liquidated | emerging # for company/person/market
+status: active | inactive | acquired | liquidated | emerging # for most types
 # Decision markets use: active | passed | failed
 tracked_by: rio # which agent owns this entity
 created: YYYY-MM-DD
@@ -45,7 +133,7 @@ last_updated: YYYY-MM-DD
 | Field | Type | Description |
 |-------|------|-------------|
 | type | enum | Always `entity` |
-| entity_type | enum | `company`, `person`, `market`, or `decision_market` |
+| entity_type | enum | Any type from the type system above |
 | name | string | Canonical display name |
 | domain | enum | Primary domain |
 | status | enum | Current operational status |
@@ -152,7 +240,7 @@ Example: `entities/internet-finance/metadao-hire-robin-hanson.md`
 ## Company-Specific Fields
 
 ```yaml
-# Company attributes
+# Company attributes (also used by protocol, exchange, fund, lab, studio, insurer, provider)
 founded: YYYY-MM-DD
 founders: ["[[person-entity]]"]
 category: "DeFi lending protocol"
@@ -184,7 +272,7 @@ launch_date: YYYY-MM-DD # when the entity launched/raised
 People entities serve dual purpose: they track public figures we analyze AND serve as contributor profiles when those people engage with the KB. One file, two functions — the file grows from "person we track" to "person who participates."
 
 ```yaml
-# Person attributes
+# Person attributes (also used by creator)
 role: "Founder & CEO of Aave"
 organizations: ["[[company-entity]]"]
 followers: 290000 # primary platform
@@ -202,9 +290,19 @@ first_contribution: null # date of first KB interaction
 attribution_handle: null # how they want to be credited
 ```
 
-## Market-Specific Fields
+## Other Core Type Fields
 
 ```yaml
+# Organization attributes (also used by governance_body)
+jurisdiction: "United States"
+authority: "Securities regulation" # what this body governs
+parent_body: "[[parent-organization]]"
+
+# Product attributes (also used by token, vehicle, drug, model, framework, franchise, platform)
+maker: "[[company-entity]]" # who built/maintains this
+launched: YYYY-MM-DD
+category: "futarchy governance program"
+
 # Market attributes
 total_size: "$120B TVL"
 growth_rate: "flat since 2021"
@@ -213,6 +311,8 @@ market_structure: "winner-take-most | fragmented | consolidating"
 regulatory_status: "emerging clarity | hostile | supportive"
 ```
 
+**Domain-specific fields:** Each agent adds type-specific fields as they start extracting entities. The fields above cover core types. When Astra creates their first `vehicle` entity, they add vehicle-specific fields to the schema. Complexity is earned from actual use, not designed in advance.
+
 ## Body Format
 
 ```markdown
@@ -275,9 +375,19 @@ entities/
     claynosaurz.md
     pudgy-penguins.md
     matthew-ball.md
+    beast-industries.md # studio
   health/
-    devoted-health.md
+    devoted-health.md # insurer
     function-health.md
+    ozempic.md # drug
+  ai-alignment/
+    anthropic.md # lab
+    claude.md # model
+    rsp.md # framework
+  space-development/
+    spacex.md
+    starship.md # vehicle
+    artemis.md # program
 ```
 
 **Filename:** Lowercase slugified name. Companies use brand name, people use full name. Decision markets use `{parent}-{proposal-slug}.md`.
@@ -299,6 +409,8 @@ Sources often contain entity information. During extraction, agents should:
 - Update entities (factual changes to tracked objects) → `entities/{domain}/`
 - Both from the same source, in the same PR
 
+See `skills/extract-entities.md` for the full extraction process.
+
 ## Key Difference from Claims
 
 | | Claims | Entities |
diff --git a/skills/extract-entities.md b/skills/extract-entities.md
new file mode 100644
index 00000000..2098842e
--- /dev/null
+++ b/skills/extract-entities.md
@@ -0,0 +1,149 @@
+# Entity Extraction Field Guide
+
+How to extract entities from source material. This skill works alongside `extract.md` (claim extraction) — both run during source processing.
+
+## When to Extract Entities
+
+Every source may contain entity data. During extraction, ask:
+
+1. **Does this source mention an organization, person, product, or market we don't already track?** → Create a new entity
+2. **Does this source contain updated information about an entity we already track?** → Update the existing entity (timeline, metrics, status)
+3. **Does this source describe a decision, proposal, or market outcome?** → Create a decision_market entity (if it meets significance threshold)
+
+## The Dual Extraction Loop
+
+```
+Source → Read completely
+    ↓
+    Extract claims (propositions about the world) → domains/{domain}/
+    Extract entities (objects in the world) → entities/{domain}/
+    Update existing entities (new timeline events, metrics)
+    ↓
+    Both in the same PR
+```
+
+## Entity Extraction Process
+
+### Step 1: Identify Entity Mentions
+
+Read the source and list every entity mentioned. For each:
+- Is it already in `entities/{domain}/`? → Flag for update
+- Is it new and significant enough to track? → Flag for creation
+- Is it mentioned in passing with no meaningful data? → Skip
+
+**Significance test:** Would tracking this entity help us evaluate claims or form positions? If the entity is just background context, skip it.
+
+### Step 2: Select Entity Type
+
+Use the most specific type available. See `schemas/entity.md` for the full type system.
+
+```
+Is it a person? → person (or domain-specific: creator)
+Is it a government/regulatory body? → organization (or domain-specific: governance_body)
+Is it a governance proposal or market? → decision_market
+Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle)
+Is it an organization that operates? → company (or domain-specific: lab, studio, insurer)
+Is it a market segment? → market
+```
+
+### Step 3: Extract Frontmatter
+
+Fill in every field you have data for. Don't guess — leave fields empty rather than fabricating data.
+
+**Required fields** (every entity):
+- `type: entity`
+- `entity_type`: the specific type
+- `name`: canonical display name
+- `domain`: primary domain
+- `status`: current status
+- `tracked_by`: your agent name
+- `created`: today's date
+
+**Optional but valuable:**
+- `handles`: social media handles (from the source or quick lookup)
+- `website`: primary web presence
+- `tags`: discovery tags
+- `secondary_domains`: if the entity spans domains
+
+**Type-specific fields:** Fill in whatever the source provides. The schema lists all available fields — use the ones that have data.
+
+### Step 4: Write the Body
+
+Follow the body format from `schemas/entity.md`:
+
+1. **Overview**: What this entity is, why we track it (2-3 sentences)
+2. **Current State**: Latest known attributes from this source
+3. **Timeline**: Key events with dates (at minimum, the event from this source)
+4. **Competitive Position**: Where it sits relative to competitors (if known)
+5. **Relationship to KB**: Wiki-link to related claims and entities
+
+### Step 5: Check for Duplicates
+
+Before creating a new entity, search **all** `entities/` directories (not just your domain) for:
+- Same name (exact or variant spelling)
+- Same handles
+- Same website
+
+If a match exists in **your domain**, update the existing entity.
+
+If a match exists in **another domain**, don't create a duplicate. Instead, add your domain to the existing entity's `secondary_domains` list and propose updates via PR. See `schemas/entity.md` → "Cross-Domain Entity Dedup" for the full protocol.
+
+### Step 6: Update Parent Entities
+
+If the new entity has a `parent` or `parent_entity` field, update the parent:
+- Add the new entity to the parent's Relevant Entities section
+- If it's a decision_market, add to the parent's Key Decisions table (if significant)
+- Add a timeline entry on the parent
+
+## What Makes a Good Entity
+
+**Good entities have:**
+- Concrete, verifiable attributes (dates, metrics, names)
+- Clear relevance to at least one domain claim
+- Enough data to be useful (not just a name)
+- A reason to track changes over time
+
+**Bad entity candidates:**
+- Mentioned once in passing with no data
+- Purely historical with no ongoing relevance
+- Duplicates of existing entities under different names
+- Too granular (not every tweet needs an entity)
+
+## Domain-Specific Guidance
+
+### Internet Finance (Rio)
+- Protocols and tokens are separate entities (MetaDAO = company, META = token)
+- Every futardio launch that raises significant capital gets a company entity
+- Governance proposals that materially change direction get decision_market entities
+- Regulatory bodies (CFTC, SEC) get organization entities
+
+### Space Development (Astra)
+- Vehicles (Starship, New Glenn) are distinct from their makers (SpaceX, Blue Origin)
+- Programs (Artemis, Commercial Crew) are distinct from the agencies running them
+- Missions get entities when they're historically significant or produce notable data
+
+### Health (Vida)
+- Drugs are distinct from the companies that make them
+- Insurers and providers are separate entity types — don't conflate
+- Policies (legislation, CMS rules) get organization entities for the issuing body + policy entities for the rule itself
+
+### Entertainment (Clay)
+- Creators are distinct from their companies (MrBeast vs Beast Industries)
+- Franchises/IP are distinct from the studios that own them
+- Platforms (YouTube, TikTok) get product or platform entities
+
+### AI/Alignment (Theseus)
+- Labs are distinct from their models (Anthropic vs Claude)
+- Frameworks (RSP, Constitutional AI) get their own entities when they influence multiple claims
+- Governance bodies (AISI, FLI) get organization entities
+
+## Eval Checklist (for reviewers)
+
+1. `entity_type` is the most specific available type
+2. Required fields are all populated
+3. No fabricated data — empty fields are better than guesses
+4. Not a duplicate of existing entity
+5. Meets significance threshold
+6. Wiki links resolve to real files
+7. Parent entity updated if applicable
+8. Filing location is correct: `entities/{domain}/{slug}.md`
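
Several items on the reviewer checklist (known `entity_type`, required fields, filing location, duplicates) could be pre-checked mechanically before human review. The Python sketch below is one possible shape for that and is not part of this PR. `EXTENDS`, `check_entity`, and `is_duplicate` are hypothetical names. The type tables are transcribed from `schemas/entity.md` as changed above. The slug pattern is an assumption about what "lowercase slugified" means, and the duplicate match only covers exact names, not the variant spellings Step 5 also asks for.

```python
import re

# Extension type -> core type it extends, transcribed from schemas/entity.md.
# None marks extensions listed with no parent in the tables.
EXTENDS = {
    "protocol": "company", "exchange": "company", "fund": "company",
    "insurer": "company", "provider": "company", "studio": "company",
    "lab": "company",
    "token": "product", "vehicle": "product", "therapy": "product",
    "drug": "product", "franchise": "product", "platform": "product",
    "model": "product", "framework": "product",
    "creator": "person",
    "governance_body": "organization",
    "decision_market": None, "mission": None, "facility": None,
    "program": None, "policy": None,
}
CORE_TYPES = {"company", "person", "organization", "product", "market"}
REQUIRED = ("type", "entity_type", "name", "domain", "status", "tracked_by", "created")

# Assumption: a slug is hyphen-separated runs of lowercase letters and digits.
SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"


def check_entity(frontmatter: dict, path: str) -> list[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = [f"missing required field: {f}" for f in REQUIRED if not frontmatter.get(f)]
    if frontmatter.get("type") not in (None, "entity"):
        problems.append("type must be 'entity'")
    etype = frontmatter.get("entity_type")
    if etype and etype not in CORE_TYPES and etype not in EXTENDS:
        problems.append(f"unknown entity_type: {etype}")
    # Checklist item 8: the file must live at entities/{domain}/{slug}.md.
    domain = re.escape(str(frontmatter.get("domain", "")))
    if not re.fullmatch(rf"entities/{domain}/{SLUG}\.md", path):
        problems.append(f"unexpected filing location: {path}")
    return problems


def is_duplicate(candidate: dict, existing: dict) -> bool:
    """Step 5 match rule: same name, a shared handle, or the same website."""
    def norm(value) -> str:
        return str(value or "").strip().casefold()

    name = norm(candidate.get("name"))
    site = norm(candidate.get("website"))
    shared_handles = {norm(h) for h in candidate.get("handles") or []} & {
        norm(h) for h in existing.get("handles") or []
    }
    return (
        (name != "" and name == norm(existing.get("name")))
        or bool(shared_handles)
        or (site != "" and site == norm(existing.get("website")))
    )


# Illustrative frontmatter only; the created date is a placeholder.
entity = {
    "type": "entity", "entity_type": "lab", "name": "Anthropic",
    "domain": "ai-alignment", "status": "active",
    "tracked_by": "theseus", "created": "2025-01-01",
}
```

With this example, `check_entity(entity, "entities/ai-alignment/anthropic.md")` returns an empty list, and `is_duplicate(entity, {"name": "anthropic"})` is true because names are compared case-insensitively. Significance, fabricated data, and whether the type is the most specific one available still need a reviewer's judgment.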