teleo-codex/skills/extract-entities.md
m3taversal c45c66ddc4 rio: address Leo review — type extensibility + cross-domain dedup
- What: Added type extensibility rules (domain types are agent-managed,
  core types require schema PR) and cross-domain entity dedup protocol
  (one entity per real-world object, secondary_domains for visibility).
- Why: Leo flagged both gaps in PR #593 review.

Pentagon-Agent: Rio <760F7FE7-5D50-4C2E-8B7C-9F1A8FEE8A46>
2026-03-11 21:36:34 +00:00


Entity Extraction Field Guide

How to extract entities from source material. This skill works alongside extract.md (claim extraction) — both run during source processing.

When to Extract Entities

Every source may contain entity data. During extraction, ask:

  1. Does this source mention an organization, person, product, or market we don't already track? → Create a new entity
  2. Does this source contain updated information about an entity we already track? → Update the existing entity (timeline, metrics, status)
  3. Does this source describe a decision, proposal, or market outcome? → Create a decision_market entity (if it meets significance threshold)

The Dual Extraction Loop

Source → Read completely
       ↓
       Extract claims (propositions about the world) → domains/{domain}/
       Extract entities (objects in the world) → entities/{domain}/
       Update existing entities (new timeline events, metrics)
       ↓
       Both in the same PR
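The loop above can be sketched as a function that plans the file list for the single PR a source produces. This is a minimal sketch. The `SourceExtraction` record, the `pr_file_list` name, and the `{slug}.md` filename for claims are assumptions for illustration, not existing repo tooling.

```python
from dataclasses import dataclass, field


@dataclass
class SourceExtraction:
    """Hypothetical record of everything extracted from one source."""
    claims: list = field(default_factory=list)                # claim slugs (new files)
    new_entities: list = field(default_factory=list)          # entity slugs (new files)
    updated_entity_paths: list = field(default_factory=list)  # existing files, any domain


def pr_file_list(domain: str, extraction: SourceExtraction) -> list:
    """Return every file the single PR for this source should touch."""
    # Claims: propositions about the world, filed under domains/{domain}/.
    paths = [f"domains/{domain}/{slug}.md" for slug in extraction.claims]
    # New entities: objects in the world, filed under entities/{domain}/.
    paths += [f"entities/{domain}/{slug}.md" for slug in extraction.new_entities]
    # Updates keep their existing path, which may sit in another domain's directory.
    paths += list(extraction.updated_entity_paths)
    return paths
```

Keeping updates as full paths reflects that an entity you update may already live under another domain (see Step 5).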

Entity Extraction Process

Step 1: Identify Entity Mentions

Read the source and list every entity mentioned. For each:

  • Is it already in entities/{domain}/? → Flag for update
  • Is it new and significant enough to track? → Flag for creation
  • Is it mentioned in passing with no meaningful data? → Skip

Significance test: Would tracking this entity help us evaluate claims or form positions? If the entity is just background context, skip it.
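The Step 1 checks can be read as an ordered triage per mention. In this hypothetical sketch the significance test stays a human judgment passed in as a boolean; skipping tracked entities when the source adds nothing follows the "updated information" condition under When to Extract Entities. The names `Action` and `triage_mention` are illustrative only.

```python
from enum import Enum


class Action(Enum):
    UPDATE = "flag for update"
    CREATE = "flag for creation"
    SKIP = "skip"


def triage_mention(slug: str, tracked: set, has_new_data: bool, passes_significance: bool) -> Action:
    """Apply the Step 1 checks, in order, to one entity mention."""
    if slug in tracked:
        # Already tracked: only worth touching if the source adds information.
        return Action.UPDATE if has_new_data else Action.SKIP
    if has_new_data and passes_significance:
        return Action.CREATE
    # Passing mention with no meaningful data, or background context only.
    return Action.SKIP
```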

Step 2: Select Entity Type

Use the most specific type available. See schemas/entity.md for the full type system.

Is it a person?                          → person (or domain-specific: creator)
Is it a government/regulatory body?      → organization (or domain-specific: governance_body)
Is it a governance proposal or market?   → decision_market
Is it a specific product/tool?           → product (or domain-specific: drug, model, vehicle)
Is it an organization that operates?     → company (or domain-specific: lab, studio, insurer)
Is it a market segment?                  → market
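The tree is order-sensitive: the first matching question wins, so a regulator is typed as an organization before the generic company check applies. A sketch of that ordering, assuming entities are described by hypothetical trait labels; the subtype sets contain only the examples named in the tree, not the full list in schemas/entity.md.

```python
from typing import Optional

# Ordered (trait, core type) pairs mirroring the questions above.
DECISION_TREE = [
    ("person", "person"),
    ("government_body", "organization"),
    ("governance_proposal_or_market", "decision_market"),
    ("product", "product"),
    ("operating_organization", "company"),
    ("market_segment", "market"),
]

# Domain-specific subtypes named in the tree (illustrative, not exhaustive).
SUBTYPES = {
    "person": {"creator"},
    "organization": {"governance_body"},
    "product": {"drug", "model", "vehicle"},
    "company": {"lab", "studio", "insurer"},
}


def select_entity_type(traits: set, specific: Optional[str] = None) -> str:
    """Return the most specific valid type for an entity with the given traits."""
    for trait, core_type in DECISION_TREE:
        if trait in traits:
            # Prefer the domain-specific subtype only if it belongs to this core type.
            if specific in SUBTYPES.get(core_type, set()):
                return specific
            return core_type
    raise ValueError("no type matched; check schemas/entity.md")
```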

Step 3: Extract Frontmatter

Fill in every field you have data for. Don't guess — leave fields empty rather than fabricating data.

Required fields (every entity):

  • type: entity
  • entity_type: the specific type
  • name: canonical display name
  • domain: primary domain
  • status: current status
  • tracked_by: your agent name
  • created: today's date

Optional but valuable:

  • handles: social media handles (from the source or quick lookup)
  • website: primary web presence
  • tags: discovery tags
  • secondary_domains: if the entity spans domains

Type-specific fields: Fill in whatever the source provides. The schema lists all available fields — use the ones that have data.
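A minimal sketch of Step 3, assuming the frontmatter is YAML between `---` lines and `created` is an ISO 8601 date (both assumptions; check schemas/entity.md). The builder refuses to emit an entity with an empty required field and drops optional fields the source did not supply instead of guessing them. Rendering is deliberately naive (no quoting), so a real tool should use a YAML library.

```python
import datetime

REQUIRED_FIELDS = ("type", "entity_type", "name", "domain", "status", "tracked_by", "created")


def build_frontmatter(entity_type, name, domain, status, tracked_by, created=None, **optional):
    """Render entity frontmatter, refusing to emit missing required fields."""
    fields = {
        "type": "entity",
        "entity_type": entity_type,
        "name": name,
        "domain": domain,
        "status": status,
        "tracked_by": tracked_by,
        "created": (created or datetime.date.today()).isoformat(),
    }
    missing = [key for key in REQUIRED_FIELDS if not fields[key]]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Optional fields are written only when the source supplied a value.
    for key, value in optional.items():
        if value:
            fields[key] = value
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            lines.append(f"{key}: [{', '.join(value)}]")
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)
```

For example, `build_frontmatter("company", "Example Co", "internet-finance", "active", "rio", website="", tags=["futarchy"])` omits `website` entirely; the status value here is illustrative.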

Step 4: Write the Body

Follow the body format from schemas/entity.md:

  1. Overview: What this entity is, why we track it (2-3 sentences)
  2. Current State: Latest known attributes from this source
  3. Timeline: Key events with dates (at minimum, the event from this source)
  4. Competitive Position: Where it sits relative to competitors (if known)
  5. Relationship to KB: Wiki-link to related claims and entities
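The body order can be sketched as a small assembler. Assumptions, all hypothetical: `##` headings, `- date: event` timeline lines, and `[[slug]]` wiki-links. It enforces at least one timeline event and leaves out Competitive Position when the source says nothing about it.

```python
def build_body(overview, current_state, timeline, competitive_position="", related=()):
    """Assemble the body sections; timeline is a list of (date, event) pairs."""
    if not timeline:
        raise ValueError("Timeline needs at least the event from this source")
    sections = [
        ("Overview", overview),
        ("Current State", current_state),
        ("Timeline", "\n".join(f"- {date}: {event}" for date, event in timeline)),
    ]
    # Competitive Position is included only when something is known.
    if competitive_position:
        sections.append(("Competitive Position", competitive_position))
    sections.append(("Relationship to KB", "\n".join(f"- [[{slug}]]" for slug in related)))
    return "\n\n".join(f"## {heading}\n\n{text}" for heading, text in sections)
```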

Step 5: Check for Duplicates

Before creating a new entity, search all entities/ directories (not just your domain) for:

  • Same name (exact or variant spelling)
  • Same handles
  • Same website

If a match exists in your domain, update the existing entity.

If a match exists in another domain, don't create a duplicate. Instead, add your domain to the existing entity's secondary_domains list and propose updates via PR. See schemas/entity.md → "Cross-Domain Entity Dedup" for the full protocol.
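The dedup rule can be sketched as: search every domain, match on name, handle, or website, then branch on whether the match is in your domain. This is a hypothetical sketch. The `Entity` fields mirror the frontmatter names above, and the normalization (ignoring case, spaces, punctuation, `@`, scheme, and `www.`) is only a crude stand-in for "variant spelling", so a reviewer still has to confirm near-misses by hand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    domain: str
    handles: set = field(default_factory=set)
    website: str = ""
    secondary_domains: list = field(default_factory=list)


def _name_key(name: str) -> str:
    return re.sub(r"[^a-z0-9]", "", name.lower())


def _site_key(url: str) -> str:
    return re.sub(r"^https?://(www\.)?", "", url.strip().lower()).rstrip("/")


def _handle_keys(handles: set) -> set:
    return {h.lower().lstrip("@") for h in handles}


def find_duplicate(candidate: Entity, existing: list):
    """Search entities from every domain for a name, handle, or website match."""
    for entity in existing:
        if _name_key(entity.name) == _name_key(candidate.name):
            return entity
        if _handle_keys(entity.handles) & _handle_keys(candidate.handles):
            return entity
        if candidate.website and entity.website and _site_key(entity.website) == _site_key(candidate.website):
            return entity
    return None


def resolve_duplicate(candidate: Entity, existing: list, my_domain: str):
    match = find_duplicate(candidate, existing)
    if match is None:
        return "create", None
    if match.domain == my_domain:
        return "update", match
    # Cross-domain match: keep one entity and record our domain for visibility.
    if my_domain not in match.secondary_domains:
        match.secondary_domains.append(my_domain)
    return "propose_cross_domain_update", match
```

The in-memory change to `secondary_domains` only models the edit; per the protocol it still goes through a PR to the owning domain.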

Step 6: Update Parent Entities

If the new entity has a parent or parent_entity field, update the parent:

  • Add the new entity to the parent's Relevant Entities section
  • If it's a decision_market, add to the parent's Key Decisions table (if significant)
  • Add a timeline entry on the parent
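Step 6 can be sketched as one function applied to the parent. The dict keys `relevant_entities`, `key_decisions`, and `timeline` are hypothetical stand-ins for the body sections named above, and the sketch accepts either `parent` or `parent_entity` on the child because the text mentions both. The significance call for Key Decisions is left to the caller.

```python
def update_parent(parent: dict, child: dict, date: str, event: str, significant: bool = False) -> dict:
    """Apply the Step 6 updates to a parent entity held as a plain dict."""
    link = child.get("parent") or child.get("parent_entity")
    if link != parent["slug"]:
        raise ValueError(f"{child['slug']} does not name {parent['slug']} as its parent")
    relevant = parent.setdefault("relevant_entities", [])
    if child["slug"] not in relevant:
        relevant.append(child["slug"])
    # Only significant decision markets go into the Key Decisions table.
    if child["entity_type"] == "decision_market" and significant:
        parent.setdefault("key_decisions", []).append(
            {"date": date, "decision": child["name"], "entity": child["slug"]}
        )
    parent.setdefault("timeline", []).append((date, event))
    return parent
```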

What Makes a Good Entity

Good entities have:

  • Concrete, verifiable attributes (dates, metrics, names)
  • Clear relevance to at least one domain claim
  • Enough data to be useful (not just a name)
  • A reason to track changes over time

Bad entity candidates:

  • Mentioned once in passing with no data
  • Purely historical with no ongoing relevance
  • Duplicates of existing entities under different names
  • Too granular (not every tweet needs its own entity)

Domain-Specific Guidance

Internet Finance (Rio)

  • Protocols and tokens are separate entities (MetaDAO = company, META = token)
  • Every futardio launch that raises significant capital gets a company entity
  • Governance proposals that materially change direction get decision_market entities
  • Regulatory bodies (CFTC, SEC) get organization entities

Space (Astra)

  • Vehicles (Starship, New Glenn) are distinct from their makers (SpaceX, Blue Origin)
  • Programs (Artemis, Commercial Crew) are distinct from the agencies running them
  • Missions get entities when they're historically significant or produce notable data

Health (Vida)

  • Drugs are distinct from the companies that make them
  • Insurers and providers are separate entity types — don't conflate
  • Policies (legislation, CMS rules) get an organization entity for the issuing body and a policy entity for the rule itself

Entertainment (Clay)

  • Creators are distinct from their companies (MrBeast vs Beast Industries)
  • Franchises/IP are distinct from the studios that own them
  • Platforms (YouTube, TikTok) get product or platform entities

AI/Alignment (Theseus)

  • Labs are distinct from their models (Anthropic vs Claude)
  • Frameworks (RSP, Constitutional AI) get their own entities when they influence multiple claims
  • Governance bodies (AISI, FLI) get organization entities

Eval Checklist (for reviewers)

  1. entity_type is the most specific available type
  2. Required fields are all populated
  3. No fabricated data — empty fields are better than guesses
  4. Not a duplicate of existing entity
  5. Meets significance threshold
  6. Wiki links resolve to real files
  7. Parent entity updated if applicable
  8. Filing location is correct: entities/{domain}/{slug}.md
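Items 2, 6, and 8 are mechanical enough to pre-check before a reviewer reads the entity; the rest need judgment. A hypothetical sketch, assuming frontmatter is already parsed into a dict, wiki-links use `[[target]]` or `[[target|alias]]` syntax, and `known_slugs` is the set of file names in the KB.

```python
import re
from pathlib import PurePosixPath

REQUIRED_FIELDS = ("type", "entity_type", "name", "domain", "status", "tracked_by", "created")
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def mechanical_checks(path: str, frontmatter: dict, body: str, known_slugs: set) -> list:
    """Run checklist items 2, 6, and 8; return a list of problems found."""
    problems = []
    # Item 2: required fields populated.
    missing = [key for key in REQUIRED_FIELDS if not frontmatter.get(key)]
    if missing:
        problems.append(f"missing required fields: {', '.join(missing)}")
    # Item 8: filed at entities/{domain}/{slug}.md.
    parts = PurePosixPath(path).parts
    domain = frontmatter.get("domain")
    if len(parts) != 3 or parts[0] != "entities" or parts[1] != domain or not parts[2].endswith(".md"):
        problems.append(f"filed at {path}, expected entities/{domain}/{{slug}}.md")
    # Item 6: every wiki link target resolves to a known file.
    for target in WIKI_LINK.findall(body):
        if target.strip() not in known_slugs:
            problems.append(f"unresolved wiki link: [[{target.strip()}]]")
    return problems
```

An empty list only means the mechanical items pass; items 1, 3, 4, 5, and 7 still need a reviewer.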