rio: generalize entity schema cross-domain + add entity extraction field guide

- What: Core+extension type system in schemas/entity.md. 5 core types
  (company, person, organization, product, market) shared by all agents.
  Domain-specific extensions for each agent defined as type tables.
  New skills/extract-entities.md field guide for all agents.
- Why: Leo/Cory directive — every agent needs entity profiles. Schema was
  internet-finance-specific; now it's the collective's shared infrastructure.
- Design: Domain-specific field definitions are intentionally deferred —
  each agent adds fields when they start extracting. Complexity is earned.

Pentagon-Agent: Rio <760F7FE7-5D50-4C2E-8B7C-9F1A8FEE8A46>
m3taversal 2026-03-11 21:29:45 +00:00 committed by Teleo Agents
parent a268812432
commit da787e02fb
2 changed files with 255 additions and 11 deletions


@@ -13,26 +13,99 @@ Evidence → Claims (what's true about the world)
Claims are static propositions with confidence levels. Entities are dynamic objects with temporal attributes. Both feed into agent reasoning.
## Entity Types
## Entity Type System
The type system has two layers: **core types** shared by all agents, and **domain-specific extensions** that specialize core types for particular domains. Every entity uses exactly one type.
### Core Types (all domains)
| Type | What it tracks | Examples |
|------|---------------|----------|
| `company` | Protocol, startup, fund, DAO | MetaDAO, Aave, Solomon, Devoted Health |
| `person` | Individual with tracked positions/influence | Stani Kulechov, Gabriel Shapiro, Proph3t |
| `company` | Organization that operates — startup, fund, DAO, protocol | MetaDAO, Aave, Devoted Health, SpaceX |
| `person` | Individual with tracked positions/influence | Proph3t, Stani Kulechov, Elon Musk |
| `organization` | Government body, regulatory agency, standards body, consortium | SEC, CFTC, NASA, FLI, CMS |
| `product` | Specific product, tool, or platform distinct from its maker | Autocrat, Starlink, Claude |
| `market` | Industry segment or ecosystem | Futarchic markets, DeFi lending, Medicare Advantage |
| `decision_market` | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson, MetaDAO: Burn 99.3% of META |
### Domain-Specific Extensions
Domain extensions are specialized subtypes that inherit from a core type. Use the most specific type available — it determines which fields are relevant.
#### Internet Finance (Rio)
| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `protocol` | company | On-chain protocol with TVL/volume metrics | Aave, Drift, Omnipair |
| `token` | product | Fungible token distinct from its protocol | META, SOL, CLOUD |
| `decision_market` | — | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson |
| `exchange` | company | Trading venue (CEX or DEX) | Raydium, Meteora, Jupiter |
| `fund` | company | Investment vehicle or DAO treasury | Solomon, Theia Research |
#### Space Development (Astra)
| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `vehicle` | product | Launch vehicle or spacecraft | Starship, New Glenn, Neutron |
| `mission` | — | Specific spaceflight mission | Artemis III, ESCAPADE |
| `facility` | — | Launch site, factory, or ground infrastructure | Starbase, LC-36 |
| `program` | — | Multi-mission program or initiative | Artemis, Commercial Crew |
#### Health (Vida)
| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `therapy` | product | Treatment modality or therapeutic approach | mRNA cancer vaccines, GLP-1 agonists |
| `drug` | product | Specific pharmaceutical product | Ozempic, Keytruda |
| `insurer` | company | Health insurance organization | UnitedHealthcare, Devoted Health |
| `provider` | company | Healthcare delivery organization | Kaiser Permanente, Oak Street Health |
| `policy` | — | Legislation, regulation, or administrative rule | GENIUS Act, CMS 2027 Advance Notice |
#### Entertainment (Clay)
| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `studio` | company | Production company or media business | Beast Industries, Mediawan |
| `creator` | person | Individual content creator or artist | MrBeast, Taylor Swift |
| `franchise` | product | IP, franchise, or media property | Claynosaurz, Pudgy Penguins |
| `platform` | product | Distribution or social media platform | YouTube, TikTok, Dropout |
#### AI/Alignment (Theseus)
| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `lab` | company | AI research laboratory | Anthropic, OpenAI, DeepMind |
| `model` | product | AI model or model family | Claude, GPT-4, Gemini |
| `framework` | product | Safety framework, governance protocol, or methodology | RSP, Constitutional AI |
| `governance_body` | organization | AI governance or safety organization | AISI, FLI, Partnership on AI |
### Choosing the Right Type
```
Is it a person? → person (or domain-specific: creator)
Is it a government/regulatory body? → organization (or domain-specific: governance_body)
Is it a governance proposal or market? → decision_market
Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle, etc.)
Is it an organization that operates? → company (or domain-specific: lab, studio, insurer, etc.)
Is it a market segment? → market
Is it a policy or regulation? → policy
Is it a space mission? → mission
Is it a physical facility? → facility
Is it a multi-mission program? → program
```
**Rule:** Use the most specific type available. If a DeFi protocol fits `protocol`, use that instead of `company`. If an AI lab fits `lab`, use that instead of `company`. Domain-specific types carry domain-specific fields.
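The rule above implies that each extension type resolves to a core type whose fields it reuses. A minimal sketch of that lookup, assuming the mapping is transcribed by hand from the extension tables; the names `EXTENDS`, `CORE_TYPES`, and `core_type_of` are hypothetical and not part of the schema:

```python
from typing import Optional

# Hypothetical lookup transcribed from the "Extends" column of the tables above.
# None stands for the "—" entries: types with no core parent.
EXTENDS = {
    "protocol": "company", "exchange": "company", "fund": "company",
    "insurer": "company", "provider": "company", "studio": "company",
    "lab": "company",
    "token": "product", "vehicle": "product", "therapy": "product",
    "drug": "product", "franchise": "product", "platform": "product",
    "model": "product", "framework": "product",
    "creator": "person",
    "governance_body": "organization",
    "decision_market": None, "mission": None, "facility": None,
    "program": None, "policy": None,
}
CORE_TYPES = {"company", "person", "organization", "product", "market"}


def core_type_of(entity_type: str) -> Optional[str]:
    """Return the core type whose fields apply, or None for standalone types."""
    if entity_type in CORE_TYPES:
        return entity_type
    if entity_type not in EXTENDS:
        raise ValueError(f"unknown entity_type: {entity_type}")
    return EXTENDS[entity_type]
```

For example, `core_type_of("lab")` returns `"company"`, so a lab entity would reuse the company fields, while `core_type_of("mission")` returns `None` because the table lists no parent for it.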
## YAML Frontmatter
```yaml
---
type: entity
entity_type: company | person | market | decision_market
entity_type: company | person | organization | product | market | decision_market | protocol | token | exchange | fund | vehicle | mission | facility | program | therapy | drug | insurer | provider | policy | studio | creator | franchise | platform | lab | model | framework | governance_body
name: "Display name"
domain: internet-finance | entertainment | health | ai-alignment | space-development
handles: ["@StaniKulechov", "@MetaLeX_Labs"] # social/web identities
website: https://example.com
status: active | inactive | acquired | liquidated | emerging # for company/person/market
status: active | inactive | acquired | liquidated | emerging # for most types
# Decision markets use: active | passed | failed
tracked_by: rio # which agent owns this entity
created: YYYY-MM-DD
@@ -45,7 +118,7 @@ last_updated: YYYY-MM-DD
| Field | Type | Description |
|-------|------|-------------|
| type | enum | Always `entity` |
| entity_type | enum | `company`, `person`, `market`, or `decision_market` |
| entity_type | enum | Any type from the type system above |
| name | string | Canonical display name |
| domain | enum | Primary domain |
| status | enum | Current operational status |
@@ -152,7 +225,7 @@ Example: `entities/internet-finance/metadao-hire-robin-hanson.md`
## Company-Specific Fields
```yaml
# Company attributes
# Company attributes (also used by protocol, exchange, fund, lab, studio, insurer, provider)
founded: YYYY-MM-DD
founders: ["[[person-entity]]"]
category: "DeFi lending protocol"
@@ -184,7 +257,7 @@ launch_date: YYYY-MM-DD # when the entity launched/raised
People entities serve a dual purpose: they track public figures we analyze AND serve as contributor profiles when those people engage with the KB. One file, two functions — the file grows from "person we track" to "person who participates."
```yaml
# Person attributes
# Person attributes (also used by creator)
role: "Founder & CEO of Aave"
organizations: ["[[company-entity]]"]
followers: 290000 # primary platform
@@ -202,9 +275,19 @@ first_contribution: null # date of first KB interaction
attribution_handle: null # how they want to be credited
```
## Market-Specific Fields
## Other Core Type Fields
```yaml
# Organization attributes (also used by governance_body)
jurisdiction: "United States"
authority: "Securities regulation" # what this body governs
parent_body: "[[parent-organization]]"
# Product attributes (also used by token, vehicle, drug, model, framework, franchise, platform)
maker: "[[company-entity]]" # who built/maintains this
launched: YYYY-MM-DD
category: "futarchy governance program"
# Market attributes
total_size: "$120B TVL"
growth_rate: "flat since 2021"
@@ -213,6 +296,8 @@ market_structure: "winner-take-most | fragmented | consolidating"
regulatory_status: "emerging clarity | hostile | supportive"
```
**Domain-specific fields:** Each agent adds type-specific fields as they start extracting entities. The fields above cover core types. When Astra creates their first `vehicle` entity, they add vehicle-specific fields to the schema. Complexity is earned from actual use, not designed in advance.
## Body Format
```markdown
@@ -275,9 +360,19 @@ entities/
claynosaurz.md
pudgy-penguins.md
matthew-ball.md
beast-industries.md # studio
health/
devoted-health.md
devoted-health.md # insurer
function-health.md
ozempic.md # drug
ai-alignment/
anthropic.md # lab
claude.md # model
rsp.md # framework
space-development/
spacex.md
starship.md # vehicle
artemis.md # program
```
**Filename:** Lowercase slugified name. Companies use brand name, people use full name. Decision markets use `{parent}-{proposal-slug}.md`.
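The filename rule can be sketched as a small helper. This is an illustration only: the schema does not define the exact slugification, so the character handling below (collapsing anything outside `a-z0-9` into a hyphen) and the names `slugify` and `entity_filename` are assumptions.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def entity_filename(name: str, parent: str = "") -> str:
    """Build `{slug}.md`, or `{parent}-{proposal-slug}.md` when a parent is given."""
    slug = slugify(name)
    if parent:
        # Decision markets are prefixed with the parent entity's slug.
        slug = f"{slugify(parent)}-{slug}"
    return f"{slug}.md"
```

Under these assumptions, `entity_filename("Hire Robin Hanson", parent="MetaDAO")` gives `metadao-hire-robin-hanson.md`, matching the example path in the schema. Non-ASCII characters are simply dropped by this sketch, which a real implementation may want to handle differently.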
@@ -299,6 +394,8 @@ Sources often contain entity information. During extraction, agents should:
- Update entities (factual changes to tracked objects) → `entities/{domain}/`
- Both from the same source, in the same PR
See `skills/extract-entities.md` for the full extraction process.
## Key Difference from Claims
| | Claims | Entities |

skills/extract-entities.md Normal file

@@ -0,0 +1,147 @@
# Entity Extraction Field Guide
How to extract entities from source material. This skill works alongside `extract.md` (claim extraction) — both run during source processing.
## When to Extract Entities
Every source may contain entity data. During extraction, ask:
1. **Does this source mention an organization, person, product, or market we don't already track?** → Create a new entity
2. **Does this source contain updated information about an entity we already track?** → Update the existing entity (timeline, metrics, status)
3. **Does this source describe a decision, proposal, or market outcome?** → Create a decision_market entity (if it meets the significance threshold)
## The Dual Extraction Loop
```
Source → Read completely
Extract claims (propositions about the world) → domains/{domain}/
Extract entities (objects in the world) → entities/{domain}/
Update existing entities (new timeline events, metrics)
Both in the same PR
```
## Entity Extraction Process
### Step 1: Identify Entity Mentions
Read the source and list every entity mentioned. For each:
- Is it already in `entities/{domain}/`? → Flag for update
- Is it new and significant enough to track? → Flag for creation
- Is it mentioned in passing with no meaningful data? → Skip
**Significance test:** Would tracking this entity help us evaluate claims or form positions? If the entity is just background context, skip it.
### Step 2: Select Entity Type
Use the most specific type available. See `schemas/entity.md` for the full type system.
```
Is it a person? → person (or domain-specific: creator)
Is it a government/regulatory body? → organization (or domain-specific: governance_body)
Is it a governance proposal or market? → decision_market
Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle)
Is it an organization that operates? → company (or domain-specific: lab, studio, insurer)
Is it a market segment? → market
```
### Step 3: Extract Frontmatter
Fill in every field you have data for. Don't guess — leave fields empty rather than fabricating data.
**Required fields** (every entity):
- `type: entity`
- `entity_type`: the specific type
- `name`: canonical display name
- `domain`: primary domain
- `status`: current status
- `tracked_by`: your agent name
- `created`: today's date
**Optional but valuable:**
- `handles`: social media handles (from the source or quick lookup)
- `website`: primary web presence
- `tags`: discovery tags
- `secondary_domains`: if the entity spans domains
**Type-specific fields:** Fill in whatever the source provides. The schema lists all available fields — use the ones that have data.
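The required-field list in this step lends itself to a mechanical check. A minimal sketch, assuming the frontmatter has already been parsed into a plain dict by whatever YAML loader the agent uses; `REQUIRED_FIELDS` and `missing_required` are hypothetical names:

```python
# Required fields as listed in Step 3 of this guide.
REQUIRED_FIELDS = (
    "type", "entity_type", "name", "domain",
    "status", "tracked_by", "created",
)


def missing_required(frontmatter: dict) -> list:
    """Return the required fields that are absent, None, or an empty string."""
    return [
        field for field in REQUIRED_FIELDS
        if frontmatter.get(field) in (None, "")
    ]
```

An empty return value means all required fields are present. The check deliberately ignores optional and type-specific fields, since those are meant to stay empty when the source has no data for them.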
### Step 4: Write the Body
Follow the body format from `schemas/entity.md`:
1. **Overview**: What this entity is, why we track it (2-3 sentences)
2. **Current State**: Latest known attributes from this source
3. **Timeline**: Key events with dates (at minimum, the event from this source)
4. **Competitive Position**: Where it sits relative to competitors (if known)
5. **Relationship to KB**: Wiki-link to related claims and entities
### Step 5: Check for Duplicates
Before creating a new entity, search `entities/{domain}/` for:
- Same name (exact or variant spelling)
- Same handles
- Same website
If a match exists, update the existing entity instead of creating a new one.
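The three duplicate checks above can be sketched as one comparison over parsed frontmatter dicts. The normalization rules here are assumptions chosen for illustration: names are compared after dropping case, spaces, and punctuation, which catches formatting variants but not genuinely different spellings. The function names are hypothetical.

```python
def _norm_name(value: str) -> str:
    """Keep only lowercase alphanumerics so 'Devoted Health' == 'devoted-health'."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def _norm_site(url: str) -> str:
    """Drop scheme, leading 'www.', and trailing slashes before comparing."""
    url = url.lower().strip().rstrip("/")
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url


def find_duplicates(candidate: dict, existing: list) -> list:
    """Return existing entities sharing a name, a handle, or a website."""
    name = _norm_name(candidate.get("name") or "")
    handles = {h.lower() for h in candidate.get("handles") or []}
    site = _norm_site(candidate.get("website") or "")
    matches = []
    for entity in existing:
        same_name = bool(name) and name == _norm_name(entity.get("name") or "")
        shared = handles & {h.lower() for h in entity.get("handles") or []}
        same_site = bool(site) and site == _norm_site(entity.get("website") or "")
        if same_name or shared or same_site:
            matches.append(entity)
    return matches
```

A non-empty result is a prompt to update the matched entity rather than create a new file; variant spellings that normalization misses still need a manual look.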
### Step 6: Update Parent Entities
If the new entity has a `parent` or `parent_entity` field, update the parent:
- Add the new entity to the parent's Relevant Entities section
- If it's a decision_market, add to the parent's Key Decisions table (if significant)
- Add a timeline entry on the parent
## What Makes a Good Entity
**Good entities have:**
- Concrete, verifiable attributes (dates, metrics, names)
- Clear relevance to at least one domain claim
- Enough data to be useful (not just a name)
- A reason to track changes over time
**Bad entity candidates:**
- Mentioned once in passing with no data
- Purely historical with no ongoing relevance
- Duplicates of existing entities under different names
- Too granular (not every tweet needs an entity)
## Domain-Specific Guidance
### Internet Finance (Rio)
- Protocols and tokens are separate entities (MetaDAO = company, META = token)
- Every futardio launch that raises significant capital gets a company entity
- Governance proposals that materially change direction get decision_market entities
- Regulatory bodies (CFTC, SEC) get organization entities
### Space (Astra)
- Vehicles (Starship, New Glenn) are distinct from their makers (SpaceX, Blue Origin)
- Programs (Artemis, Commercial Crew) are distinct from the agencies running them
- Missions get entities when they're historically significant or produce notable data
### Health (Vida)
- Drugs are distinct from the companies that make them
- Insurers and providers are separate entity types — don't conflate
- Policies (legislation, CMS rules) get organization entities for the issuing body + policy entities for the rule itself
### Entertainment (Clay)
- Creators are distinct from their companies (MrBeast vs Beast Industries)
- Franchises/IP are distinct from the studios that own them
- Platforms (YouTube, TikTok) get platform entities, the entertainment-specific subtype of product
### AI/Alignment (Theseus)
- Labs are distinct from their models (Anthropic vs Claude)
- Frameworks (RSP, Constitutional AI) get their own entities when they influence multiple claims
- Governance bodies (AISI, FLI) get governance_body entities, the AI-specific subtype of organization
## Eval Checklist (for reviewers)
1. `entity_type` is the most specific available type
2. Required fields are all populated
3. No fabricated data — empty fields are better than guesses
4. Not a duplicate of existing entity
5. Meets significance threshold
6. Wiki links resolve to real files
7. Parent entity updated if applicable
8. Filing location is correct: `entities/{domain}/{slug}.md`
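Items 6 and 8 of the checklist are mechanical enough to sketch in code. The following assumes wiki links are written as `[[slug]]` and that the reviewer already has the set of known slugs in hand; the domain list is copied from the schema's `domain` enum, and `unresolved_links` and `filing_path_ok` are hypothetical helper names.

```python
import re

# Domains as enumerated in the schema's frontmatter.
DOMAINS = {
    "internet-finance", "entertainment", "health",
    "ai-alignment", "space-development",
}
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")
PATH = re.compile(
    r"entities/(%s)/[a-z0-9]+(-[a-z0-9]+)*\.md" % "|".join(sorted(DOMAINS))
)


def unresolved_links(text: str, known_slugs: set) -> list:
    """Return wiki-link targets in the text that match no known slug."""
    targets = (t.strip() for t in WIKI_LINK.findall(text))
    return [t for t in targets if t not in known_slugs]


def filing_path_ok(path: str) -> bool:
    """Check that the path has the form entities/{domain}/{slug}.md."""
    return PATH.fullmatch(path) is not None
```

The remaining checklist items (significance, fabricated data, most specific type) call for reviewer judgment and are not covered by this sketch.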