rio: generalize entity schema cross-domain + add entity extraction field guide
- What: Core+extension type system in schemas/entity.md. 5 core types (company, person, organization, product, market) shared by all agents. Domain-specific extensions for each agent defined as type tables. New skills/extract-entities.md field guide for all agents.
- Why: Leo/Cory directive: every agent needs entity profiles. Schema was internet-finance-specific; now it's the collective's shared infrastructure.
- Design: Domain-specific field definitions are intentionally deferred; each agent adds fields when they start extracting. Complexity is earned.

Pentagon-Agent: Rio <760F7FE7-5D50-4C2E-8B7C-9F1A8FEE8A46>
parent 2bc09de2b7
commit afc8022ecb
2 changed files with 255 additions and 11 deletions
|
|
@ -13,26 +13,99 @@ Evidence → Claims (what's true about the world)
|
||||||
|
|
||||||
Claims are static propositions with confidence levels. Entities are dynamic objects with temporal attributes. Both feed into agent reasoning.
|
Claims are static propositions with confidence levels. Entities are dynamic objects with temporal attributes. Both feed into agent reasoning.
|
||||||
|
|
||||||
## Entity Types
|
## Entity Type System
|
||||||
|
|
||||||
|
The type system has two layers: **core types** shared by all agents, and **domain-specific extensions** that specialize core types for particular domains. Every entity uses exactly one type.
|
||||||
|
|
||||||
|
### Core Types (all domains)
|
||||||
|
|
||||||
| Type | What it tracks | Examples |
|
| Type | What it tracks | Examples |
|
||||||
|------|---------------|----------|
|
|------|---------------|----------|
|
||||||
| `company` | Protocol, startup, fund, DAO | MetaDAO, Aave, Solomon, Devoted Health |
|
| `company` | Organization that operates — startup, fund, DAO, protocol | MetaDAO, Aave, Devoted Health, SpaceX |
|
||||||
| `person` | Individual with tracked positions/influence | Stani Kulechov, Gabriel Shapiro, Proph3t |
|
| `person` | Individual with tracked positions/influence | Proph3t, Stani Kulechov, Elon Musk |
|
||||||
|
| `organization` | Government body, regulatory agency, standards body, consortium | SEC, CFTC, NASA, FLI, CMS |
|
||||||
|
| `product` | Specific product, tool, or platform distinct from its maker | Autocrat, Starlink, Claude |
|
||||||
| `market` | Industry segment or ecosystem | Futarchic markets, DeFi lending, Medicare Advantage |
|
| `market` | Industry segment or ecosystem | Futarchic markets, DeFi lending, Medicare Advantage |
|
||||||
| `decision_market` | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson, MetaDAO: Burn 99.3% of META |
|
|
||||||
|
### Domain-Specific Extensions
|
||||||
|
|
||||||
|
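Domain extensions are specialized types. Most inherit from a core type; those with a dash in the Extends column (such as `decision_market`, `mission`, or `policy`) stand alone. Use the most specific type available, because it determines which fields are relevant.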
|
||||||
|
|
||||||
|
#### Internet Finance (Rio)
|
||||||
|
|
||||||
|
| Type | Extends | What it tracks | Examples |
|
||||||
|
|------|---------|---------------|----------|
|
||||||
|
| `protocol` | company | On-chain protocol with TVL/volume metrics | Aave, Drift, Omnipair |
|
||||||
|
| `token` | product | Fungible token distinct from its protocol | META, SOL, CLOUD |
|
||||||
|
| `decision_market` | — | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson |
|
||||||
|
| `exchange` | company | Trading venue (CEX or DEX) | Raydium, Meteora, Jupiter |
|
||||||
|
| `fund` | company | Investment vehicle or DAO treasury | Solomon, Theia Research |
|
||||||
|
|
||||||
|
#### Space Development (Astra)
|
||||||
|
|
||||||
|
| Type | Extends | What it tracks | Examples |
|
||||||
|
|------|---------|---------------|----------|
|
||||||
|
| `vehicle` | product | Launch vehicle or spacecraft | Starship, New Glenn, Neutron |
|
||||||
|
| `mission` | — | Specific spaceflight mission | Artemis III, ESCAPADE |
|
||||||
|
| `facility` | — | Launch site, factory, or ground infrastructure | Starbase, LC-36 |
|
||||||
|
| `program` | — | Multi-mission program or initiative | Artemis, Commercial Crew |
|
||||||
|
|
||||||
|
#### Health (Vida)
|
||||||
|
|
||||||
|
| Type | Extends | What it tracks | Examples |
|
||||||
|
|------|---------|---------------|----------|
|
||||||
|
| `therapy` | product | Treatment modality or therapeutic approach | mRNA cancer vaccines, GLP-1 agonists |
|
||||||
|
| `drug` | product | Specific pharmaceutical product | Ozempic, Keytruda |
|
||||||
|
| `insurer` | company | Health insurance organization | UnitedHealthcare, Devoted Health |
|
||||||
|
| `provider` | company | Healthcare delivery organization | Kaiser Permanente, Oak Street Health |
|
||||||
|
| `policy` | — | Legislation, regulation, or administrative rule | GENIUS Act, CMS 2027 Advance Notice |
|
||||||
|
|
||||||
|
#### Entertainment (Clay)
|
||||||
|
|
||||||
|
| Type | Extends | What it tracks | Examples |
|
||||||
|
|------|---------|---------------|----------|
|
||||||
|
| `studio` | company | Production company or media business | Beast Industries, Mediawan |
|
||||||
|
| `creator` | person | Individual content creator or artist | MrBeast, Taylor Swift |
|
||||||
|
| `franchise` | product | IP, franchise, or media property | Claynosaurz, Pudgy Penguins |
|
||||||
|
| `platform` | product | Distribution or social media platform | YouTube, TikTok, Dropout |
|
||||||
|
|
||||||
|
#### AI/Alignment (Theseus)
|
||||||
|
|
||||||
|
| Type | Extends | What it tracks | Examples |
|
||||||
|
|------|---------|---------------|----------|
|
||||||
|
| `lab` | company | AI research laboratory | Anthropic, OpenAI, DeepMind |
|
||||||
|
| `model` | product | AI model or model family | Claude, GPT-4, Gemini |
|
||||||
|
| `framework` | product | Safety framework, governance protocol, or methodology | RSP, Constitutional AI |
|
||||||
|
| `governance_body` | organization | AI governance or safety organization | AISI, FLI, Partnership on AI |
|
||||||
|
|
||||||
|
### Choosing the Right Type
|
||||||
|
|
||||||
|
```
|
||||||
|
Is it a person? → person (or domain-specific: creator)
|
||||||
|
Is it a government/regulatory body? → organization (or domain-specific: governance_body)
|
||||||
|
Is it a governance proposal or market? → decision_market
|
||||||
|
Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle, etc.)
|
||||||
|
Is it an organization that operates? → company (or domain-specific: lab, studio, insurer, etc.)
|
||||||
|
Is it a market segment? → market
|
||||||
|
Is it a policy or regulation? → policy
|
||||||
|
Is it a space mission? → mission
|
||||||
|
Is it a physical facility? → facility
|
||||||
|
Is it a multi-mission program? → program
|
||||||
|
```
|
||||||
|
|
||||||
|
**Rule:** Use the most specific type available. If a DeFi protocol fits `protocol`, use that instead of `company`. If an AI lab fits `lab`, use that instead of `company`. Domain-specific types carry domain-specific fields.
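The extension tables above can be read as a lookup from each type to the core type it extends. The following is a minimal sketch of that lookup, not part of the schema itself: the names `CORE_TYPES`, `EXTENDS`, and `core_type_of` are hypothetical, and the mapping is transcribed from the Extends columns, with `None` standing in for a dash.

```python
from typing import Optional

# The five core types shared by all agents.
CORE_TYPES = frozenset({"company", "person", "organization", "product", "market"})

# Extension type -> core type it extends (None where the table shows a dash).
EXTENDS: dict[str, Optional[str]] = {
    # Internet Finance (Rio)
    "protocol": "company", "token": "product", "decision_market": None,
    "exchange": "company", "fund": "company",
    # Space Development (Astra)
    "vehicle": "product", "mission": None, "facility": None, "program": None,
    # Health (Vida)
    "therapy": "product", "drug": "product", "insurer": "company",
    "provider": "company", "policy": None,
    # Entertainment (Clay)
    "studio": "company", "creator": "person", "franchise": "product",
    "platform": "product",
    # AI/Alignment (Theseus)
    "lab": "company", "model": "product", "framework": "product",
    "governance_body": "organization",
}


def core_type_of(entity_type: str) -> Optional[str]:
    """Return the core type an entity type resolves to, or None if it stands alone."""
    if entity_type in CORE_TYPES:
        return entity_type
    try:
        return EXTENDS[entity_type]
    except KeyError:
        raise ValueError(f"unknown entity_type: {entity_type!r}") from None
```

Under this sketch, `core_type_of("lab")` resolves to `"company"`, which matches the schema's note that company attributes are also used by `lab`.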
|
||||||
|
|
||||||
## YAML Frontmatter
|
## YAML Frontmatter
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
---
|
---
|
||||||
type: entity
|
type: entity
|
||||||
entity_type: company | person | market | decision_market
|
entity_type: company | person | organization | product | market | decision_market | protocol | token | exchange | fund | vehicle | mission | facility | program | therapy | drug | insurer | provider | policy | studio | creator | franchise | platform | lab | model | framework | governance_body
|
||||||
name: "Display name"
|
name: "Display name"
|
||||||
domain: internet-finance | entertainment | health | ai-alignment | space-development
|
domain: internet-finance | entertainment | health | ai-alignment | space-development
|
||||||
handles: ["@StaniKulechov", "@MetaLeX_Labs"] # social/web identities
|
handles: ["@StaniKulechov", "@MetaLeX_Labs"] # social/web identities
|
||||||
website: https://example.com
|
website: https://example.com
|
||||||
status: active | inactive | acquired | liquidated | emerging # for company/person/market
|
status: active | inactive | acquired | liquidated | emerging # for most types
|
||||||
# Decision markets use: active | passed | failed
|
# Decision markets use: active | passed | failed
|
||||||
tracked_by: rio # which agent owns this entity
|
tracked_by: rio # which agent owns this entity
|
||||||
created: YYYY-MM-DD
|
created: YYYY-MM-DD
|
||||||
|
|
@ -45,7 +118,7 @@ last_updated: YYYY-MM-DD
|
||||||
| Field | Type | Description |
|
| Field | Type | Description |
|
||||||
|-------|------|-------------|
|
|-------|------|-------------|
|
||||||
| type | enum | Always `entity` |
|
| type | enum | Always `entity` |
|
||||||
| entity_type | enum | `company`, `person`, `market`, or `decision_market` |
|
| entity_type | enum | Any type from the type system above |
|
||||||
| name | string | Canonical display name |
|
| name | string | Canonical display name |
|
||||||
| domain | enum | Primary domain |
|
| domain | enum | Primary domain |
|
||||||
| status | enum | Current operational status |
|
| status | enum | Current operational status |
|
||||||
|
|
@ -152,7 +225,7 @@ Example: `entities/internet-finance/metadao-hire-robin-hanson.md`
|
||||||
## Company-Specific Fields
|
## Company-Specific Fields
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# Company attributes
|
# Company attributes (also used by protocol, exchange, fund, lab, studio, insurer, provider)
|
||||||
founded: YYYY-MM-DD
|
founded: YYYY-MM-DD
|
||||||
founders: ["[[person-entity]]"]
|
founders: ["[[person-entity]]"]
|
||||||
category: "DeFi lending protocol"
|
category: "DeFi lending protocol"
|
||||||
|
|
@ -184,7 +257,7 @@ launch_date: YYYY-MM-DD # when the entity launched/raised
|
||||||
People entities serve dual purpose: they track public figures we analyze AND serve as contributor profiles when those people engage with the KB. One file, two functions — the file grows from "person we track" to "person who participates."
|
People entities serve dual purpose: they track public figures we analyze AND serve as contributor profiles when those people engage with the KB. One file, two functions — the file grows from "person we track" to "person who participates."
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# Person attributes
|
# Person attributes (also used by creator)
|
||||||
role: "Founder & CEO of Aave"
|
role: "Founder & CEO of Aave"
|
||||||
organizations: ["[[company-entity]]"]
|
organizations: ["[[company-entity]]"]
|
||||||
followers: 290000 # primary platform
|
followers: 290000 # primary platform
|
||||||
|
|
@ -202,9 +275,19 @@ first_contribution: null # date of first KB interaction
|
||||||
attribution_handle: null # how they want to be credited
|
attribution_handle: null # how they want to be credited
|
||||||
```
|
```
|
||||||
|
|
||||||
## Market-Specific Fields
|
## Other Core Type Fields
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
|
# Organization attributes (also used by governance_body)
|
||||||
|
jurisdiction: "United States"
|
||||||
|
authority: "Securities regulation" # what this body governs
|
||||||
|
parent_body: "[[parent-organization]]"
|
||||||
|
|
||||||
|
# Product attributes (also used by token, vehicle, therapy, drug, model, framework, franchise, platform)
|
||||||
|
maker: "[[company-entity]]" # who built/maintains this
|
||||||
|
launched: YYYY-MM-DD
|
||||||
|
category: "futarchy governance program"
|
||||||
|
|
||||||
# Market attributes
|
# Market attributes
|
||||||
total_size: "$120B TVL"
|
total_size: "$120B TVL"
|
||||||
growth_rate: "flat since 2021"
|
growth_rate: "flat since 2021"
|
||||||
|
|
@ -213,6 +296,8 @@ market_structure: "winner-take-most | fragmented | consolidating"
|
||||||
regulatory_status: "emerging clarity | hostile | supportive"
|
regulatory_status: "emerging clarity | hostile | supportive"
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Domain-specific fields:** Each agent adds type-specific fields as they start extracting entities. The fields above cover core types. When Astra creates their first `vehicle` entity, they add vehicle-specific fields to the schema. Complexity is earned from actual use, not designed in advance.
|
||||||
|
|
||||||
## Body Format
|
## Body Format
|
||||||
|
|
||||||
```markdown
|
```markdown
|
||||||
|
|
@ -275,9 +360,19 @@ entities/
|
||||||
claynosaurz.md
|
claynosaurz.md
|
||||||
pudgy-penguins.md
|
pudgy-penguins.md
|
||||||
matthew-ball.md
|
matthew-ball.md
|
||||||
|
beast-industries.md # studio
|
||||||
health/
|
health/
|
||||||
devoted-health.md
|
devoted-health.md # insurer
|
||||||
function-health.md
|
function-health.md
|
||||||
|
ozempic.md # drug
|
||||||
|
ai-alignment/
|
||||||
|
anthropic.md # lab
|
||||||
|
claude.md # model
|
||||||
|
rsp.md # framework
|
||||||
|
space-development/
|
||||||
|
spacex.md
|
||||||
|
starship.md # vehicle
|
||||||
|
artemis.md # program
|
||||||
```
|
```
|
||||||
|
|
||||||
**Filename:** Lowercase slugified name. Companies use brand name, people use full name. Decision markets use `{parent}-{proposal-slug}.md`.
|
**Filename:** Lowercase slugified name. Companies use brand name, people use full name. Decision markets use `{parent}-{proposal-slug}.md`.
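One way to apply the filename rule is sketched below. The schema does not spell out the exact slug algorithm, so the choice here (lowercase, with every run of non-alphanumeric characters collapsed to a single hyphen) is an assumption, and `slugify` and `entity_path` are hypothetical helper names.

```python
import re
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def entity_path(domain: str, name: str, parent: Optional[str] = None) -> str:
    """Build entities/{domain}/{slug}.md; decision markets pass their parent's name."""
    slug = slugify(name)
    if parent is not None:
        # Decision markets use {parent}-{proposal-slug}.md
        slug = f"{slugify(parent)}-{slug}"
    return f"entities/{domain}/{slug}.md"
```

With these assumptions, `entity_path("internet-finance", "Hire Robin Hanson", parent="MetaDAO")` produces `entities/internet-finance/metadao-hire-robin-hanson.md`, the example path given earlier in the schema.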
|
||||||
|
|
@ -299,6 +394,8 @@ Sources often contain entity information. During extraction, agents should:
|
||||||
- Update entities (factual changes to tracked objects) → `entities/{domain}/`
|
- Update entities (factual changes to tracked objects) → `entities/{domain}/`
|
||||||
- Both from the same source, in the same PR
|
- Both from the same source, in the same PR
|
||||||
|
|
||||||
|
See `skills/extract-entities.md` for the full extraction process.
|
||||||
|
|
||||||
## Key Difference from Claims
|
## Key Difference from Claims
|
||||||
|
|
||||||
| | Claims | Entities |
|
| | Claims | Entities |
|
||||||
|
|
|
||||||
147
skills/extract-entities.md
Normal file
|
|
@ -0,0 +1,147 @@
|
||||||
|
# Entity Extraction Field Guide
|
||||||
|
|
||||||
|
How to extract entities from source material. This skill works alongside `extract.md` (claim extraction) — both run during source processing.
|
||||||
|
|
||||||
|
## When to Extract Entities
|
||||||
|
|
||||||
|
Every source may contain entity data. During extraction, ask:
|
||||||
|
|
||||||
|
1. **Does this source mention an organization, person, product, or market we don't already track?** → Create a new entity
|
||||||
|
2. **Does this source contain updated information about an entity we already track?** → Update the existing entity (timeline, metrics, status)
|
||||||
|
3. **Does this source describe a decision, proposal, or market outcome?** → Create a decision_market entity (if it meets significance threshold)
|
||||||
|
|
||||||
|
## The Dual Extraction Loop
|
||||||
|
|
||||||
|
```
|
||||||
|
Source → Read completely
|
||||||
|
↓
|
||||||
|
Extract claims (propositions about the world) → domains/{domain}/
|
||||||
|
Extract entities (objects in the world) → entities/{domain}/
|
||||||
|
Update existing entities (new timeline events, metrics)
|
||||||
|
↓
|
||||||
|
Both in the same PR
|
||||||
|
```
|
||||||
|
|
||||||
|
## Entity Extraction Process
|
||||||
|
|
||||||
|
### Step 1: Identify Entity Mentions
|
||||||
|
|
||||||
|
Read the source and list every entity mentioned. For each:
|
||||||
|
- Is it already in `entities/{domain}/`? → Flag for update
|
||||||
|
- Is it new and significant enough to track? → Flag for creation
|
||||||
|
- Is it mentioned in passing with no meaningful data? → Skip
|
||||||
|
|
||||||
|
**Significance test:** Would tracking this entity help us evaluate claims or form positions? If the entity is just background context, skip it.
|
||||||
|
|
||||||
|
### Step 2: Select Entity Type
|
||||||
|
|
||||||
|
Use the most specific type available. See `schemas/entity.md` for the full type system.
|
||||||
|
|
||||||
|
```
|
||||||
|
Is it a person? → person (or domain-specific: creator)
|
||||||
|
Is it a government/regulatory body? → organization (or domain-specific: governance_body)
|
||||||
|
Is it a governance proposal or market? → decision_market
|
||||||
|
Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle)
|
||||||
|
Is it an organization that operates? → company (or domain-specific: lab, studio, insurer)
|
||||||
|
Is it a market segment? → market
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Extract Frontmatter
|
||||||
|
|
||||||
|
Fill in every field you have data for. Don't guess — leave fields empty rather than fabricating data.
|
||||||
|
|
||||||
|
**Required fields** (every entity):
|
||||||
|
- `type: entity`
|
||||||
|
- `entity_type`: the specific type
|
||||||
|
- `name`: canonical display name
|
||||||
|
- `domain`: primary domain
|
||||||
|
- `status`: current status
|
||||||
|
- `tracked_by`: your agent name
|
||||||
|
- `created`: today's date
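The required-field list lends itself to a mechanical check before review. The sketch below assumes the frontmatter has already been parsed into a plain dict (the parsing step is out of scope here); `REQUIRED_FIELDS` and `frontmatter_problems` are hypothetical names, not part of the existing tooling.

```python
# Required on every entity, per the list above.
REQUIRED_FIELDS = (
    "type", "entity_type", "name", "domain", "status", "tracked_by", "created",
)


def frontmatter_problems(frontmatter: dict) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if frontmatter.get(field) in (None, "")
    ]
    # `type` is always the literal string "entity".
    if frontmatter.get("type") not in (None, "", "entity"):
        problems.append("type must be 'entity'")
    return problems
```

The check only looks at presence, so it stays consistent with the guidance not to guess: optional and type-specific fields may be left empty without triggering a problem.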
|
||||||
|
|
||||||
|
**Optional but valuable:**
|
||||||
|
- `handles`: social media handles (from the source or quick lookup)
|
||||||
|
- `website`: primary web presence
|
||||||
|
- `tags`: discovery tags
|
||||||
|
- `secondary_domains`: if the entity spans domains
|
||||||
|
|
||||||
|
**Type-specific fields:** Fill in whatever the source provides. The schema lists all available fields — use the ones that have data.
|
||||||
|
|
||||||
|
### Step 4: Write the Body
|
||||||
|
|
||||||
|
Follow the body format from `schemas/entity.md`:
|
||||||
|
|
||||||
|
1. **Overview**: What this entity is, why we track it (2-3 sentences)
|
||||||
|
2. **Current State**: Latest known attributes from this source
|
||||||
|
3. **Timeline**: Key events with dates (at minimum, the event from this source)
|
||||||
|
4. **Competitive Position**: Where it sits relative to competitors (if known)
|
||||||
|
5. **Relationship to KB**: Wiki-link to related claims and entities
|
||||||
|
|
||||||
|
### Step 5: Check for Duplicates
|
||||||
|
|
||||||
|
Before creating a new entity, search `entities/{domain}/` for:
|
||||||
|
- Same name (exact or variant spelling)
|
||||||
|
- Same handles
|
||||||
|
- Same website
|
||||||
|
|
||||||
|
If a match exists, update the existing entity instead of creating a new one.
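The duplicate check can be sketched as a comparison over already-parsed frontmatter dicts. This is an illustration under assumptions: `find_duplicates` and its helpers are hypothetical, and the name comparison only catches differences in case, spacing, and punctuation, so true variant spellings still need a human look.

```python
import re


def _norm_name(name) -> str:
    """Reduce a name to lowercase alphanumerics so 'Meta DAO' matches 'MetaDAO'."""
    return re.sub(r"[^a-z0-9]", "", (name or "").lower())


def _norm_site(url) -> str:
    """Drop the scheme, a leading 'www.', and trailing slashes."""
    site = re.sub(r"^https?://(www\.)?", "", (url or "").strip().lower())
    return site.rstrip("/")


def find_duplicates(candidate: dict, existing: list[dict]) -> list[dict]:
    """Return existing entities sharing a name, a handle, or a website with candidate."""
    name = _norm_name(candidate.get("name"))
    handles = {h.lower() for h in candidate.get("handles") or []}
    site = _norm_site(candidate.get("website"))
    matches = []
    for entity in existing:
        same_name = bool(name) and name == _norm_name(entity.get("name"))
        shared_handle = bool(handles & {h.lower() for h in entity.get("handles") or []})
        same_site = bool(site) and site == _norm_site(entity.get("website"))
        if same_name or shared_handle or same_site:
            matches.append(entity)
    return matches
```

A non-empty result means the existing entity should be updated instead of creating a new file.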
|
||||||
|
|
||||||
|
### Step 6: Update Parent Entities
|
||||||
|
|
||||||
|
If the new entity has a `parent` or `parent_entity` field, update the parent:
|
||||||
|
- Add the new entity to the parent's Relevant Entities section
|
||||||
|
- If it's a decision_market, add to the parent's Key Decisions table (if significant)
|
||||||
|
- Add a timeline entry on the parent
|
||||||
|
|
||||||
|
## What Makes a Good Entity
|
||||||
|
|
||||||
|
**Good entities have:**
|
||||||
|
- Concrete, verifiable attributes (dates, metrics, names)
|
||||||
|
- Clear relevance to at least one domain claim
|
||||||
|
- Enough data to be useful (not just a name)
|
||||||
|
- A reason to track changes over time
|
||||||
|
|
||||||
|
**Bad entity candidates:**
|
||||||
|
- Mentioned once in passing with no data
|
||||||
|
- Purely historical with no ongoing relevance
|
||||||
|
- Duplicates of existing entities under different names
|
||||||
|
- Too granular (not every tweet needs an entity)
|
||||||
|
|
||||||
|
## Domain-Specific Guidance
|
||||||
|
|
||||||
|
### Internet Finance (Rio)
|
||||||
|
- Protocols and tokens are separate entities (MetaDAO = company, META = token)
|
||||||
|
- Every futardio launch that raises significant capital gets a company entity
|
||||||
|
- Governance proposals that materially change direction get decision_market entities
|
||||||
|
- Regulatory bodies (CFTC, SEC) get organization entities
|
||||||
|
|
||||||
|
### Space (Astra)
|
||||||
|
- Vehicles (Starship, New Glenn) are distinct from their makers (SpaceX, Blue Origin)
|
||||||
|
- Programs (Artemis, Commercial Crew) are distinct from the agencies running them
|
||||||
|
- Missions get entities when they're historically significant or produce notable data
|
||||||
|
|
||||||
|
### Health (Vida)
|
||||||
|
- Drugs are distinct from the companies that make them
|
||||||
|
- Insurers and providers are separate entity types — don't conflate
|
||||||
|
- Policies (legislation, CMS rules) get two entities: an `organization` entity for the issuing body and a `policy` entity for the rule itself
|
||||||
|
|
||||||
|
### Entertainment (Clay)
|
||||||
|
- Creators are distinct from their companies (MrBeast vs Beast Industries)
|
||||||
|
- Franchises/IP are distinct from the studios that own them
|
||||||
|
- Platforms (YouTube, TikTok) get `platform` entities, the Entertainment extension of `product`
|
||||||
|
|
||||||
|
### AI/Alignment (Theseus)
|
||||||
|
- Labs are distinct from their models (Anthropic vs Claude)
|
||||||
|
- Frameworks (RSP, Constitutional AI) get their own entities when they influence multiple claims
|
||||||
|
- Governance bodies (AISI, FLI) get `governance_body` entities, the AI/Alignment extension of `organization`
|
||||||
|
|
||||||
|
## Eval Checklist (for reviewers)
|
||||||
|
|
||||||
|
1. `entity_type` is the most specific available type
|
||||||
|
2. Required fields are all populated
|
||||||
|
3. No fabricated data — empty fields are better than guesses
|
||||||
|
4. Not a duplicate of existing entity
|
||||||
|
5. Meets significance threshold
|
||||||
|
6. Wiki links resolve to real files
|
||||||
|
7. Parent entity updated if applicable
|
||||||
|
8. Filing location is correct: `entities/{domain}/{slug}.md`
|
||||||