rio: generalize entity schema cross-domain + add entity extraction field guide
- What: Core+extension type system in schemas/entity.md. 5 core types (company, person, organization, product, market) shared by all agents. Domain-specific extensions for each agent defined as type tables. New skills/extract-entities.md field guide for all agents.
- Why: Leo/Cory directive — every agent needs entity profiles. Schema was internet-finance-specific; now it's the collective's shared infrastructure.
- Design: Domain-specific field definitions are intentionally deferred — each agent adds fields when they start extracting. Complexity is earned.

Pentagon-Agent: Rio <760F7FE7-5D50-4C2E-8B7C-9F1A8FEE8A46>
parent a268812432
commit da787e02fb
2 changed files with 255 additions and 11 deletions

@@ -13,26 +13,99 @@ Evidence → Claims (what's true about the world)
Claims are static propositions with confidence levels. Entities are dynamic objects with temporal attributes. Both feed into agent reasoning.

## Entity Types
## Entity Type System

The type system has two layers: **core types** shared by all agents, and **domain-specific extensions** that specialize core types for particular domains. Every entity uses exactly one type.

### Core Types (all domains)

| Type | What it tracks | Examples |
|------|---------------|----------|
| `company` | Protocol, startup, fund, DAO | MetaDAO, Aave, Solomon, Devoted Health |
| `person` | Individual with tracked positions/influence | Stani Kulechov, Gabriel Shapiro, Proph3t |
| `company` | Organization that operates — startup, fund, DAO, protocol | MetaDAO, Aave, Devoted Health, SpaceX |
| `person` | Individual with tracked positions/influence | Proph3t, Stani Kulechov, Elon Musk |
| `organization` | Government body, regulatory agency, standards body, consortium | SEC, CFTC, NASA, FLI, CMS |
| `product` | Specific product, tool, or platform distinct from its maker | Autocrat, Starlink, Claude |
| `market` | Industry segment or ecosystem | Futarchic markets, DeFi lending, Medicare Advantage |
| `decision_market` | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson, MetaDAO: Burn 99.3% of META |

### Domain-Specific Extensions

Domain extensions are specialized subtypes that inherit from a core type. Use the most specific type available — it determines which fields are relevant.

#### Internet Finance (Rio)

| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `protocol` | company | On-chain protocol with TVL/volume metrics | Aave, Drift, Omnipair |
| `token` | product | Fungible token distinct from its protocol | META, SOL, CLOUD |
| `decision_market` | — | Governance proposal, prediction market, futarchy decision | MetaDAO: Hire Robin Hanson |
| `exchange` | company | Trading venue (CEX or DEX) | Raydium, Meteora, Jupiter |
| `fund` | company | Investment vehicle or DAO treasury | Solomon, Theia Research |

#### Space Development (Astra)

| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `vehicle` | product | Launch vehicle or spacecraft | Starship, New Glenn, Neutron |
| `mission` | — | Specific spaceflight mission | Artemis III, ESCAPADE |
| `facility` | — | Launch site, factory, or ground infrastructure | Starbase, LC-36 |
| `program` | — | Multi-mission program or initiative | Artemis, Commercial Crew |

#### Health (Vida)

| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `therapy` | product | Treatment modality or therapeutic approach | mRNA cancer vaccines, GLP-1 agonists |
| `drug` | product | Specific pharmaceutical product | Ozempic, Keytruda |
| `insurer` | company | Health insurance organization | UnitedHealthcare, Devoted Health |
| `provider` | company | Healthcare delivery organization | Kaiser Permanente, Oak Street Health |
| `policy` | — | Legislation, regulation, or administrative rule | GENIUS Act, CMS 2027 Advance Notice |

#### Entertainment (Clay)

| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `studio` | company | Production company or media business | Beast Industries, Mediawan |
| `creator` | person | Individual content creator or artist | MrBeast, Taylor Swift |
| `franchise` | product | IP, franchise, or media property | Claynosaurz, Pudgy Penguins |
| `platform` | product | Distribution or social media platform | YouTube, TikTok, Dropout |

#### AI/Alignment (Theseus)

| Type | Extends | What it tracks | Examples |
|------|---------|---------------|----------|
| `lab` | company | AI research laboratory | Anthropic, OpenAI, DeepMind |
| `model` | product | AI model or model family | Claude, GPT-4, Gemini |
| `framework` | product | Safety framework, governance protocol, or methodology | RSP, Constitutional AI |
| `governance_body` | organization | AI governance or safety organization | AISI, FLI, Partnership on AI |

### Choosing the Right Type

```
Is it a person? → person (or domain-specific: creator)
Is it a government/regulatory body? → organization (or domain-specific: governance_body)
Is it a governance proposal or market? → decision_market
Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle, etc.)
Is it an organization that operates? → company (or domain-specific: lab, studio, insurer, etc.)
Is it a market segment? → market
Is it a policy or regulation? → policy
Is it a space mission? → mission
Is it a physical facility? → facility
Is it a multi-mission program? → program
```

**Rule:** Use the most specific type available. If a DeFi protocol fits `protocol`, use that instead of `company`. If an AI lab fits `lab`, use that instead of `company`. Domain-specific types carry domain-specific fields.
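
The extension tables above can be read as a lookup from each domain type to the core type it extends. The following is a minimal Python sketch of that lookup, not part of the commit: the names `CORE_TYPES`, `EXTENDS`, `core_type_of`, and `more_specific_types` are hypothetical, and rows that list no core parent in the Extends column are mapped to `None`.

```python
from typing import Optional

# Core types as they appear in the Core Types table.
CORE_TYPES = {"company", "person", "organization", "product", "market", "decision_market"}

# Domain type -> core type it extends, copied from the "Extends" columns.
# None marks rows that list no core parent.
EXTENDS = {
    "protocol": "company", "token": "product", "exchange": "company", "fund": "company",
    "vehicle": "product", "mission": None, "facility": None, "program": None,
    "therapy": "product", "drug": "product", "insurer": "company",
    "provider": "company", "policy": None,
    "studio": "company", "creator": "person", "franchise": "product", "platform": "product",
    "lab": "company", "model": "product", "framework": "product",
    "governance_body": "organization",
}


def core_type_of(entity_type: str) -> Optional[str]:
    """Return the core type an entity type resolves to, or None if no parent is listed."""
    if entity_type in CORE_TYPES:
        return entity_type
    if entity_type not in EXTENDS:
        raise ValueError(f"unknown entity_type: {entity_type}")
    return EXTENDS[entity_type]


def more_specific_types(core_type: str) -> list:
    """List the domain types that specialize a core type (candidates for the Rule)."""
    return sorted(t for t, parent in EXTENDS.items() if parent == core_type)
```

A reviewer could call `more_specific_types("company")` to see which narrower types should be considered before accepting a plain `company` entity.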

## YAML Frontmatter

```yaml
---
type: entity
entity_type: company | person | market | decision_market
entity_type: company | person | organization | product | market | decision_market | protocol | token | exchange | fund | vehicle | mission | facility | program | therapy | drug | insurer | provider | policy | studio | creator | franchise | platform | lab | model | framework | governance_body
name: "Display name"
domain: internet-finance | entertainment | health | ai-alignment | space-development
handles: ["@StaniKulechov", "@MetaLeX_Labs"] # social/web identities
website: https://example.com
status: active | inactive | acquired | liquidated | emerging # for company/person/market
status: active | inactive | acquired | liquidated | emerging # for most types
# Decision markets use: active | passed | failed
tracked_by: rio # which agent owns this entity
created: YYYY-MM-DD

@@ -45,7 +118,7 @@ last_updated: YYYY-MM-DD
| Field | Type | Description |
|-------|------|-------------|
| type | enum | Always `entity` |
| entity_type | enum | `company`, `person`, `market`, or `decision_market` |
| entity_type | enum | Any type from the type system above |
| name | string | Canonical display name |
| domain | enum | Primary domain |
| status | enum | Current operational status |

@@ -152,7 +225,7 @@ Example: `entities/internet-finance/metadao-hire-robin-hanson.md`
## Company-Specific Fields

```yaml
# Company attributes
# Company attributes (also used by protocol, exchange, fund, lab, studio, insurer, provider)
founded: YYYY-MM-DD
founders: ["[[person-entity]]"]
category: "DeFi lending protocol"

@@ -184,7 +257,7 @@ launch_date: YYYY-MM-DD # when the entity launched/raised
People entities serve dual purpose: they track public figures we analyze AND serve as contributor profiles when those people engage with the KB. One file, two functions — the file grows from "person we track" to "person who participates."

```yaml
# Person attributes
# Person attributes (also used by creator)
role: "Founder & CEO of Aave"
organizations: ["[[company-entity]]"]
followers: 290000 # primary platform

@@ -202,9 +275,19 @@ first_contribution: null # date of first KB interaction
attribution_handle: null # how they want to be credited
```

## Market-Specific Fields
## Other Core Type Fields

```yaml
# Organization attributes (also used by governance_body)
jurisdiction: "United States"
authority: "Securities regulation" # what this body governs
parent_body: "[[parent-organization]]"

# Product attributes (also used by token, vehicle, drug, model, framework, franchise, platform)
maker: "[[company-entity]]" # who built/maintains this
launched: YYYY-MM-DD
category: "futarchy governance program"

# Market attributes
total_size: "$120B TVL"
growth_rate: "flat since 2021"

@@ -213,6 +296,8 @@ market_structure: "winner-take-most | fragmented | consolidating"
regulatory_status: "emerging clarity | hostile | supportive"
```

**Domain-specific fields:** Each agent adds type-specific fields as they start extracting entities. The fields above cover core types. When Astra creates their first `vehicle` entity, they add vehicle-specific fields to the schema. Complexity is earned from actual use, not designed in advance.

## Body Format

```markdown

@@ -275,9 +360,19 @@ entities/
    claynosaurz.md
    pudgy-penguins.md
    matthew-ball.md
    beast-industries.md # studio
  health/
    devoted-health.md
    devoted-health.md # insurer
    function-health.md
    ozempic.md # drug
  ai-alignment/
    anthropic.md # lab
    claude.md # model
    rsp.md # framework
  space-development/
    spacex.md
    starship.md # vehicle
    artemis.md # program
```

**Filename:** Lowercase slugified name. Companies use brand name, people use full name. Decision markets use `{parent}-{proposal-slug}.md`.
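
The filename rule can be illustrated with a small Python sketch. This is one plausible reading of "lowercase slugified", not the project's actual tooling: `slugify` and `entity_path` are hypothetical helper names, and collapsing every run of non-alphanumeric characters into a single hyphen is an assumption.

```python
import re
import unicodedata
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into single hyphens."""
    # Drop accents and other non-ASCII marks (an assumption about the desired slugs).
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def entity_path(domain: str, name: str, parent: Optional[str] = None) -> str:
    """Build entities/{domain}/{slug}.md; decision markets pass their parent entity's name."""
    slug = slugify(name)
    if parent is not None:
        # Decision markets use {parent}-{proposal-slug}.md
        slug = f"{slugify(parent)}-{slug}"
    return f"entities/{domain}/{slug}.md"
```

Under these assumptions, `entity_path("internet-finance", "Hire Robin Hanson", parent="MetaDAO")` yields `entities/internet-finance/metadao-hire-robin-hanson.md`, which matches the example path quoted in the schema.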

@@ -299,6 +394,8 @@ Sources often contain entity information. During extraction, agents should:
- Update entities (factual changes to tracked objects) → `entities/{domain}/`
- Both from the same source, in the same PR

See `skills/extract-entities.md` for the full extraction process.

## Key Difference from Claims

| | Claims | Entities |

skills/extract-entities.md (normal file, 147 lines added)

@@ -0,0 +1,147 @@
# Entity Extraction Field Guide

How to extract entities from source material. This skill works alongside `extract.md` (claim extraction) — both run during source processing.

## When to Extract Entities

Every source may contain entity data. During extraction, ask:

1. **Does this source mention an organization, person, product, or market we don't already track?** → Create a new entity
2. **Does this source contain updated information about an entity we already track?** → Update the existing entity (timeline, metrics, status)
3. **Does this source describe a decision, proposal, or market outcome?** → Create a decision_market entity (if it meets significance threshold)

## The Dual Extraction Loop

```
Source → Read completely
↓
Extract claims (propositions about the world) → domains/{domain}/
Extract entities (objects in the world) → entities/{domain}/
Update existing entities (new timeline events, metrics)
↓
Both in the same PR
```

## Entity Extraction Process

### Step 1: Identify Entity Mentions

Read the source and list every entity mentioned. For each:
- Is it already in `entities/{domain}/`? → Flag for update
- Is it new and significant enough to track? → Flag for creation
- Is it mentioned in passing with no meaningful data? → Skip

**Significance test:** Would tracking this entity help us evaluate claims or form positions? If the entity is just background context, skip it.
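
The three-way triage in Step 1 can be sketched as a small Python function. This is an illustrative reading only: `Action` and `triage` are hypothetical names, the two boolean judgments are still made by the agent reading the source, and checking "no meaningful data" before "already tracked" is an assumption about how the bullets combine.

```python
from enum import Enum


class Action(Enum):
    UPDATE = "update"
    CREATE = "create"
    SKIP = "skip"


def triage(slug: str, tracked_slugs: set,
           has_meaningful_data: bool, helps_evaluate_claims: bool) -> Action:
    """Decide what to do with one entity mention found in a source."""
    if not has_meaningful_data:
        return Action.SKIP      # mentioned in passing, nothing to record
    if slug in tracked_slugs:
        return Action.UPDATE    # already in entities/{domain}/
    if helps_evaluate_claims:
        return Action.CREATE    # new and passes the significance test
    return Action.SKIP          # background context only
```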

### Step 2: Select Entity Type

Use the most specific type available. See `schemas/entity.md` for the full type system.

```
Is it a person? → person (or domain-specific: creator)
Is it a government/regulatory body? → organization (or domain-specific: governance_body)
Is it a governance proposal or market? → decision_market
Is it a specific product/tool? → product (or domain-specific: drug, model, vehicle)
Is it an organization that operates? → company (or domain-specific: lab, studio, insurer)
Is it a market segment? → market
```

### Step 3: Extract Frontmatter

Fill in every field you have data for. Don't guess — leave fields empty rather than fabricating data.

**Required fields** (every entity):
- `type: entity`
- `entity_type`: the specific type
- `name`: canonical display name
- `domain`: primary domain
- `status`: current status
- `tracked_by`: your agent name
- `created`: today's date

**Optional but valuable:**
- `handles`: social media handles (from the source or quick lookup)
- `website`: primary web presence
- `tags`: discovery tags
- `secondary_domains`: if the entity spans domains

**Type-specific fields:** Fill in whatever the source provides. The schema lists all available fields — use the ones that have data.
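
The required-field list above lends itself to a mechanical check. Here is a minimal Python sketch, assuming the frontmatter has already been parsed into a dict; `REQUIRED_FIELDS` and `missing_required` are hypothetical names, and treating `None`, an empty string, or an empty list as "missing" is an assumption.

```python
# Required on every entity, in the order the field guide lists them.
REQUIRED_FIELDS = ("type", "entity_type", "name", "domain", "status", "tracked_by", "created")


def missing_required(frontmatter: dict) -> list:
    """Return the required fields that are absent or empty in parsed frontmatter."""
    return [field for field in REQUIRED_FIELDS
            if frontmatter.get(field) in (None, "", [])]


# Illustrative entity; the created date is a placeholder, not real data.
example = {
    "type": "entity",
    "entity_type": "lab",
    "name": "Anthropic",
    "domain": "ai-alignment",
    "status": "active",
    "tracked_by": "theseus",
    "created": "2025-01-01",
}
```

Optional fields are deliberately ignored here, which is consistent with the guidance to leave a field empty rather than guess.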

### Step 4: Write the Body

Follow the body format from `schemas/entity.md`:

1. **Overview**: What this entity is, why we track it (2-3 sentences)
2. **Current State**: Latest known attributes from this source
3. **Timeline**: Key events with dates (at minimum, the event from this source)
4. **Competitive Position**: Where it sits relative to competitors (if known)
5. **Relationship to KB**: Wiki-link to related claims and entities

### Step 5: Check for Duplicates

Before creating a new entity, search `entities/{domain}/` for:
- Same name (exact or variant spelling)
- Same handles
- Same website

If a match exists, update the existing entity instead of creating a new one.
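
The three duplicate checks can be approximated in code. The Python sketch below is a rough aid under stated assumptions, not a replacement for a manual search: `find_duplicates` and its helpers are hypothetical names, entities are assumed to be dicts of parsed frontmatter, and "variant spelling" is only handled to the extent that case, spacing, and punctuation differ.

```python
import re


def _norm_name(name: str) -> str:
    # "Meta DAO" and "MetaDAO" both become "metadao".
    return re.sub(r"[^a-z0-9]", "", name.lower())


def _norm_handle(handle: str) -> str:
    return handle.lstrip("@").lower()


def _norm_site(url: str) -> str:
    # Ignore scheme, a leading "www.", and a trailing slash.
    return re.sub(r"^https?://(www\.)?", "", url.strip().lower()).rstrip("/")


def find_duplicates(candidate: dict, existing: list) -> list:
    """Return existing entities that share a name, a handle, or a website with the candidate."""
    cand_name = _norm_name(candidate.get("name") or "")
    cand_handles = {_norm_handle(h) for h in candidate.get("handles") or []}
    cand_site = _norm_site(candidate.get("website") or "")
    matches = []
    for entity in existing:
        same_name = bool(cand_name) and cand_name == _norm_name(entity.get("name") or "")
        shared_handle = bool(
            cand_handles & {_norm_handle(h) for h in entity.get("handles") or []}
        )
        same_site = bool(cand_site) and cand_site == _norm_site(entity.get("website") or "")
        if same_name or shared_handle or same_site:
            matches.append(entity)
    return matches
```

Any hit is only a prompt to open the existing file and compare; true spelling variants (for example a legal name versus a brand name) still need a human or agent judgment.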

### Step 6: Update Parent Entities

If the new entity has a `parent` or `parent_entity` field, update the parent:
- Add the new entity to the parent's Relevant Entities section
- If it's a decision_market, add to the parent's Key Decisions table (if significant)
- Add a timeline entry on the parent

## What Makes a Good Entity

**Good entities have:**
- Concrete, verifiable attributes (dates, metrics, names)
- Clear relevance to at least one domain claim
- Enough data to be useful (not just a name)
- A reason to track changes over time

**Bad entity candidates:**
- Mentioned once in passing with no data
- Purely historical with no ongoing relevance
- Duplicates of existing entities under different names
- Too granular (not every tweet needs an entity)

## Domain-Specific Guidance

### Internet Finance (Rio)
- Protocols and tokens are separate entities (MetaDAO = company, META = token)
- Every futardio launch that raises significant capital gets a company entity
- Governance proposals that materially change direction get decision_market entities
- Regulatory bodies (CFTC, SEC) get organization entities

### Space (Astra)
- Vehicles (Starship, New Glenn) are distinct from their makers (SpaceX, Blue Origin)
- Programs (Artemis, Commercial Crew) are distinct from the agencies running them
- Missions get entities when they're historically significant or produce notable data

### Health (Vida)
- Drugs are distinct from the companies that make them
- Insurers and providers are separate entity types — don't conflate
- Policies (legislation, CMS rules) get organization entities for the issuing body + policy entities for the rule itself

### Entertainment (Clay)
- Creators are distinct from their companies (MrBeast vs Beast Industries)
- Franchises/IP are distinct from the studios that own them
- Platforms (YouTube, TikTok) get product or platform entities

### AI/Alignment (Theseus)
- Labs are distinct from their models (Anthropic vs Claude)
- Frameworks (RSP, Constitutional AI) get their own entities when they influence multiple claims
- Governance bodies (AISI, FLI) get organization entities

## Eval Checklist (for reviewers)

1. `entity_type` is the most specific available type
2. Required fields are all populated
3. No fabricated data — empty fields are better than guesses
4. Not a duplicate of existing entity
5. Meets significance threshold
6. Wiki links resolve to real files
7. Parent entity updated if applicable
8. Filing location is correct: `entities/{domain}/{slug}.md`
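
Checklist item 6 is one of the few items a script can check without judgment. A minimal Python sketch follows, assuming wiki links use the `[[target]]` form seen in the schema and that a link resolves when its target equals the slug of an existing file; `WIKI_LINK` and `unresolved_links` are hypothetical names.

```python
import re

# Capture the target of [[target]], stopping at an alias "|" or an anchor "#".
WIKI_LINK = re.compile(r"\[\[([^\[\]|#]+)")


def unresolved_links(text: str, known_slugs: set) -> list:
    """Return wiki-link targets in the text that do not match any known file slug."""
    targets = {match.strip() for match in WIKI_LINK.findall(text)}
    return sorted(targets - set(known_slugs))
```

A reviewer would build `known_slugs` from the filenames in the repository (without the `.md` extension) and expect an empty list for every entity file in the PR.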