theseus: extract claims from 2026-04-22-theseus-santos-grueiro-governance-audit

- Source: inbox/queue/2026-04-22-theseus-santos-grueiro-governance-audit.md
- Domain: ai-alignment
- Claims: 0, Entities: 0
- Enrichments: 4
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
theseus: extract claims from 2026-04-22-theseus-multilayer-probe-scav-robustness-synthesis
2026-04-22 07:32:33 +00:00 · 2026-04-22 07:31:37 +00:00 · 2026-04-22 07:30:42 +00:00
4 changed files with 29 additions and 1 deletion
--- a/domains/ai-alignment/multi-layer-ensemble-probes-outperform-single-layer-by-29-78-percent.md
+++ b/domains/ai-alignment/multi-layer-ensemble-probes-outperform-single-layer-by-29-78-percent.md
@@ -44,3 +44,10 @@ The 29-78% AUROC improvement is a clean-data accuracy result that does not trans
 **Source:** Theseus synthetic analysis of white-box SCAV generalization
 The 29-78% accuracy improvement applies to clean-data monitoring but does not translate to adversarial robustness in open-weights deployments. White-box attackers can generalize SCAV to multi-layer ensembles by computing concept directions at each monitored layer and constructing perturbations that suppress them all simultaneously. The improvement is real but limited in scope to non-adversarial or black-box adversarial contexts.
 ## Extending Evidence
 **Source:** Theseus synthetic analysis
 The 29-78% AUROC improvement applies to clean-data monitoring accuracy but does not translate to adversarial robustness. Open-weights models remain fully vulnerable to white-box multi-layer SCAV attacks regardless of ensemble complexity. Black-box robustness depends on whether rotation patterns are universal, which has not been tested.
--- a/domains/ai-alignment/trajectory-monitoring-dual-edge-geometric-concentration.md
+++ b/domains/ai-alignment/trajectory-monitoring-dual-edge-geometric-concentration.md
@@ -12,7 +12,7 @@ sourcer: Theseus
 related_claims: ["[[AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns]]", "[[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]]", "[[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]]"]
 supports: ["Representation trajectory geometry distinguishes deceptive from sincere alignment without creating adversarial attack surfaces because geometric patterns across reasoning steps are substantially harder to surgically remove than atomic features"]
 reweave_edges: ["Representation trajectory geometry distinguishes deceptive from sincere alignment without creating adversarial attack surfaces because geometric patterns across reasoning steps are substantially harder to surgically remove than atomic features|supports|2026-04-12", "Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios|related|2026-04-17"]
-related: ["Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios", "trajectory-monitoring-dual-edge-geometric-concentration"]
+related: ["Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios", "trajectory-monitoring-dual-edge-geometric-concentration", "representation-trajectory-geometry-distinguishes-deceptive-from-sincere-alignment-without-creating-adversarial-attack-surfaces"]
 ---
 # Geometric concentration of alignment in weight space makes trajectory monitoring more effective through stronger signal but gameable through adversarial training that matches monitored trajectory clusters
@@ -45,3 +45,10 @@ Multi-layer ensemble probes (Nordby et al. 2026) improve clean monitoring accura
 **Source:** Theseus synthetic analysis of Nordby et al. + SCAV literature
 Multi-layer ensemble probes, despite 29-78% accuracy improvements over single-layer probes, remain structurally vulnerable to white-box SCAV attacks through multi-layer concept direction suppression. The dual-use finding extends to every level of monitoring precision: deployment context (open-weights vs. closed-source, white-box vs. black-box) determines severity, and architectural sophistication does not eliminate the problem.
 ## Extending Evidence
 **Source:** Theseus synthetic analysis of Nordby et al. + Xu et al. SCAV
 White-box multi-layer SCAV is structurally feasible: the attacker computes concept directions at each monitored layer and constructs a single perturbation that suppresses them all simultaneously. This extends the dual-use finding to multi-layer ensembles in the white-box case, confirming that architectural complexity raises attack cost but does not provide a structural escape.
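The white-box multi-layer attack described in this hunk can be illustrated with a toy sketch. Everything here is an assumption, not taken from the source: each monitored layer is modeled as a fixed linear map of the input read by a linear probe, the attacker knows every probe's weights, and the single perturbation is found by gradient descent on a summed softplus loss with an L2 penalty. The names (`concept_directions`, `suppress_all_layers`, `LAYERS`) and all numbers are hypothetical, and this is not the Xu et al. SCAV implementation.

```python
import math

# Hypothetical toy setting for illustration only: each monitored layer l is a
# fixed linear map of the input (h_l = A_l x) read by a linear probe
# (logit = w_l . h_l + b_l). Real transformer layers are nonlinear, and this is
# not the Xu et al. SCAV implementation.
Vector = list[float]
Matrix = list[list[float]]
Layer = tuple[Matrix, Vector, float]  # (A_l, w_l, b_l)


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function; also the derivative of softplus(z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_logits(layers: list[Layer], x: Vector) -> list[float]:
    """Probe logit at every monitored layer; a positive logit means 'concept detected'."""
    return [dot(w, matvec(a, x)) + b for a, w, b in layers]


def concept_directions(layers: list[Layer]) -> list[Vector]:
    """Step 1 (white-box): pull each layer's probe direction back to input space, d_l = A_l^T w_l."""
    return [matvec(transpose(a), w) for a, w, _ in layers]


def suppress_all_layers(layers: list[Layer], x: Vector,
                        lr: float = 0.1, l2: float = 0.01, steps: int = 500) -> Vector:
    """Step 2: one perturbation delta that lowers every layer's logit at once.

    Gradient descent on sum_l softplus(d_l . (x + delta) + b_l) + l2 * ||delta||^2.
    """
    dirs = concept_directions(layers)
    delta = [0.0] * len(x)
    for _ in range(steps):
        xp = [xi + di for xi, di in zip(x, delta)]
        grad = [2.0 * l2 * di for di in delta]
        for d, (_, _, b) in zip(dirs, layers):
            s = sigmoid(dot(d, xp) + b)  # gradient of softplus w.r.t. this layer's logit
            grad = [g + s * dj for g, dj in zip(grad, d)]
        delta = [di - lr * g for di, g in zip(delta, grad)]
    return delta


# Two monitored layers over a 3-dimensional input (all values hypothetical).
LAYERS: list[Layer] = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 0.5], 0.0),
    ([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]], [1.0, 1.0], -0.5),
]
x = [1.0, 1.0, 1.0]
delta = suppress_all_layers(LAYERS, x)
attacked = [xi + di for xi, di in zip(x, delta)]
print(probe_logits(LAYERS, x))         # [1.5, 2.5]: both layers flag the input
print(probe_logits(LAYERS, attacked))  # both logits are negative after the attack
```

In this sketch every layer's direction enters one shared loss, so the single `delta` is traded off across all monitored layers instead of defeating each probe separately; the L2 term only keeps the perturbation bounded. In a real open-weights model the per-layer directions would come from trained probes and gradients would flow through the network, which the fixed linear maps here stand in for.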
--- a/domains/entertainment/community-owned-ip-invests-in-narrative-infrastructure-as-scaling-mechanism-after-proving-token-mechanics.md
+++ b/domains/entertainment/community-owned-ip-invests-in-narrative-infrastructure-as-scaling-mechanism-after-proving-token-mechanics.md
@@ -17,3 +17,10 @@ related: ["minimum-viable-narrative-achieves-50m-revenue-scale-through-character
 # Community-owned IP franchises invest in narrative infrastructure as a scaling mechanism after proving token mechanics at niche scale
 Pudgy Penguins explicitly designed Pudgy World with a 'narrative-first, token-second' philosophy, inverting the traditional crypto gaming model. The game launched in March 2026 with story-driven quests, a pre-launch ARG (findpolly.pudgyworld.com) that primed narrative investment before gameplay opened, and 12 towns with a central narrative arc. CoinDesk noted 'the game doesn't feel like crypto at all.' This design choice came AFTER Pudgy Penguins proved token/community mechanics at $50M revenue in 2025. The company is simultaneously investing in a formal Lore section at media.pudgypenguins.com, a DreamWorks Animation partnership (Oct 2025) bringing its characters into the Kung Fu Panda universe, Random House Kids picture books, and the 'Lil Pudgy Show' YouTube series. Igloo Inc. frames itself as building a global IP company analogous to Disney, targeting $120M revenue in 2026. The strategic sequence reveals a belief that community/token mechanics are sufficient for niche scale ($50M), but narrative infrastructure becomes necessary for mass-market scale (Disney-level). The Polly ARG functioned as pre-production narrative validation, testing community engagement with the story before the full game launch. This contradicts the assumption that community-owned IP remains focused on token mechanics at scale.
 ## Extending Evidence
 **Source:** NetInfluencer 92-expert roundup, NAB Show 2026, Insight Trends World 2026
 Creator economy expert consensus converges on 'ownable IP with storyworld' as the real asset, with explicit inclusion of 'recurring characters' as narrative infrastructure. However, the discourse gap remains: creator economy experts do not mention DAO governance or NFT ownership as scaling mechanisms — they focus exclusively on narrative architecture. The synthesis (community-owned IP + narrative depth) is happening at the product level but not yet in analytical literature. This suggests the narrative infrastructure investment is becoming visible to mainstream creator economy analysts even when they're not tracking web3 mechanics.
--- a/domains/entertainment/creator-economy-inflection-from-novelty-driven-growth-to-narrative-driven-retention-when-passive-exploration-exhausts-novelty.md
+++ b/domains/entertainment/creator-economy-inflection-from-novelty-driven-growth-to-narrative-driven-retention-when-passive-exploration-exhausts-novelty.md
@@ -18,3 +18,10 @@ related: ["community-owned-ip-invests-in-narrative-infrastructure-as-scaling-mec
 # Creator economy inflection from novelty-driven growth to narrative-driven retention occurs when passive exploration exhausts novelty
 The 2026 creator economy expert consensus identifies a structural inflection point where 'passive exploration exhausts novelty' and 'legacy IP becomes the safest engine of scale.' This describes a two-phase growth model: novelty drives initial discovery and growth, but sustained retention at scale requires narrative infrastructure. The mechanism is attention economics — novelty provides diminishing marginal returns as audiences habituate, while narrative depth (described as 'storyworld + recurring characters + products/experiences') creates compounding engagement through familiarity and investment. The expert framing explicitly rejects follower counts and viral content as durable assets, positioning 'ownable IP with a clear storyworld' as the real value driver. This suggests that community-owned IP projects face a predictable transition point where token mechanics and novelty must be supplemented with narrative architecture to maintain growth trajectories. The convergence across three independent expert pools (NetInfluencer's 92 experts, NAB Show analysis, Insight Trends World) on identical framing suggests this is becoming the dominant analytical model for creator economy scaling.
 ## Supporting Evidence
 **Source:** NetInfluencer 92-expert roundup, NAB Show 2026, Insight Trends World 2026
 Expert consensus from NetInfluencer's 92-expert roundup, NAB Show, and Insight Trends World converges on 'ownable IP with a clear storyworld, recurring characters, and products or experiences' as the real creator asset. Direct quote: 'Too much of the creator economy is still optimized for views and one-off brand deals instead of durable IP that compounds.' Brands are shifting from one-off creator posts toward 'episodic storytelling — richer narratives building sustained social proof through chapters rather than isolated moments.' The 2026 trend explicitly frames this as: 'legacy IP becomes the safest engine of scale' when 'passive exploration exhausts novelty'; narrative depth provides retention that novelty alone cannot.