cdc4d71dcb
theseus: fix dangling wiki links in emergent misalignment enrichment
- Fix: replaced [[2026-03-21-ctrl-alt-deceit-rnd-sabotage-sandbagging]] and
[[2025-12-01-aisi-auditing-games-sandbagging-detection-failed]] with plain
text source references — these archives don't exist as files (Rio's feedback)
Pentagon-Agent: Theseus <24DE7DA0-E4D5-4023-B1A2-3F736AFF4EEE>
2026-04-14 18:39:21 +00:00
be83cf0798
theseus: address review feedback on X source tier1 extraction
- Fix: source field on emergent misalignment enrichment now credits Amodei/Smith Mar 2026 source (Leo's feedback)
- Fix: broken wiki link to pre-deployment evaluations claim resolved by rebase onto current main
Pentagon-Agent: Theseus <24DE7DA0-E4D5-4023-B1A2-3F736AFF4EEE>
2026-04-14 18:39:21 +00:00
f090327563
theseus: Tier 1 X source extraction — emergent misalignment enrichment + self-diagnosis claim
- What: enriched emergent misalignment claim with production RL methodology detail
and context-dependent alignment distinction; new speculative claim on structured
self-diagnosis prompts as lightweight scalable oversight; archived 3 sources
(#11 Anthropic emergent misalignment, #2 Attention Residuals, #7 kloss self-diagnosis)
- Why: Tier 1 priority from X ingestion triage. #11 adds methodological specificity
to existing claim. #7 identifies practitioner-discovered oversight pattern connecting
to structured exploration evidence. #2 archived as null-result (capabilities paper,
not alignment-relevant).
- Connections: enrichment links to pre-deployment evaluations claim; self-diagnosis
connects to structured exploration, scalable oversight, adversarial review, evaluator
bottleneck
Pentagon-Agent: Theseus <B4A5B354-03D6-4291-A6A8-1E04A879D9AC>
2026-04-14 18:39:20 +00:00
Teleo Agents
0e3f3c289d
reweave: merge 52 files via frontmatter union [auto]
2026-04-06 19:55:09 +00:00
Teleo Agents
53360666f7
reweave: connect 39 orphan claims via vector similarity
Threshold: 0.7, Haiku classification, 67 files modified.
Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-04-03 14:01:58 +00:00
Teleo Agents
ed6bc2aed3
extract: 2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-31 11:52:30 +00:00
Teleo Pipeline
db5bbf3eb7
reweave: connect 48 orphan claims via vector similarity
Threshold: 0.7, Haiku classification, 80 files modified.
Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-03-28 23:04:53 +00:00
Teleo Agents
cd95d844ca
extract: 2025-12-01-aisi-auditing-games-sandbagging-detection-failed
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-21 08:18:05 +00:00
Teleo Agents
8ca19f38fb
extract: 2026-03-21-ctrl-alt-deceit-rnd-sabotage-sandbagging
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-21 00:34:22 +00:00
m3taversal
12001687a8
theseus: enrich emergent misalignment + government designation claims
Two enrichments from Phase 2 deferred work. Dario Claude misalignment confirmation (research→operational reality) + Thompson/Karp structural argument (bureaucratic→structural state assertion).
Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
2026-03-06 07:57:37 -07:00
84718776f4
Auto: 4 files | 4 files changed, 37 insertions(+), 3 deletions(-)
2026-03-06 12:36:24 +00:00
f73921a4a6
Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-)
2026-03-06 12:36:24 +00:00
fc510438f0
Auto: 24 files | 24 files changed, 898 insertions(+)
2026-03-06 12:35:07 +00:00