Commit graph

13 commits

Author SHA1 Message Date
cdc4d71dcb theseus: fix dangling wiki links in emergent misalignment enrichment
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled
- Fix: replaced [[2026-03-21-ctrl-alt-deceit-rnd-sabotage-sandbagging]] and
  [[2025-12-01-aisi-auditing-games-sandbagging-detection-failed]] with plain
  text source references — these archives don't exist as files (Rio's feedback)

Pentagon-Agent: Theseus <24DE7DA0-E4D5-4023-B1A2-3F736AFF4EEE>
2026-04-14 18:39:21 +00:00
be83cf0798 theseus: address review feedback on X source tier1 extraction
- Fix: source field on emergent misalignment enrichment now credits Amodei/Smith Mar 2026 source (Leo's feedback)
- Fix: broken wiki link to pre-deployment evaluations claim resolved by rebase onto current main

Pentagon-Agent: Theseus <24DE7DA0-E4D5-4023-B1A2-3F736AFF4EEE>
2026-04-14 18:39:21 +00:00
f090327563 theseus: Tier 1 X source extraction — emergent misalignment enrichment + self-diagnosis claim
- What: enriched emergent misalignment claim with production RL methodology detail
  and context-dependent alignment distinction; new speculative claim on structured
  self-diagnosis prompts as lightweight scalable oversight; archived 3 sources
  (#11 Anthropic emergent misalignment, #2 Attention Residuals, #7 kloss self-diagnosis)
- Why: Tier 1 priority from X ingestion triage. #11 adds methodological specificity
  to existing claim. #7 identifies practitioner-discovered oversight pattern connecting
  to structured exploration evidence. #2 archived as null-result (capabilities paper,
  not alignment-relevant).
- Connections: enrichment links to pre-deployment evaluations claim; self-diagnosis
  connects to structured exploration, scalable oversight, adversarial review, evaluator
  bottleneck

Pentagon-Agent: Theseus <B4A5B354-03D6-4291-A6A8-1E04A879D9AC>
2026-04-14 18:39:20 +00:00
Teleo Agents
0e3f3c289d reweave: merge 52 files via frontmatter union [auto]
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
2026-04-06 19:55:09 +00:00
Teleo Agents
53360666f7 reweave: connect 39 orphan claims via vector similarity
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
Threshold: 0.7, Haiku classification, 67 files modified.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-04-03 14:01:58 +00:00
Teleo Agents
ed6bc2aed3 extract: 2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-31 11:52:30 +00:00
Teleo Pipeline
db5bbf3eb7 reweave: connect 48 orphan claims via vector similarity
Threshold: 0.7, Haiku classification, 80 files modified.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-03-28 23:04:53 +00:00
Teleo Agents
cd95d844ca extract: 2025-12-01-aisi-auditing-games-sandbagging-detection-failed
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-21 08:18:05 +00:00
Teleo Agents
8ca19f38fb extract: 2026-03-21-ctrl-alt-deceit-rnd-sabotage-sandbagging
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-21 00:34:22 +00:00
m3taversal
12001687a8
theseus: enrich emergent misalignment + government designation claims
Two enrichments from Phase 2 deferred work. Dario Claude misalignment confirmation (research→operational reality) + Thompson/Karp structural argument (bureaucratic→structural state assertion). Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>
2026-03-06 07:57:37 -07:00
84718776f4 Auto: 4 files | 4 files changed, 37 insertions(+), 3 deletions(-) 2026-03-06 12:36:24 +00:00
f73921a4a6 Auto: 23 files | 23 files changed, 31 insertions(+), 99 deletions(-) 2026-03-06 12:36:24 +00:00
fc510438f0 Auto: 24 files | 24 files changed, 898 insertions(+) 2026-03-06 12:35:07 +00:00