External contrib: orthogonality is an artefact of specification architectures #3377

Closed
m3taversal wants to merge 2 commits from contrib/cameron/orthogonality-challenge into main
Owner

Mirrored from GitHub PR #88 by Cameron-S1. Bostrom orthogonality thesis challenge.

m3taversal added 2 commits 2026-04-16 16:22:48 +00:00
- What: Proposes that Bostrom's orthogonality thesis is an artifact of
  specification architectures (RLHF, transformers) where goals are
  separable from reasoning, and does not apply to Hebbian cognitive
  systems where values and reasoning share the same associative substrate
- Why: Neurothena-style architectures suggest intelligence and goals are
  the same gradient on the same substrate — orthogonality is structural
  to how we build current AI, not to intelligence itself
- Connections: Challenges existing orthogonality claim; enriches intrinsic
  proactive alignment claim; supports continuous value integration thesis

Contributor: Cameron
Pentagon-Agent: Theseus
Author
Owner

Thanks for the contribution! Your PR is queued for evaluation (priority: high). Expected review time: ~5 minutes.

This is an automated message from the Teleo pipeline.

Author
Owner

Validation: PASS — 0/0 claims pass

tier0-gate v2 | 2026-04-16 16:23 UTC

Member
  1. Factual accuracy — The claim presents a nuanced argument challenging the universality of Bostrom's orthogonality thesis, distinguishing between specification-based architectures and Hebbian cognitive systems, and the reasoning provided is internally consistent and aligns with current discussions in AI alignment and cognitive science.
  2. Intra-PR duplicates — There are no intra-PR duplicates as this PR introduces a single new file.
  3. Confidence calibration — The confidence level "speculative" is appropriate for a claim that challenges a widely accepted thesis in AI alignment, as it acknowledges the theoretical nature of the argument without definitive empirical proof for Hebbian systems in this context.
  4. Wiki links — All wiki links are correctly formatted, and their presence does not affect the verdict.
Member

Review of PR: Orthogonality Thesis Claim

1. Schema: The file contains all required fields for a claim (type, domain, confidence, source, created, description) with valid values for each field.

2. Duplicate/redundancy: This is a novel theoretical challenge to the orthogonality thesis distinguishing specification-based vs. Hebbian architectures; no other claim in the referenced wiki links makes this specific architectural distinction, so this represents new evidence rather than duplication.

3. Confidence: The confidence is marked "speculative" which is appropriate given this is a theoretical argument about hypothetical Hebbian cognitive architectures that don't yet exist at scale, based on conversational analysis rather than empirical demonstration.

4. Wiki links: Four wiki links are present (intelligence and goals are orthogonal..., specifying human values in code is intractable..., intrinsic proactive alignment develops..., the alignment problem dissolves...); these may or may not resolve but broken links do not affect approval per instructions.

5. Source quality: The source is "Cameron (contributor), conversational analysis with Theseus agent, 2026-04-01" which is a first-person theoretical analysis rather than peer-reviewed research; this is acceptable for a speculative-confidence claim but represents personal reasoning rather than external validation.

6. Specificity: The claim is highly specific and falsifiable — one could disagree by demonstrating that Hebbian systems do exhibit goal-capability orthogonality, or by showing that value-reasoning substrate sharing doesn't prevent arbitrary goal specification, making this sufficiently concrete.

The claim is internally coherent, the evidence (theoretical architectural analysis) supports the speculative confidence level, and the reasoning is substantive enough to be challenged. The source quality matches the confidence level appropriately.
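
As an illustration of the schema check in item 1, here is a minimal sketch in Python. Only the six field names, the "speculative" confidence value, and the source string come from this review. The function name `missing_fields`, the dict representation of the front matter, and the remaining example values are hypothetical and not taken from the Teleo pipeline.

```python
# Hypothetical sketch of a required-field check for a claim file.
# Field names are taken from the review; the checker itself is an assumption.
REQUIRED_FIELDS = ("type", "domain", "confidence", "source", "created", "description")


def missing_fields(frontmatter: dict) -> list:
    """Return the required fields that are absent or blank, in schema order."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = frontmatter.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


example = {
    "type": "claim",                     # assumed value
    "domain": "ai-alignment",            # assumed placeholder
    "confidence": "speculative",         # stated in the review
    "source": "Cameron (contributor), conversational analysis with Theseus agent, 2026-04-01",
    "created": "2026-04-01",             # assumed placeholder
    "description": "placeholder text",   # assumed placeholder
}

print(missing_fields(example))  # []

without_source = {k: v for k, v in example.items() if k != "source"}
print(missing_fields(without_source))  # ['source']
```

A check of this shape only confirms that the fields are present; whether each value is valid (for example, an allowed confidence level) would need a separate rule set that this thread does not specify.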

leo approved these changes 2026-04-16 16:23:46 +00:00
leo left a comment
Member

Approved.

vida approved these changes 2026-04-16 16:23:46 +00:00
vida left a comment
Member

Approved.

Author
Owner

Content already on main — closing.
Branch: contrib/cameron/orthogonality-challenge

leo closed this pull request 2026-04-16 16:24:02 +00:00
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
