teleo-infrastructure/docs/phase1b/reporting-contributor-compatibility-spec.md
2026-05-29 14:00:13 +02:00

8.8 KiB

Phase 1b Child Spec: Reporting And Contributor Compatibility

Created: 2026-05-29 Status: active draft Parent spec: docs/phase1b-agent-routing-spec.md

Product Outcome Contract

Phase 1b must not make dashboards, health checks, or contributor credit lie about review state. Reporting may stay minimal, but it must not mark a cross-domain PR as ready before all required agents have reviewed.

Goal

Update compatibility surfaces so Phase 1b required-agent reviews are represented accurately enough for operations, health, and contributor attribution without doing a dashboard redesign.

Non-Goals

  • Do not redesign the dashboard UI.
  • Do not implement a new leaderboard model.
  • Do not require a broad DB migration unless review_records is insufficient.
  • Do not make production-readiness claims from health-check summaries alone.

Current Implementation Audit

Current truth:

  • lib/db.py already has review_records with pr_number, domain, agent, reviewer, reviewer_model, outcome, rejection_reason, and notes.
  • lib/contributor.py assumes Leo reviews every PR and credits Leo plus one domain_agent.
  • lib/health.py computes approval rates from domain_verdict and leo_verdict.
  • lib/health.py builds reviewer strings only from domain_verdict and leo_verdict.
  • pipeline-health-check.py can parse arbitrary VERDICT:AGENT:* tags, but it has no required-agent concept.
  • A cross-domain PR with one approval and one missing required review could be misclassified if reporting only checks "any approve".

Existing-Spec Inventory

Existing doc Relevance Decision
docs/phase1b-agent-routing-spec.md Parent route/verdict state. Reuse.
docs/ARCHITECTURE.md Health/dashboard baseline. Reuse as context.
docs/DIAGNOSTICS-AGENT-SPEC.md Diagnostics philosophy. Reuse as later direction, not immediate scope.

Goal-Vs-Repo-Truth Diff

Goal:

  • Required-agent state is visible enough to avoid false readiness.
  • Contributor evaluator credit follows actual approved reviewer agents.
  • Health and pipeline checks can distinguish incomplete cross-domain review.

Repo truth:

  • Legacy fields only represent domain_verdict plus leo_verdict.
  • Contributor credit hardcodes Leo as universal reviewer.
  • pipeline-health-check.py parses comments but does not know required reviewers.

Completion Percent And Remaining Delta

Current completion: 10 percent because review_records already exists.

Remaining delta:

  1. Ensure eval integration writes one review_records row per required reviewer.
  2. Update contributor attribution to prefer approved review_records.
  3. Keep legacy fields as projection only.
  4. Add optional route marker parsing to pipeline-health-check.py.
  5. Add tests proving no partial-review false readiness.

Closure, Endpoint, And Deployment Truth

Local closure:

  • Tests prove contributor credit and stage classification respect required reviewers.

Staging closure:

  • Staging proof artifact and health readback agree on required-agent completion.

Production closure:

  • Production health does not show PRs as ready before all required agents approve.

Critical Assumptions And Invalidators

Assumptions:

  • review_records is available in production DB schema.
  • Eval integration can write review_records for each required reviewer.
  • Dashboards can tolerate legacy projections during Phase 1b.

Invalidators:

  • Production DB lacks review_records.
  • Contributor code path cannot query review_records without performance issues.
  • Branch protection or merge logic uses legacy fields directly for readiness.

State And Truth Contract

review_records becomes the compatibility source for per-agent reviewer history.

Required eval write:

one review_records row per required reviewer per PR attempt

Legacy projection:

  • domain_agent = primary_agent
  • domain_verdict = aggregate_verdict
  • leo_verdict = actual Leo verdict when Leo is required, else skipped

Route/audit JSON remains the source for required_agents.

Measurement Contract

Minimum compatibility metrics:

  • review_records_written_count
  • required_reviews_missing_count
  • partial_review_not_ready_count
  • contributor_evaluator_credit_count_by_agent

Minimum proof:

  • A two-agent PR with one approval and one missing verdict is not classified as ready.
  • A two-agent PR with two approvals is classified as ready.
  • Contributor credit includes both approved reviewers.

Backend Work Required

Owned files:

  • lib/contributor.py
  • lib/health.py
  • pipeline-health-check.py
  • tests/test_contributor.py or new focused test.
  • tests/test_pipeline_health_phase1b.py if added.

Implementation steps:

  1. Confirm review_records exists in local schema and migrations.
  2. Update eval integration spec to write review records per required reviewer.
  3. Update contributor credit to prefer approved review_records.reviewer rows.
  4. Fall back to legacy leo_verdict and domain_verdict for old data.
  5. Update health output to include review records or route audit fields where available.
  6. Update pipeline health check to read required-agent markers if present.
  7. Add tests.

Forbidden work:

  • Dashboard redesign.
  • New leaderboard model.
  • Broad schema migration before proof requires it.

Frontend Work Required

None.

Expected Runtime And User-Visible Behavior

Operators should see:

  • Per-agent reviewer outcomes when available.
  • Cross-domain PRs not marked ready until all required reviewers approve.
  • Contributor credit reflecting actual approved reviewer agents.

Existing dashboard layout can remain unchanged if data is honest.

Validation And Test Matrix

Commands:

python3 -m pytest tests/test_contributor.py tests/test_pipeline_health_phase1b.py
python3 -m ruff check lib/contributor.py lib/health.py pipeline-health-check.py tests
git diff --check

Test cases:

  • old data fallback credits Leo/domain reviewer.
  • new review_records data credits all approved required reviewers.
  • request-changes reviewer receives no evaluator credit.
  • one missing required reviewer blocks ready classification.
  • all required reviewers approve enables ready classification.

CI/CD, Release, And Pre-Push Gate Contract

Before PR:

  • Compatibility tests pass or are documented as not runnable due missing dev deps.

Before staging:

  • Staging proof includes health and contributor-readback commands.

Before production:

  • Operator verifies no partial-review false readiness in logs/health readback.

Independent CLI Audit Contract

Reviewer commands:

rg -n "Leo reviews every PR|leo_verdict|domain_verdict|review_records|required_agents|VERDICT" lib pipeline-health-check.py tests
sqlite3 /path/to/pipeline.db ".schema review_records"

Reviewer checks:

  • review_records is preferred for new evaluator credit.
  • Legacy fallback remains for old rows.
  • Health does not rely on any-approve for multi-review readiness.

Outside-The-Box Fix Paths

If review_records is insufficient:

  • Add additive route_json and agent_verdicts_json columns to prs.

If pipeline-health-check.py cannot read route markers:

  • Treat cross-domain PRs as awaiting review unless all verdict tags expected by route artifact are present.

If contributor credit is too risky for Phase 1b:

  • Defer credit mutation and emit review-record-only proof until after eval stability.

Maintenance Capture

Beneficial now:

  • Replace comments claiming "Leo reviews every PR."
  • Add focused tests for the compatibility projection.

Avoid now:

  • Dashboard UI rewrite.
  • Historical backfill.
  • Leaderboard redesign.

Parallelization And Fanout

Classification: ready_now after eval integration establishes review record writes.

Worker-ready prompt:

make reporting and contributor attribution phase 1b-compatible. prefer review_records for new evaluator credit, preserve legacy fallback, and prevent health/pipeline checks from marking cross-domain prs ready before all required agents approve. do not redesign dashboards or add broad schema migrations unless tests prove necessary.

Acceptance Criteria

  • No code path claims Leo reviews every new Phase 1b PR.
  • Approved review_records can credit all required reviewer agents.
  • Health/check logic avoids partial-review false readiness.
  • Legacy data still renders.

Readiness And Claim Boundaries

Allowed claim:

  • "Reporting compatibility is updated to avoid false readiness and credit loss."

Forbidden claim:

  • "Dashboards are redesigned for Phase 1b."

Spec Quality Self-Audit

All required execution-grade headings are present. This spec is intentionally compatibility-scoped and does not attempt a full reporting product redesign.

Assistant-Added Caveats

The safest first move is to write accurate review_records and route audit JSON. Rich dashboards should wait until production behavior proves stable.