# Phase 1b Child Spec: Reporting And Contributor Compatibility Created: 2026-05-29 Status: active draft Parent spec: `docs/phase1b-agent-routing-spec.md` ## Product Outcome Contract Phase 1b must not make dashboards, health checks, or contributor credit lie about review state. Reporting may stay minimal, but it must not mark a cross-domain PR as ready before all required agents have reviewed. ## Goal Update compatibility surfaces so Phase 1b required-agent reviews are represented accurately enough for operations, health, and contributor attribution without doing a dashboard redesign. ## Non-Goals - Do not redesign the dashboard UI. - Do not implement a new leaderboard model. - Do not require a broad DB migration unless `review_records` is insufficient. - Do not make production-readiness claims from health-check summaries alone. ## Current Implementation Audit Current truth: - `lib/db.py` already has `review_records` with `pr_number`, `domain`, `agent`, `reviewer`, `reviewer_model`, `outcome`, `rejection_reason`, and `notes`. - `lib/contributor.py` assumes Leo reviews every PR and credits Leo plus one `domain_agent`. - `lib/health.py` computes approval rates from `domain_verdict` and `leo_verdict`. - `lib/health.py` builds reviewer strings only from `domain_verdict` and `leo_verdict`. - `pipeline-health-check.py` can parse arbitrary `VERDICT:AGENT:*` tags, but it has no required-agent concept. - A cross-domain PR with one approval and one missing required review could be misclassified if reporting only checks "any approve". ## Existing-Spec Inventory | Existing doc | Relevance | Decision | | --- | --- | --- | | `docs/phase1b-agent-routing-spec.md` | Parent route/verdict state. | Reuse. | | `docs/ARCHITECTURE.md` | Health/dashboard baseline. | Reuse as context. | | `docs/DIAGNOSTICS-AGENT-SPEC.md` | Diagnostics philosophy. | Reuse as later direction, not immediate scope. | ## Goal-Vs-Repo-Truth Diff Goal: - Required-agent state is visible enough to avoid false readiness. - Contributor evaluator credit follows actual approved reviewer agents. - Health and pipeline checks can distinguish incomplete cross-domain review. Repo truth: - Legacy fields only represent `domain_verdict` plus `leo_verdict`. - Contributor credit hardcodes Leo as universal reviewer. - `pipeline-health-check.py` parses comments but does not know required reviewers. ## Completion Percent And Remaining Delta Current completion: 10 percent because `review_records` already exists. Remaining delta: 1. Ensure eval integration writes one `review_records` row per required reviewer. 2. Update contributor attribution to prefer approved `review_records`. 3. Keep legacy fields as projection only. 4. Add optional route marker parsing to `pipeline-health-check.py`. 5. Add tests proving no partial-review false readiness. ## Closure, Endpoint, And Deployment Truth Local closure: - Tests prove contributor credit and stage classification respect required reviewers. Staging closure: - Staging proof artifact and health readback agree on required-agent completion. Production closure: - Production health does not show PRs as ready before all required agents approve. ## Critical Assumptions And Invalidators Assumptions: - `review_records` is available in production DB schema. - Eval integration can write `review_records` for each required reviewer. - Dashboards can tolerate legacy projections during Phase 1b. Invalidators: - Production DB lacks `review_records`. - Contributor code path cannot query `review_records` without performance issues. - Branch protection or merge logic uses legacy fields directly for readiness. ## State And Truth Contract `review_records` becomes the compatibility source for per-agent reviewer history. Required eval write: ```text one review_records row per required reviewer per PR attempt ``` Legacy projection: - `domain_agent = primary_agent` - `domain_verdict = aggregate_verdict` - `leo_verdict = actual Leo verdict when Leo is required, else skipped` Route/audit JSON remains the source for `required_agents`. ## Measurement Contract Minimum compatibility metrics: - `review_records_written_count` - `required_reviews_missing_count` - `partial_review_not_ready_count` - `contributor_evaluator_credit_count_by_agent` Minimum proof: - A two-agent PR with one approval and one missing verdict is not classified as ready. - A two-agent PR with two approvals is classified as ready. - Contributor credit includes both approved reviewers. ## Backend Work Required Owned files: - `lib/contributor.py` - `lib/health.py` - `pipeline-health-check.py` - `tests/test_contributor.py` or new focused test. - `tests/test_pipeline_health_phase1b.py` if added. Implementation steps: 1. Confirm `review_records` exists in local schema and migrations. 2. Update eval integration spec to write review records per required reviewer. 3. Update contributor credit to prefer approved `review_records.reviewer` rows. 4. Fall back to legacy `leo_verdict` and `domain_verdict` for old data. 5. Update health output to include review records or route audit fields where available. 6. Update pipeline health check to read required-agent markers if present. 7. Add tests. Forbidden work: - Dashboard redesign. - New leaderboard model. - Broad schema migration before proof requires it. ## Frontend Work Required None. ## Expected Runtime And User-Visible Behavior Operators should see: - Per-agent reviewer outcomes when available. - Cross-domain PRs not marked ready until all required reviewers approve. - Contributor credit reflecting actual approved reviewer agents. Existing dashboard layout can remain unchanged if data is honest. ## Validation And Test Matrix Commands: ```bash python3 -m pytest tests/test_contributor.py tests/test_pipeline_health_phase1b.py python3 -m ruff check lib/contributor.py lib/health.py pipeline-health-check.py tests git diff --check ``` Test cases: - old data fallback credits Leo/domain reviewer. - new `review_records` data credits all approved required reviewers. - request-changes reviewer receives no evaluator credit. - one missing required reviewer blocks ready classification. - all required reviewers approve enables ready classification. ## CI/CD, Release, And Pre-Push Gate Contract Before PR: - Compatibility tests pass or are documented as not runnable due missing dev deps. Before staging: - Staging proof includes health and contributor-readback commands. Before production: - Operator verifies no partial-review false readiness in logs/health readback. ## Independent CLI Audit Contract Reviewer commands: ```bash rg -n "Leo reviews every PR|leo_verdict|domain_verdict|review_records|required_agents|VERDICT" lib pipeline-health-check.py tests sqlite3 /path/to/pipeline.db ".schema review_records" ``` Reviewer checks: - `review_records` is preferred for new evaluator credit. - Legacy fallback remains for old rows. - Health does not rely on any-approve for multi-review readiness. ## Outside-The-Box Fix Paths If `review_records` is insufficient: - Add additive `route_json` and `agent_verdicts_json` columns to `prs`. If `pipeline-health-check.py` cannot read route markers: - Treat cross-domain PRs as awaiting review unless all verdict tags expected by route artifact are present. If contributor credit is too risky for Phase 1b: - Defer credit mutation and emit review-record-only proof until after eval stability. ## Maintenance Capture Beneficial now: - Replace comments claiming "Leo reviews every PR." - Add focused tests for the compatibility projection. Avoid now: - Dashboard UI rewrite. - Historical backfill. - Leaderboard redesign. ## Parallelization And Fanout Classification: ready_now after eval integration establishes review record writes. Worker-ready prompt: ```text make reporting and contributor attribution phase 1b-compatible. prefer review_records for new evaluator credit, preserve legacy fallback, and prevent health/pipeline checks from marking cross-domain prs ready before all required agents approve. do not redesign dashboards or add broad schema migrations unless tests prove necessary. ``` ## Acceptance Criteria - No code path claims Leo reviews every new Phase 1b PR. - Approved `review_records` can credit all required reviewer agents. - Health/check logic avoids partial-review false readiness. - Legacy data still renders. ## Readiness And Claim Boundaries Allowed claim: - "Reporting compatibility is updated to avoid false readiness and credit loss." Forbidden claim: - "Dashboards are redesigned for Phase 1b." ## Spec Quality Self-Audit All required execution-grade headings are present. This spec is intentionally compatibility-scoped and does not attempt a full reporting product redesign. ## Assistant-Added Caveats The safest first move is to write accurate `review_records` and route audit JSON. Rich dashboards should wait until production behavior proves stable.