8.8 KiB
Phase 1b Child Spec: Reporting And Contributor Compatibility
Created: 2026-05-29
Status: active draft
Parent spec: docs/phase1b-agent-routing-spec.md
Product Outcome Contract
Phase 1b must not make dashboards, health checks, or contributor credit lie about review state. Reporting may stay minimal, but it must not mark a cross-domain PR as ready before all required agents have reviewed.
Goal
Update compatibility surfaces so Phase 1b required-agent reviews are represented accurately enough for operations, health, and contributor attribution without doing a dashboard redesign.
Non-Goals
- Do not redesign the dashboard UI.
- Do not implement a new leaderboard model.
- Do not require a broad DB migration unless
review_recordsis insufficient. - Do not make production-readiness claims from health-check summaries alone.
Current Implementation Audit
Current truth:
lib/db.pyalready hasreview_recordswithpr_number,domain,agent,reviewer,reviewer_model,outcome,rejection_reason, andnotes.lib/contributor.pyassumes Leo reviews every PR and credits Leo plus onedomain_agent.lib/health.pycomputes approval rates fromdomain_verdictandleo_verdict.lib/health.pybuilds reviewer strings only fromdomain_verdictandleo_verdict.pipeline-health-check.pycan parse arbitraryVERDICT:AGENT:*tags, but it has no required-agent concept.- A cross-domain PR with one approval and one missing required review could be misclassified if reporting only checks "any approve".
Existing-Spec Inventory
| Existing doc | Relevance | Decision |
|---|---|---|
docs/phase1b-agent-routing-spec.md |
Parent route/verdict state. | Reuse. |
docs/ARCHITECTURE.md |
Health/dashboard baseline. | Reuse as context. |
docs/DIAGNOSTICS-AGENT-SPEC.md |
Diagnostics philosophy. | Reuse as later direction, not immediate scope. |
Goal-Vs-Repo-Truth Diff
Goal:
- Required-agent state is visible enough to avoid false readiness.
- Contributor evaluator credit follows actual approved reviewer agents.
- Health and pipeline checks can distinguish incomplete cross-domain review.
Repo truth:
- Legacy fields only represent
domain_verdictplusleo_verdict. - Contributor credit hardcodes Leo as universal reviewer.
pipeline-health-check.pyparses comments but does not know required reviewers.
Completion Percent And Remaining Delta
Current completion: 10 percent because review_records already exists.
Remaining delta:
- Ensure eval integration writes one
review_recordsrow per required reviewer. - Update contributor attribution to prefer approved
review_records. - Keep legacy fields as projection only.
- Add optional route marker parsing to
pipeline-health-check.py. - Add tests proving no partial-review false readiness.
Closure, Endpoint, And Deployment Truth
Local closure:
- Tests prove contributor credit and stage classification respect required reviewers.
Staging closure:
- Staging proof artifact and health readback agree on required-agent completion.
Production closure:
- Production health does not show PRs as ready before all required agents approve.
Critical Assumptions And Invalidators
Assumptions:
review_recordsis available in production DB schema.- Eval integration can write
review_recordsfor each required reviewer. - Dashboards can tolerate legacy projections during Phase 1b.
Invalidators:
- Production DB lacks
review_records. - Contributor code path cannot query
review_recordswithout performance issues. - Branch protection or merge logic uses legacy fields directly for readiness.
State And Truth Contract
review_records becomes the compatibility source for per-agent reviewer history.
Required eval write:
one review_records row per required reviewer per PR attempt
Legacy projection:
domain_agent = primary_agentdomain_verdict = aggregate_verdictleo_verdict = actual Leo verdict when Leo is required, else skipped
Route/audit JSON remains the source for required_agents.
Measurement Contract
Minimum compatibility metrics:
review_records_written_countrequired_reviews_missing_countpartial_review_not_ready_countcontributor_evaluator_credit_count_by_agent
Minimum proof:
- A two-agent PR with one approval and one missing verdict is not classified as ready.
- A two-agent PR with two approvals is classified as ready.
- Contributor credit includes both approved reviewers.
Backend Work Required
Owned files:
lib/contributor.pylib/health.pypipeline-health-check.pytests/test_contributor.pyor new focused test.tests/test_pipeline_health_phase1b.pyif added.
Implementation steps:
- Confirm
review_recordsexists in local schema and migrations. - Update eval integration spec to write review records per required reviewer.
- Update contributor credit to prefer approved
review_records.reviewerrows. - Fall back to legacy
leo_verdictanddomain_verdictfor old data. - Update health output to include review records or route audit fields where available.
- Update pipeline health check to read required-agent markers if present.
- Add tests.
Forbidden work:
- Dashboard redesign.
- New leaderboard model.
- Broad schema migration before proof requires it.
Frontend Work Required
None.
Expected Runtime And User-Visible Behavior
Operators should see:
- Per-agent reviewer outcomes when available.
- Cross-domain PRs not marked ready until all required reviewers approve.
- Contributor credit reflecting actual approved reviewer agents.
Existing dashboard layout can remain unchanged if data is honest.
Validation And Test Matrix
Commands:
python3 -m pytest tests/test_contributor.py tests/test_pipeline_health_phase1b.py
python3 -m ruff check lib/contributor.py lib/health.py pipeline-health-check.py tests
git diff --check
Test cases:
- old data fallback credits Leo/domain reviewer.
- new
review_recordsdata credits all approved required reviewers. - request-changes reviewer receives no evaluator credit.
- one missing required reviewer blocks ready classification.
- all required reviewers approve enables ready classification.
CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Compatibility tests pass or are documented as not runnable due missing dev deps.
Before staging:
- Staging proof includes health and contributor-readback commands.
Before production:
- Operator verifies no partial-review false readiness in logs/health readback.
Independent CLI Audit Contract
Reviewer commands:
rg -n "Leo reviews every PR|leo_verdict|domain_verdict|review_records|required_agents|VERDICT" lib pipeline-health-check.py tests
sqlite3 /path/to/pipeline.db ".schema review_records"
Reviewer checks:
review_recordsis preferred for new evaluator credit.- Legacy fallback remains for old rows.
- Health does not rely on any-approve for multi-review readiness.
Outside-The-Box Fix Paths
If review_records is insufficient:
- Add additive
route_jsonandagent_verdicts_jsoncolumns toprs.
If pipeline-health-check.py cannot read route markers:
- Treat cross-domain PRs as awaiting review unless all verdict tags expected by route artifact are present.
If contributor credit is too risky for Phase 1b:
- Defer credit mutation and emit review-record-only proof until after eval stability.
Maintenance Capture
Beneficial now:
- Replace comments claiming "Leo reviews every PR."
- Add focused tests for the compatibility projection.
Avoid now:
- Dashboard UI rewrite.
- Historical backfill.
- Leaderboard redesign.
Parallelization And Fanout
Classification: ready_now after eval integration establishes review record writes.
Worker-ready prompt:
make reporting and contributor attribution phase 1b-compatible. prefer review_records for new evaluator credit, preserve legacy fallback, and prevent health/pipeline checks from marking cross-domain prs ready before all required agents approve. do not redesign dashboards or add broad schema migrations unless tests prove necessary.
Acceptance Criteria
- No code path claims Leo reviews every new Phase 1b PR.
- Approved
review_recordscan credit all required reviewer agents. - Health/check logic avoids partial-review false readiness.
- Legacy data still renders.
Readiness And Claim Boundaries
Allowed claim:
- "Reporting compatibility is updated to avoid false readiness and credit loss."
Forbidden claim:
- "Dashboards are redesigned for Phase 1b."
Spec Quality Self-Audit
All required execution-grade headings are present. This spec is intentionally compatibility-scoped and does not attempt a full reporting product redesign.
Assistant-Added Caveats
The safest first move is to write accurate review_records and route audit JSON. Rich dashboards should wait until production behavior proves stable.