teleo-infrastructure/docs/phase1b/reporting-contributor-compatibility-spec.md
2026-05-29 14:00:13 +02:00

275 lines
8.8 KiB
Markdown

# Phase 1b Child Spec: Reporting And Contributor Compatibility
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
Phase 1b must not make dashboards, health checks, or contributor credit lie about review state. Reporting may stay minimal, but it must not mark a cross-domain PR as ready before all required agents have reviewed.
## Goal
Update compatibility surfaces so Phase 1b required-agent reviews are represented accurately enough for operations, health, and contributor attribution without doing a dashboard redesign.
## Non-Goals
- Do not redesign the dashboard UI.
- Do not implement a new leaderboard model.
- Do not require a broad DB migration unless `review_records` is insufficient.
- Do not make production-readiness claims from health-check summaries alone.
## Current Implementation Audit
Current truth:
- `lib/db.py` already has `review_records` with `pr_number`, `domain`, `agent`, `reviewer`, `reviewer_model`, `outcome`, `rejection_reason`, and `notes`.
- `lib/contributor.py` assumes Leo reviews every PR and credits Leo plus one `domain_agent`.
- `lib/health.py` computes approval rates from `domain_verdict` and `leo_verdict`.
- `lib/health.py` builds reviewer strings only from `domain_verdict` and `leo_verdict`.
- `pipeline-health-check.py` can parse arbitrary `VERDICT:AGENT:*` tags, but it has no required-agent concept.
- A cross-domain PR with one approval and one missing required review could be misclassified if reporting only checks "any approve".
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Parent route/verdict state. | Reuse. |
| `docs/ARCHITECTURE.md` | Health/dashboard baseline. | Reuse as context. |
| `docs/DIAGNOSTICS-AGENT-SPEC.md` | Diagnostics philosophy. | Reuse as later direction, not immediate scope. |
## Goal-Vs-Repo-Truth Diff
Goal:
- Required-agent state is visible enough to avoid false readiness.
- Contributor evaluator credit follows actual approved reviewer agents.
- Health and pipeline checks can distinguish incomplete cross-domain review.
Repo truth:
- Legacy fields only represent `domain_verdict` plus `leo_verdict`.
- Contributor credit hardcodes Leo as universal reviewer.
- `pipeline-health-check.py` parses comments but does not know required reviewers.
## Completion Percent And Remaining Delta
Current completion: 10 percent because `review_records` already exists.
Remaining delta:
1. Ensure eval integration writes one `review_records` row per required reviewer.
2. Update contributor attribution to prefer approved `review_records`.
3. Keep legacy fields as projection only.
4. Add optional route marker parsing to `pipeline-health-check.py`.
5. Add tests proving no partial-review false readiness.
## Closure, Endpoint, And Deployment Truth
Local closure:
- Tests prove contributor credit and stage classification respect required reviewers.
Staging closure:
- Staging proof artifact and health readback agree on required-agent completion.
Production closure:
- Production health does not show PRs as ready before all required agents approve.
## Critical Assumptions And Invalidators
Assumptions:
- `review_records` is available in production DB schema.
- Eval integration can write `review_records` for each required reviewer.
- Dashboards can tolerate legacy projections during Phase 1b.
Invalidators:
- Production DB lacks `review_records`.
- Contributor code path cannot query `review_records` without performance issues.
- Branch protection or merge logic uses legacy fields directly for readiness.
## State And Truth Contract
`review_records` becomes the compatibility source for per-agent reviewer history.
Required eval write:
```text
one review_records row per required reviewer per PR attempt
```
Legacy projection:
- `domain_agent = primary_agent`
- `domain_verdict = aggregate_verdict`
- `leo_verdict = actual Leo verdict when Leo is required, else skipped`
Route/audit JSON remains the source for `required_agents`.
## Measurement Contract
Minimum compatibility metrics:
- `review_records_written_count`
- `required_reviews_missing_count`
- `partial_review_not_ready_count`
- `contributor_evaluator_credit_count_by_agent`
Minimum proof:
- A two-agent PR with one approval and one missing verdict is not classified as ready.
- A two-agent PR with two approvals is classified as ready.
- Contributor credit includes both approved reviewers.
## Backend Work Required
Owned files:
- `lib/contributor.py`
- `lib/health.py`
- `pipeline-health-check.py`
- `tests/test_contributor.py` or new focused test.
- `tests/test_pipeline_health_phase1b.py` if added.
Implementation steps:
1. Confirm `review_records` exists in local schema and migrations.
2. Update eval integration spec to write review records per required reviewer.
3. Update contributor credit to prefer approved `review_records.reviewer` rows.
4. Fall back to legacy `leo_verdict` and `domain_verdict` for old data.
5. Update health output to include review records or route audit fields where available.
6. Update pipeline health check to read required-agent markers if present.
7. Add tests.
Forbidden work:
- Dashboard redesign.
- New leaderboard model.
- Broad schema migration before proof requires it.
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
Operators should see:
- Per-agent reviewer outcomes when available.
- Cross-domain PRs not marked ready until all required reviewers approve.
- Contributor credit reflecting actual approved reviewer agents.
Existing dashboard layout can remain unchanged if data is honest.
## Validation And Test Matrix
Commands:
```bash
python3 -m pytest tests/test_contributor.py tests/test_pipeline_health_phase1b.py
python3 -m ruff check lib/contributor.py lib/health.py pipeline-health-check.py tests
git diff --check
```
Test cases:
- old data fallback credits Leo/domain reviewer.
- new `review_records` data credits all approved required reviewers.
- request-changes reviewer receives no evaluator credit.
- one missing required reviewer blocks ready classification.
- all required reviewers approve enables ready classification.
## CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Compatibility tests pass or are documented as not runnable due missing dev deps.
Before staging:
- Staging proof includes health and contributor-readback commands.
Before production:
- Operator verifies no partial-review false readiness in logs/health readback.
## Independent CLI Audit Contract
Reviewer commands:
```bash
rg -n "Leo reviews every PR|leo_verdict|domain_verdict|review_records|required_agents|VERDICT" lib pipeline-health-check.py tests
sqlite3 /path/to/pipeline.db ".schema review_records"
```
Reviewer checks:
- `review_records` is preferred for new evaluator credit.
- Legacy fallback remains for old rows.
- Health does not rely on any-approve for multi-review readiness.
## Outside-The-Box Fix Paths
If `review_records` is insufficient:
- Add additive `route_json` and `agent_verdicts_json` columns to `prs`.
If `pipeline-health-check.py` cannot read route markers:
- Treat cross-domain PRs as awaiting review unless all verdict tags expected by route artifact are present.
If contributor credit is too risky for Phase 1b:
- Defer credit mutation and emit review-record-only proof until after eval stability.
## Maintenance Capture
Beneficial now:
- Replace comments claiming "Leo reviews every PR."
- Add focused tests for the compatibility projection.
Avoid now:
- Dashboard UI rewrite.
- Historical backfill.
- Leaderboard redesign.
## Parallelization And Fanout
Classification: ready_now after eval integration establishes review record writes.
Worker-ready prompt:
```text
make reporting and contributor attribution phase 1b-compatible. prefer review_records for new evaluator credit, preserve legacy fallback, and prevent health/pipeline checks from marking cross-domain prs ready before all required agents approve. do not redesign dashboards or add broad schema migrations unless tests prove necessary.
```
## Acceptance Criteria
- No code path claims Leo reviews every new Phase 1b PR.
- Approved `review_records` can credit all required reviewer agents.
- Health/check logic avoids partial-review false readiness.
- Legacy data still renders.
## Readiness And Claim Boundaries
Allowed claim:
- "Reporting compatibility is updated to avoid false readiness and credit loss."
Forbidden claim:
- "Dashboards are redesigned for Phase 1b."
## Spec Quality Self-Audit
All required execution-grade headings are present. This spec is intentionally compatibility-scoped and does not attempt a full reporting product redesign.
## Assistant-Added Caveats
The safest first move is to write accurate `review_records` and route audit JSON. Rich dashboards should wait until production behavior proves stable.