teleo-infrastructure/docs/phase1b/reporting-contributor-compatibility-spec.md

# Phase 1b Child Spec: Reporting And Contributor Compatibility

Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`

## Product Outcome Contract

Phase 1b must not make dashboards, health checks, or contributor credit lie about review state. Reporting may stay minimal, but it must not mark a cross-domain PR as ready before all required agents have reviewed.

## Goal

Update compatibility surfaces so Phase 1b required-agent reviews are represented accurately enough for operations, health, and contributor attribution without doing a dashboard redesign.

## Non-Goals

- Do not redesign the dashboard UI.
- Do not implement a new leaderboard model.
- Do not require a broad DB migration unless `review_records` is insufficient.
- Do not make production-readiness claims from health-check summaries alone.

## Current Implementation Audit

Current truth:

- `lib/db.py` already has `review_records` with `pr_number`, `domain`, `agent`, `reviewer`, `reviewer_model`, `outcome`, `rejection_reason`, and `notes`.
- `lib/contributor.py` assumes Leo reviews every PR and credits Leo plus one `domain_agent`.
- `lib/health.py` computes approval rates from `domain_verdict` and `leo_verdict`.
- `lib/health.py` builds reviewer strings only from `domain_verdict` and `leo_verdict`.
- `pipeline-health-check.py` can parse arbitrary `VERDICT:AGENT:*` tags, but it has no required-agent concept.
- A cross-domain PR with one approval and one missing required review could be misclassified if reporting only checks "any approve".

## Existing-Spec Inventory

| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Parent route/verdict state. | Reuse. |
| `docs/ARCHITECTURE.md` | Health/dashboard baseline. | Reuse as context. |
| `docs/DIAGNOSTICS-AGENT-SPEC.md` | Diagnostics philosophy. | Reuse as later direction, not immediate scope. |

## Goal-Vs-Repo-Truth Diff

Goal:

- Required-agent state is visible enough to avoid false readiness.
- Contributor evaluator credit follows actual approved reviewer agents.
- Health and pipeline checks can distinguish incomplete cross-domain review.

Repo truth:

- Legacy fields only represent `domain_verdict` plus `leo_verdict`.
- Contributor credit hardcodes Leo as universal reviewer.
- `pipeline-health-check.py` parses comments but does not know required reviewers.

## Completion Percent And Remaining Delta

Current completion: 10 percent because `review_records` already exists.

Remaining delta:

1. Ensure eval integration writes one `review_records` row per required reviewer.
2. Update contributor attribution to prefer approved `review_records`.
3. Keep legacy fields as projection only.
4. Add optional route marker parsing to `pipeline-health-check.py`.
5. Add tests proving no partial-review false readiness.

## Closure, Endpoint, And Deployment Truth

Local closure:

- Tests prove contributor credit and stage classification respect required reviewers.

Staging closure:

- Staging proof artifact and health readback agree on required-agent completion.

Production closure:

- Production health does not show PRs as ready before all required agents approve.

## Critical Assumptions And Invalidators

Assumptions:

- `review_records` is available in production DB schema.
- Eval integration can write `review_records` for each required reviewer.
- Dashboards can tolerate legacy projections during Phase 1b.

Invalidators:

- Production DB lacks `review_records`.
- Contributor code path cannot query `review_records` without performance issues.
- Branch protection or merge logic uses legacy fields directly for readiness.

## State And Truth Contract

`review_records` becomes the compatibility source for per-agent reviewer history.

Required eval write:

```text
one review_records row per required reviewer per PR attempt
```

Legacy projection:

- `domain_agent = primary_agent`
- `domain_verdict = aggregate_verdict`
- `leo_verdict = actual Leo verdict when Leo is required, else skipped`

Route/audit JSON remains the source for `required_agents`.

## Measurement Contract

Minimum compatibility metrics:

- `review_records_written_count`
- `required_reviews_missing_count`
- `partial_review_not_ready_count`
- `contributor_evaluator_credit_count_by_agent`

Minimum proof:

- A two-agent PR with one approval and one missing verdict is not classified as ready.
- A two-agent PR with two approvals is classified as ready.
- Contributor credit includes both approved reviewers.

## Backend Work Required

Owned files:

- `lib/contributor.py`
- `lib/health.py`
- `pipeline-health-check.py`
- `tests/test_contributor.py` or new focused test.
- `tests/test_pipeline_health_phase1b.py` if added.

Implementation steps:

1. Confirm `review_records` exists in local schema and migrations.
2. Update eval integration spec to write review records per required reviewer.
3. Update contributor credit to prefer approved `review_records.reviewer` rows.
4. Fall back to legacy `leo_verdict` and `domain_verdict` for old data.
5. Update health output to include review records or route audit fields where available.
6. Update pipeline health check to read required-agent markers if present.
7. Add tests.

Forbidden work:

- Dashboard redesign.
- New leaderboard model.
- Broad schema migration before proof requires it.

## Frontend Work Required

None.

## Expected Runtime And User-Visible Behavior

Operators should see:

- Per-agent reviewer outcomes when available.
- Cross-domain PRs not marked ready until all required reviewers approve.
- Contributor credit reflecting actual approved reviewer agents.

Existing dashboard layout can remain unchanged if data is honest.

## Validation And Test Matrix

Commands:

```bash
python3 -m pytest tests/test_contributor.py tests/test_pipeline_health_phase1b.py
python3 -m ruff check lib/contributor.py lib/health.py pipeline-health-check.py tests
git diff --check
```

Test cases:

- old data fallback credits Leo/domain reviewer.
- new `review_records` data credits all approved required reviewers.
- request-changes reviewer receives no evaluator credit.
- one missing required reviewer blocks ready classification.
- all required reviewers approve enables ready classification.

## CI/CD, Release, And Pre-Push Gate Contract

Before PR:

- Compatibility tests pass or are documented as not runnable due missing dev deps.

Before staging:

- Staging proof includes health and contributor-readback commands.

Before production:

- Operator verifies no partial-review false readiness in logs/health readback.

## Independent CLI Audit Contract

Reviewer commands:

```bash
rg -n "Leo reviews every PR|leo_verdict|domain_verdict|review_records|required_agents|VERDICT" lib pipeline-health-check.py tests
sqlite3 /path/to/pipeline.db ".schema review_records"
```

Reviewer checks:

- `review_records` is preferred for new evaluator credit.
- Legacy fallback remains for old rows.
- Health does not rely on any-approve for multi-review readiness.

## Outside-The-Box Fix Paths

If `review_records` is insufficient:

- Add additive `route_json` and `agent_verdicts_json` columns to `prs`.

If `pipeline-health-check.py` cannot read route markers:

- Treat cross-domain PRs as awaiting review unless all verdict tags expected by route artifact are present.

If contributor credit is too risky for Phase 1b:

- Defer credit mutation and emit review-record-only proof until after eval stability.

## Maintenance Capture

Beneficial now:

- Replace comments claiming "Leo reviews every PR."
- Add focused tests for the compatibility projection.

Avoid now:

- Dashboard UI rewrite.
- Historical backfill.
- Leaderboard redesign.

## Parallelization And Fanout

Classification: ready_now after eval integration establishes review record writes.

Worker-ready prompt:

```text
make reporting and contributor attribution phase 1b-compatible. prefer review_records for new evaluator credit, preserve legacy fallback, and prevent health/pipeline checks from marking cross-domain prs ready before all required agents approve. do not redesign dashboards or add broad schema migrations unless tests prove necessary.
```

## Acceptance Criteria

- No code path claims Leo reviews every new Phase 1b PR.
- Approved `review_records` can credit all required reviewer agents.
- Health/check logic avoids partial-review false readiness.
- Legacy data still renders.

## Readiness And Claim Boundaries

Allowed claim:

- "Reporting compatibility is updated to avoid false readiness and credit loss."

Forbidden claim:

- "Dashboards are redesigned for Phase 1b."

## Spec Quality Self-Audit

All required execution-grade headings are present. This spec is intentionally compatibility-scoped and does not attempt a full reporting product redesign.

## Assistant-Added Caveats

The safest first move is to write accurate `review_records` and route audit JSON. Rich dashboards should wait until production behavior proves stable.