# Repo Audit Repair Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Repair the audit findings that affect measurement integrity, state correctness, documentation accuracy, and open-source readiness.

**Architecture:** Keep changes localized to the existing stdlib-only Python package. Measurement validation lives at the HTTP/worker boundary, study-state fixes remain in `StudyStore`, compare reporting gains explicit failed/no-feasible accounting, and project metadata/docs are added at the repo root.

**Tech Stack:** Python 3.11+ stdlib, `unittest`, setuptools `pyproject.toml`.

---
### Task 1: Measurement Integrity

**Files:**

- Modify: `src/aituner/http_client.py`
- Modify: `src/aituner/slo.py`
- Modify: `src/aituner/worker.py`
- Test: `tests/test_core_flow.py`

- [ ] Write failing tests for completion token source/mismatch failures and for persisted per-request probe details.
- [ ] Run the targeted tests and confirm they fail for the expected reason.
- [ ] Add token-source metadata to streamed metrics and request outcomes.
- [ ] Fail requests when the configured completion length cannot be verified from usage, or when it differs from the expected value.
- [ ] Persist probe outcome details under each trial's artifact directory.
- [ ] Run the targeted tests and the full unittest suite.
### Task 2: State, Spec, And Compare Guards

**Files:**

- Modify: `src/aituner/spec.py`
- Modify: `src/aituner/store.py`
- Modify: `src/aituner/compare.py`
- Modify: `scripts/run_multi_compare.py`
- Test: `tests/test_core_flow.py`

- [ ] Write failing tests for state-list isolation, invalid trace numeric bounds, and compare aggregate failure accounting.
- [ ] Run the targeted tests and confirm the expected failures.
- [ ] Deep-copy (or replace) trial lists when materializing trials.
- [ ] Validate positive trace controls in `TraceSpec.from_dict`.
- [ ] Report failed/no-feasible counts in compare aggregates without changing the existing winner semantics.
- [ ] Run the targeted tests and the full unittest suite.
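The three code changes above might look roughly like the sketch below. Every name here is an illustrative assumption: the real `TraceSpec` fields, trial statuses, and store/compare helpers in `spec.py`, `store.py`, and `compare.py` will differ.

```python
import copy
from collections import Counter
from dataclasses import dataclass


@dataclass
class TraceSpec:
    # Hypothetical trace controls; the real spec has its own field set.
    request_rate: float
    duration_s: float

    @classmethod
    def from_dict(cls, d: dict) -> "TraceSpec":
        spec = cls(request_rate=float(d["request_rate"]),
                   duration_s=float(d["duration_s"]))
        # Reject zero/negative (and NaN, since `not nan > 0` is true)
        # trace controls up front instead of failing mid-run.
        for name in ("request_rate", "duration_s"):
            value = getattr(spec, name)
            if not value > 0:
                raise ValueError(f"{name} must be positive, got {value!r}")
        return spec


def materialize_trials(stored_trials: list[dict]) -> list[dict]:
    # Deep-copy so callers mutating the returned trials cannot
    # corrupt the store's internal state (the isolation guard).
    return copy.deepcopy(stored_trials)


def aggregate_outcomes(trials: list[dict]) -> dict:
    # Explicitly count failed / no-feasible trials instead of silently
    # dropping them; winner selection elsewhere is left untouched.
    counts = Counter(t.get("status", "unknown") for t in trials)
    return {"failed": counts["failed"],
            "no_feasible": counts["no_feasible"],
            "total": len(trials)}
```

The common thread is defensiveness at boundaries: validate on the way in (`from_dict`), isolate on the way out (`materialize_trials`), and account for every trial in aggregates rather than only the successful ones.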
### Task 3: Docs And Open-Source Readiness

**Files:**

- Create: `README.md`
- Create: `LICENSE`
- Create: `CONTRIBUTING.md`
- Create: `SECURITY.md`
- Modify: `pyproject.toml`
- Modify: selected docs under `docs/`

- [ ] Add concise repo usage, verification, and experiment-integrity guidance.
- [ ] Add the MIT license and contribution/security notes.
- [ ] Add project metadata and an optional test extra.
- [ ] Update stale docs covering high-stop behavior and the current test count.
- [ ] Run JSON validation and the full unittest suite.
- [ ] Commit changes in logical groups.
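The metadata step could look something like this `pyproject.toml` fragment. It is a sketch only: the package name, version, description, and the contents of the `test` extra are assumptions, not the repo's actual metadata.

```toml
# Sketch only -- name, version, and the test extra are assumptions.
[project]
name = "aituner"
version = "0.1.0"
description = "Stdlib-only tuning and comparison toolkit"
requires-python = ">=3.11"
license = { text = "MIT" }

[project.optional-dependencies]
# The suite itself runs on stdlib unittest; the extra carries
# optional test tooling only, keeping the core dependency-free.
test = ["coverage"]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```

Keeping the runtime dependency list empty and pushing tooling into an extra preserves the stdlib-only guarantee stated in the Tech Stack section.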