aituner/docs/superpowers/plans/2026-05-06-repo-audit-repair.md
Gahow Wang d7df1ebdac
Add open source project metadata
2026-05-06 21:18:21 +08:00


# Repo Audit Repair Implementation Plan

**For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Repair the audit findings that affect measurement integrity, state correctness, documentation accuracy, and open-source readiness.

**Architecture:** Keep changes localized to the existing stdlib-only Python package. Measurement validation lives at the HTTP/worker boundary, study-state fixes stay in `StudyStore`, compare reporting gains explicit failed/no-feasible accounting, and project metadata/docs are added at the repo root.

**Tech Stack:** Python 3.11+ stdlib, unittest, setuptools via `pyproject.toml`.


## Task 1: Measurement Integrity

**Files:**

- Modify: `src/aituner/http_client.py`
- Modify: `src/aituner/slo.py`
- Modify: `src/aituner/worker.py`
- Test: `tests/test_core_flow.py`

**Steps:**

- [ ] Write failing tests for completion-token source/mismatch failures and for persisted per-request probe details.
- [ ] Run the targeted tests and confirm they fail for the expected reason.
- [ ] Add token-source metadata to streamed metrics and request outcomes.
- [ ] Fail requests when the configured completion length cannot be verified from usage or differs from the expected value.
- [ ] Persist probe-outcome details under each trial's artifact directory.
- [ ] Run the targeted tests and the full unittest suite.
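The verification rule in the steps above could be sketched as follows. This is a hypothetical illustration, not the actual aituner API: the names `verify_completion_tokens`, `TokenCheck`, and the `source` labels are all assumptions.

```python
# Hypothetical sketch of the Task 1 rule: a request only counts toward
# measurement when the server-reported usage confirms the configured
# completion length. All names here are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenCheck:
    ok: bool      # whether the request may count toward measurement
    source: str   # where the completion-token count came from
    detail: str   # human-readable reason, persisted with the probe output


def verify_completion_tokens(expected: int, usage_tokens: Optional[int]) -> TokenCheck:
    """Fail unless usage confirms the configured completion length."""
    if usage_tokens is None:
        # No usage block in the response: the length cannot be verified.
        return TokenCheck(False, "missing", "no completion token count in usage")
    if usage_tokens != expected:
        return TokenCheck(
            False, "usage",
            f"expected {expected} completion tokens, got {usage_tokens}",
        )
    return TokenCheck(True, "usage", "completion length verified from usage")
```

Persisting the `TokenCheck` fields alongside each trial's artifacts would cover the probe-detail step in the same pass.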

## Task 2: State, Spec, and Compare Guards

**Files:**

- Modify: `src/aituner/spec.py`
- Modify: `src/aituner/store.py`
- Modify: `src/aituner/compare.py`
- Modify: `scripts/run_multi_compare.py`
- Test: `tests/test_core_flow.py`

**Steps:**

- [ ] Write failing tests for state-list isolation, invalid trace numeric bounds, and compare aggregate failure accounting.
- [ ] Run the targeted tests and confirm the expected failures.
- [ ] Deep-copy or replace trial lists when materializing trials, so callers cannot mutate stored state.
- [ ] Validate that trace controls are positive in `TraceSpec.from_dict`.
- [ ] Report failed/no-feasible counts in compare aggregates without changing the existing winner semantics.
- [ ] Run the targeted tests and the full unittest suite.
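A minimal sketch of the `TraceSpec.from_dict` bounds check described above. The field names `concurrency` and `duration_s` are assumed for illustration and may not match the real schema:

```python
# Hypothetical guard rejecting non-positive or non-numeric trace controls
# before a study is built. Field names are illustrative, not the real
# aituner TraceSpec schema.
def validate_trace_controls(data: dict) -> dict:
    for key in ("concurrency", "duration_s"):
        value = data.get(key)
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(
                f"trace control {key!r} must be a positive number, got {value!r}"
            )
    return data
```

The state-isolation step is the standard pattern of returning `copy.deepcopy(...)` of stored trial lists rather than the live objects.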

## Task 3: Docs and Open-Source Readiness

**Files:**

- Create: `README.md`
- Create: `LICENSE`
- Create: `CONTRIBUTING.md`
- Create: `SECURITY.md`
- Modify: `pyproject.toml`
- Modify: selected docs under `docs/`

**Steps:**

- [ ] Add concise repo usage, verification, and experiment-integrity guidance.
- [ ] Add the MIT license and contribution/security notes.
- [ ] Add project metadata and an optional `test` extra to `pyproject.toml`.
- [ ] Update stale docs about high-stop behavior and the current test count.
- [ ] Run JSON validation and the full unittest suite.
- [ ] Commit changes in logical groups.
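The metadata step above could look roughly like the fragment below; the version and the (empty) extra contents are placeholders, not the project's actual values:

```toml
# Sketch of the pyproject.toml additions; values are placeholders.
[project]
name = "aituner"
version = "0.1.0"              # placeholder
requires-python = ">=3.11"
license = { text = "MIT" }

[project.optional-dependencies]
# The suite uses stdlib unittest, so the extra is empty today; it exists
# so CI and tooling can `pip install .[test]` uniformly if test-only
# dependencies appear later.
test = []

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```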