Add open source project metadata
2026-05-06 21:18:21 +08:00
parent c1ff64381d
commit d7df1ebdac
10 changed files with 238 additions and 4 deletions


@@ -60,7 +60,7 @@ The speedup comes from reducing wasted proposal families, not from changing the
- Engine relaunch after early stop is available as opt-in for faster smoke studies, but it is not the default because it can change warm-state comparability.
5. Search-high saturation stop
-   - If the incumbent's highest measured probe is feasible, has no SLO failures, and is within the configured binary-search resolution of `search.high`, the harness stops before asking the LLM for another proposal.
+   - If the incumbent's highest measured probe is feasible and is within the configured binary-search resolution of `search.high`, the harness stops before asking the LLM for another proposal. Individual request failures can be present when the aggregate probe still meets the configured pass-rate SLO.
- This is not a model-specific threshold. It means the workload search range, not the engine config, is currently the limiting measurement bound.
6. Deterministic first probes


@@ -118,7 +118,7 @@ A second generic diagnosis bug was fixed: non-SLO bookkeeping counts such as `pr
The base-relative patch issue is now guarded in code, not only in the LLM prompt. When `StudyStore.materialize_trial` sees a runtime/env-only proposal after a non-base incumbent has been found, it inherits the incumbent topology patch into the trial spec unless the proposal explicitly provides a topology. This keeps same-topology runtime validation on the actual incumbent while preserving the ability to test the base topology by stating it explicitly.
-Local verification: `PYTHONPATH=src python3 -m unittest discover -s tests` passed 68 tests.
+Local verification at that commit: `PYTHONPATH=src python3 -m unittest discover -s tests` passed. The current repository suite has since grown; rerun the command rather than relying on this historical test count.
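The topology inheritance guard described in this hunk can be illustrated with a minimal sketch. This is not the actual `StudyStore.materialize_trial` code: the function name `inherit_topology` and the dictionary keys `topology`, `runtime`, and `env` are assumptions made for illustration only.

```python
from typing import Optional


def inherit_topology(proposal: dict, incumbent: Optional[dict]) -> dict:
    """Return a trial patch that keeps the incumbent topology by default.

    `incumbent` is None while the base config is still the best known one.
    The key names are hypothetical and not taken from the real spec format.
    """
    trial = dict(proposal)  # shallow copy so the caller's proposal is untouched
    if incumbent is not None and "topology" not in trial and "topology" in incumbent:
        # Runtime/env-only proposal: validate it on the incumbent topology.
        trial["topology"] = dict(incumbent["topology"])
    return trial
```

A proposal that states a topology explicitly, including the base topology, is passed through unchanged, which matches the behavior described above.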
## Current Harness Judgment


@@ -64,7 +64,7 @@ This run tests a stricter early-stop harness:
- the validation covered topology and runtime families, or accumulated at least three post-incumbent validation attempts.
- If the stop guard fires, `study tune` writes `harness-stop-XXXX` and exits without spending another GPU trial or asking the LLM for another proposal.
- A single-family all-infeasible plateau is not enough to stop deterministically. It only blocks repeating that family; the LLM must either justify a different family or later satisfy the validation/convergence stop rule.
-- A search-high saturation guard stops immediately when the incumbent's highest measured probe is feasible, has no SLO failures, and is within the configured binary-search resolution of `search.high`. In that case the current study cannot measure a better config without increasing the workload search range, so more config proposals only waste tuning iterations.
+- A search-high saturation guard stops immediately when the incumbent's highest measured probe is feasible and is within the configured binary-search resolution of `search.high`. A feasible probe may still contain individual SLO failures as long as it meets the configured pass-rate target. In that case the current study cannot measure a better config without increasing the workload search range, so more config proposals only waste tuning iterations.
This is a generic harness rule, not a testcase-specific threshold. It does not depend on qwen27b, qwen235b, qwen30b, a fixed TP/DP value, or a hardcoded SLO number.
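The updated saturation rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the harness implementation: the `Probe` type, its fields, and the function names are hypothetical, and feasibility is modeled simply as an aggregate pass rate meeting a configured target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical summary of one measured probe (not the harness's real type)."""
    rate: float       # offered load at which the probe was measured
    pass_rate: float  # fraction of requests that met their SLO


def is_feasible(probe: Probe, target_pass_rate: float) -> bool:
    # Feasibility is judged on the aggregate pass rate, so a feasible probe
    # may still contain individual requests that missed the SLO.
    return probe.pass_rate >= target_pass_rate


def search_high_saturated(probes, search_high, resolution, target_pass_rate):
    """True when the highest measured probe is feasible and within
    `resolution` of `search_high`."""
    if not probes:
        return False
    highest = max(probes, key=lambda p: p.rate)
    near_bound = (search_high - highest.rate) <= resolution
    return near_bound and is_feasible(highest, target_pass_rate)
```

Note that the check involves only the search range, the resolution, and the pass-rate target, so it carries no model-specific or topology-specific constants.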
@@ -76,7 +76,7 @@ Local test command:
PYTHONPATH=src python3 -m unittest tests.test_core_flow -q
```
-Result: passed, 77 tests.
+Result at the time of this note: passed. The current repository test count may be higher; use the command above as the source of truth.
The added coverage checks:


@@ -0,0 +1,59 @@
# Repo Audit Repair Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Repair the audit findings that affect measurement integrity, state correctness, documentation accuracy, and open-source readiness.
**Architecture:** Keep changes localized to the existing stdlib-only Python package. Measurement validation lives at the HTTP/worker boundary, study state fixes remain in `StudyStore`, compare reporting gains explicit failed/no-feasible accounting, and project metadata/docs are added at repo root.
**Tech Stack:** Python 3.11+ stdlib, `unittest`, setuptools `pyproject.toml`.
---
### Task 1: Measurement Integrity
**Files:**
- Modify: `src/aituner/http_client.py`
- Modify: `src/aituner/slo.py`
- Modify: `src/aituner/worker.py`
- Test: `tests/test_core_flow.py`
- [ ] Write failing tests for completion token source/mismatch failures and persisted per-request probe details.
- [ ] Run the targeted tests and confirm they fail for the expected reason.
- [ ] Add token source metadata to streamed metrics and request outcomes.
- [ ] Fail requests when configured completion length cannot be verified from usage or differs from expected.
- [ ] Persist probe outcome details under each trial artifact directory.
- [ ] Run targeted tests and the full unittest suite.
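As a rough illustration of the completion-length check planned in this task, a request could be failed whenever the token count is missing from usage or differs from the expected value. The helper name `verify_completion_tokens` and the `completion_tokens` usage field (in the style of OpenAI-compatible responses) are assumptions, not taken from `http_client.py` or `slo.py`.

```python
from typing import Optional, Tuple


def verify_completion_tokens(expected: int, usage: Optional[dict]) -> Tuple[bool, str]:
    """Return (ok, reason) for one request.

    The request is treated as failed when the count cannot be verified from
    usage, or when the reported count differs from `expected`.
    """
    if not usage or "completion_tokens" not in usage:
        return False, "unverified: no usage.completion_tokens"
    actual = usage["completion_tokens"]
    if actual != expected:
        return False, f"mismatch: expected {expected}, got {actual}"
    return True, "verified from usage"
```

Returning the reason alongside the verdict gives the persisted per-request probe details something to record as the token source.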
### Task 2: State, Spec, And Compare Guards
**Files:**
- Modify: `src/aituner/spec.py`
- Modify: `src/aituner/store.py`
- Modify: `src/aituner/compare.py`
- Modify: `scripts/run_multi_compare.py`
- Test: `tests/test_core_flow.py`
- [ ] Write failing tests for state list isolation, invalid trace numeric bounds, and compare aggregate failure accounting.
- [ ] Run targeted tests and confirm expected failures.
- [ ] Deep-copy/replace trial lists when materializing trials.
- [ ] Validate positive trace controls in `TraceSpec.from_dict`.
- [ ] Report failed/no-feasible counts in compare aggregates without changing existing winner semantics.
- [ ] Run targeted tests and the full unittest suite.
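Two of the guards in this task, state list isolation and positive trace controls, might look like the sketch below. The function names and the `trials` key are hypothetical; the real logic belongs to `StudyStore` and `TraceSpec.from_dict`, whose internals are not shown in this plan.

```python
import copy
import math


def append_trial(state: dict, trial: dict) -> dict:
    """Return a new state with `trial` appended, leaving `state` untouched."""
    new_state = copy.deepcopy(state)  # no list is shared with the old state
    new_state.setdefault("trials", []).append(copy.deepcopy(trial))
    return new_state


def require_positive(name: str, value):
    """Reject non-numeric, non-finite, zero, or negative trace controls."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be a positive number, got {value!r}")
    return value
```

Deep-copying on every materialization trades a little extra allocation for the guarantee that mutating a trial can never silently change previously stored state.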
### Task 3: Docs And Open-Source Readiness
**Files:**
- Create: `README.md`
- Create: `LICENSE`
- Create: `CONTRIBUTING.md`
- Create: `SECURITY.md`
- Modify: `pyproject.toml`
- Modify: selected docs under `docs/`
- [ ] Add concise repo usage, verification, and experiment integrity guidance.
- [ ] Add MIT license and contribution/security notes.
- [ ] Add project metadata and optional test extra.
- [ ] Update stale docs about high-stop behavior and current test count.
- [ ] Run JSON validation and full unittest suite.
- [ ] Commit changes in logical groups.