AITuner
AITuner is a small study orchestrator for OpenAI-compatible serving engines. It replays trace windows, searches for the highest feasible offered load under configured SLOs, and records enough trial context for LLM- or harness-guided configuration proposals.
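As a rough illustration of what "searching for the highest feasible offered load" can mean, the sketch below bisects over a load value. It is not taken from AITuner's source: the function name, the `is_feasible` callback, and the assumption that feasibility is monotone in load (every load below a feasible one is also feasible) are all hypothetical.

```python
from typing import Callable, Optional


def highest_feasible_load(
    is_feasible: Callable[[float], bool],
    low: float,
    high: float,
    tolerance: float = 0.01,
) -> Optional[float]:
    """Bisect [low, high] for the largest load that still meets the SLOs.

    Hypothetical helper: is_feasible(load) stands in for one replay probe
    and is assumed to be monotone (feasible below a boundary, infeasible above).
    """
    if not is_feasible(low):
        return None  # nothing in the window is feasible
    if is_feasible(high):
        return high  # the whole window is feasible
    best = low
    while high - low > tolerance:
        mid = (low + high) / 2
        if is_feasible(mid):
            best = mid  # mid meets the SLOs, so search above it
            low = mid
        else:
            high = mid  # mid violates the SLOs, so search below it
    return best
```

Each call to `is_feasible` corresponds to a full probe, so narrowing the starting `[low, high]` window directly reduces the number of probes a trial spends.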
Status
This repository is research tooling. Treat reported experiment numbers as valid
only when the matching study spec, trial artifacts, probe history, and
probe_details.jsonl files are available for audit.
Install
python3 -m pip install -e .
Test
The test suite uses the Python standard library unittest runner:
PYTHONPATH=src python3 -m unittest discover -s tests -v
If the package is installed in editable mode, PYTHONPATH=src is optional.
Basic Workflow
Initialize a study:
aituner study init --spec configs/examples/study.example.json
Run a local tuning loop:
aituner study tune --spec configs/examples/study.example.json --max-trials 2
Run a compare:
aituner compare run --spec configs/examples/compare.example.json
Remote experiment notes for this checkout live in AGENTS.md. The default
remote host is dash0, and code should be synchronized through Git before
remote runs.
Experiment Integrity
- Fixed-length replay requests are scored only when completion token usage is verifiable and matches the trace expectation.
- Each trial writes aggregate probe history and per-request probe details.
- request_rate_per_gpu is the primary cross-topology metric: best_feasible_request_rate / (tensor_parallel_size * data_parallel_size).
- Compare reports include failed and no-feasible window counts; do not interpret mean request rates without those counts.
- Bounded replays using max_requests_per_probe, completion_tokens_override, or replay_time_scale are convergence tests for that bounded workload, not production benchmarks.
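The per-GPU metric and the warning about window counts can be sketched as follows. The formula comes from the list above; the window record layout (`status` and `request_rate_per_gpu` keys, and the status strings) is a hypothetical shape chosen for illustration, not AITuner's actual report schema.

```python
from statistics import mean


def request_rate_per_gpu(best_feasible_request_rate, tensor_parallel_size, data_parallel_size):
    """Normalize the best feasible request rate by the number of GPUs used."""
    gpus = tensor_parallel_size * data_parallel_size
    if gpus <= 0:
        raise ValueError("tensor_parallel_size * data_parallel_size must be positive")
    return best_feasible_request_rate / gpus


def summarize_windows(windows):
    """Report the mean rate together with the counts needed to interpret it.

    Assumed (hypothetical) record shape: {"status": "feasible" | "failed" |
    "no_feasible", "request_rate_per_gpu": float for feasible windows}.
    """
    feasible = [w["request_rate_per_gpu"] for w in windows if w["status"] == "feasible"]
    return {
        "mean_request_rate_per_gpu": mean(feasible) if feasible else None,
        "feasible_windows": len(feasible),
        "failed_windows": sum(w["status"] == "failed" for w in windows),
        "no_feasible_windows": sum(w["status"] == "no_feasible" for w in windows),
    }
```

For example, `request_rate_per_gpu(12.0, 2, 2)` gives 3.0 requests per second per GPU. Keeping the failed and no-feasible counts next to the mean makes it visible when the mean rests on only a few windows.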
Configuration Notes
Example specs that use llm.endpoint.provider=codex resolve the endpoint from
the local Codex configuration unless llm.endpoint.base_url or
AITUNER_CODEX_BASE_URL is set. Public, reproducible examples should prefer an
explicit endpoint or omit the LLM endpoint and use proposal files.
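The resolution rule above can be pictured with a small sketch. The function name and arguments are hypothetical, and the precedence between llm.endpoint.base_url and AITUNER_CODEX_BASE_URL is an assumption (the spec value is checked first here); the text only states that either one overrides the local Codex configuration.

```python
import os
from typing import Mapping, Optional


def resolve_codex_base_url(
    endpoint: Mapping[str, str],
    codex_config_base_url: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick a base URL for an endpoint whose provider is codex.

    Hypothetical helper. Assumed order: explicit spec value, then the
    AITUNER_CODEX_BASE_URL environment variable, then the local Codex config.
    """
    explicit = endpoint.get("base_url")
    if explicit:
        return explicit
    from_env = environ.get("AITUNER_CODEX_BASE_URL")
    if from_env:
        return from_env
    return codex_config_base_url
```

A spec with an explicit base_url resolves to the same endpoint on every machine, which is why public examples are better served by that form than by a machine-local fallback.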