Gahow Wang b3156a382a Harness: gate gpu-mem-util/seqs-raise on 'no untested TP increase' (frontier-closed)

The first gpt-5.5 verification run exposed a bug in the prior gate: topology_settled =
cur_tp>base_tp let gpu-memory-utilization fire on a TP2 incumbent (TP2>baseline TP1)
and preempt the still-open TP4 frontier -- the harness proposed TP2+gpu-mem-util=0.92
at iter 2 instead of climbing to TP4. The candidate path runs before the topology-
frontier check, so a score>=0.35 runtime candidate wins.

Fix: gate runtime micro-tuning (gpu-mem-util, raising max-num-seqs) on the TP frontier
being closed -- topology_settled = no untested _next_allowed_tp remains (respects GPU
count, so TP4 is the real ceiling on 6 GPUs). New regression test: TP2 incumbent with
TP4 reachable must climb TP and must NOT propose gpu-mem-util. 116 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-19 13:33:29 +08:00
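The gate described in this commit message can be pictured with a minimal sketch. It is not the harness code: the helper names `next_allowed_tp`, `topology_settled`, and `propose` are hypothetical stand-ins, the doubling TP ladder (1, 2, 4, ...) is an assumption consistent with "TP4 is the real ceiling on 6 GPUs", and the dictionary keys and the 0.92 value are illustrative only.

```python
from typing import Dict, Optional, Set, Union


def next_allowed_tp(cur_tp: int, gpu_count: int, tested: Set[int]) -> Optional[int]:
    """Return the smallest untested TP above cur_tp that fits on gpu_count GPUs.

    Assumption: TP sizes climb by doubling, so 6 GPUs allow TP1, TP2, and TP4.
    """
    tp = cur_tp * 2
    while tp <= gpu_count:
        if tp not in tested:
            return tp
        tp *= 2
    return None


def topology_settled(cur_tp: int, gpu_count: int, tested: Set[int]) -> bool:
    """The TP frontier is closed only when no untested TP increase remains."""
    return next_allowed_tp(cur_tp, gpu_count, tested) is None


def propose(cur_tp: int, gpu_count: int, tested: Set[int]) -> Dict[str, Union[int, float]]:
    """Climb TP while the frontier is open; only then allow runtime micro-tuning."""
    nxt = next_allowed_tp(cur_tp, gpu_count, tested)
    if nxt is not None:
        # Frontier still open: the topology move wins over any runtime candidate.
        return {"tensor-parallel-size": nxt}
    # Frontier closed: runtime knobs such as gpu-memory-utilization may fire.
    return {"gpu-memory-utilization": 0.92}
```

Under these assumptions a TP2 incumbent on 6 GPUs with TP4 untested yields a TP4 proposal, which mirrors the regression test named in the message; the earlier `cur_tp > base_tp` condition would already have treated that state as settled.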

configs/examples

Ablation: pin gpt-5.5 @ ai.gahow.org (chat.completions); re-read token per arm

2026-06-19 11:27:47 +08:00

docs

Complete harness-vs-naive ablation: harness 3x faster + stops; naive nondeterministic

2026-06-17 13:03:26 +08:00

infra/gpu_fleet

Initial AITuner study orchestrator

2026-04-04 21:26:37 +08:00

scripts

Add harness-only dash1 driver to verify the gpu-mem-util fix recovers ~0.87 + stops

2026-06-19 11:29:32 +08:00

src/aituner

Harness: gate gpu-mem-util/seqs-raise on 'no untested TP increase' (frontier-closed)

2026-06-19 13:33:29 +08:00

tests

Harness: gate gpu-mem-util/seqs-raise on 'no untested TP increase' (frontier-closed)

2026-06-19 13:33:29 +08:00

.env.example

Add codex and bailian LLM provider presets

2026-04-07 11:31:26 +08:00

.gitignore

compare: add multi-candidate runner

2026-04-13 20:50:39 +08:00

AGENTS.md

Document dash0 experiment workflow

2026-04-25 16:18:28 +08:00

CONTRIBUTING.md

Add open source project metadata

2026-05-06 21:18:21 +08:00

LICENSE

Add open source project metadata

2026-05-06 21:18:21 +08:00

paper.pdf

Add reference paper and qwen27b tpot25 16-iter notes

2026-06-15 14:02:30 +08:00

pyproject.toml

Add open source project metadata

2026-05-06 21:18:21 +08:00

README.md

Add open source project metadata

2026-05-06 21:18:21 +08:00

SECURITY.md

Add open source project metadata

2026-05-06 21:18:21 +08:00

README.md

AITuner

AITuner is a small study orchestrator for OpenAI-compatible serving engines. It replays trace windows, searches for the highest feasible offered load under configured SLOs, and records enough trial context for LLM- or harness-guided configuration proposals.
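As an illustration of what "highest feasible offered load" can mean, the sketch below bisects over a request rate using a feasibility predicate. The search strategy AITuner actually uses is not described here, so the function name, its parameters, and the assumption that feasibility is monotonic in the offered rate are all hypothetical.

```python
from typing import Callable, Optional


def highest_feasible_rate(
    is_feasible: Callable[[float], bool],
    low: float,
    high: float,
    tolerance: float = 0.01,
) -> Optional[float]:
    """Bisect for the largest rate in [low, high] that still meets the SLOs.

    Assumption: if a rate is feasible, every lower rate is feasible too.
    Returns None when even the lowest rate violates the SLOs.
    """
    if not is_feasible(low):
        return None
    if is_feasible(high):
        return high
    # Invariant: low is feasible, high is infeasible.
    while high - low > tolerance:
        mid = (low + high) / 2
        if is_feasible(mid):
            low = mid
        else:
            high = mid
    return low
```

In a real study the predicate would stand for one replay probe at the given rate checked against the configured SLOs, so each call is expensive and the tolerance bounds the number of probes.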

Status

This repository is research tooling. Treat reported experiment numbers as valid only when the matching study spec, trial artifacts, probe history, and probe_details.jsonl files are available for audit.

Install

python3 -m pip install -e .

Test

The test suite uses the Python standard library unittest runner:

PYTHONPATH=src python3 -m unittest discover -s tests -v

If the package is installed in editable mode, PYTHONPATH=src is optional.

Basic Workflow

Initialize a study:

aituner study init --spec configs/examples/study.example.json

Run a local tuning loop:

aituner study tune --spec configs/examples/study.example.json --max-trials 2

Run a compare:

aituner compare run --spec configs/examples/compare.example.json

Remote experiment notes for this checkout live in AGENTS.md. The default remote host is dash0, and code should be synchronized through Git before remote runs.

Experiment Integrity

Fixed-length replay requests are scored only when completion token usage is verifiable and matches the trace expectation.
Each trial writes aggregate probe history and per-request probe details.
request_rate_per_gpu is the primary cross-topology metric: best_feasible_request_rate / (tensor_parallel_size * data_parallel_size).
Compare reports include failed and no-feasible window counts; do not interpret mean request rates without those counts.
Bounded replays using max_requests_per_probe, completion_tokens_override, or replay_time_scale are convergence tests for that bounded workload, not production benchmarks.
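The request_rate_per_gpu formula above can be worked through with a short sketch. The function below is a hypothetical helper, not part of the AITuner package, and the rates in the usage note are made-up numbers chosen only to show why the per-GPU figure matters across topologies.

```python
def request_rate_per_gpu(
    best_feasible_request_rate: float,
    tensor_parallel_size: int,
    data_parallel_size: int = 1,
) -> float:
    """Normalize a feasible request rate by the number of GPUs the topology uses."""
    gpus = tensor_parallel_size * data_parallel_size
    if gpus <= 0:
        raise ValueError("tensor_parallel_size * data_parallel_size must be positive")
    return best_feasible_request_rate / gpus


# Hypothetical numbers: TP4 serves more requests in total but fewer per GPU.
tp1 = request_rate_per_gpu(2.0, tensor_parallel_size=1)
tp4 = request_rate_per_gpu(6.0, tensor_parallel_size=4)
```

With these illustrative inputs, a TP1 configuration at 2.0 requests per second scores 2.0 per GPU, while a TP4 configuration at 6.0 requests per second scores 1.5 per GPU, so the raw rate alone would rank them the other way around.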

Configuration Notes

Example specs that use llm.endpoint.provider=codex resolve the endpoint from the local Codex configuration unless llm.endpoint.base_url or AITUNER_CODEX_BASE_URL is set. Public, reproducible examples should prefer an explicit endpoint or omit the LLM endpoint and use proposal files.
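The resolution rule above can be sketched as a small precedence function. This is an assumption-laden illustration: the README does not say whether llm.endpoint.base_url or AITUNER_CODEX_BASE_URL wins when both are set, so the order below (spec first, then environment, then local Codex configuration) is a guess, and `resolve_codex_base_url` and its parameters are hypothetical names.

```python
from typing import Mapping, Optional


def resolve_codex_base_url(
    spec_base_url: Optional[str],
    env: Mapping[str, str],
    local_codex_base_url: Optional[str],
) -> Optional[str]:
    """Pick the endpoint URL for provider=codex.

    Assumed precedence (not confirmed by the README):
    1. llm.endpoint.base_url from the study spec
    2. the AITUNER_CODEX_BASE_URL environment variable
    3. the URL found in the local Codex configuration
    """
    if spec_base_url:
        return spec_base_url
    env_url = env.get("AITUNER_CODEX_BASE_URL")
    if env_url:
        return env_url
    return local_codex_base_url
```

Passing `os.environ` as `env` would use the real environment. Taking the mapping as a parameter keeps the function easy to test, and makes it clear that a spec relying on the third branch depends on machine-local state, which is why explicit endpoints are preferable for reproducible examples.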