Document TPOT40 baseline infeasible run
121
docs/qwen27b-chat-0-8k-tpot40-baseline-infeasible-20260507.md
Normal file
@@ -0,0 +1,121 @@
# Qwen27B Chat 0-8k TPOT 40ms Baseline Infeasible Run

Date: 2026-05-07

## Goal

Re-run the internal vLLM + Qwen3.5-27B chat 0-8k tuning comparison after adding a study-level guard:

- if the automatic baseline trial has no feasible probe;
- and the lowest sampled request rate still fails the SLO target pass rate;
- then AITuner stops the whole study and reports that the SLO is too tight for the current setup.

This prevents spending the remaining tuning budget on LLM or harness proposals when the baseline itself demonstrates that the workload/SLO is infeasible at the search floor.

## Implementation

Commit: `f212673 Stop tuning when baseline is infeasible`

Changed behavior:

- `study tune` now persists `tuning_stop_reason` and `tuning_stop_diagnosis` in `state.json`.
- After the automatic baseline trial is ingested, AITuner checks the worker result:
  - `status == completed`
  - `best_request_rate is None`
  - at least one probe exists
  - all probes are infeasible
- If true, AITuner stops before asking the LLM or harness for any proposal.
- Re-running the same study respects the persisted stop state and does not resume tuning.

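The check and the persisted stop state described above can be sketched roughly as follows. This is a minimal illustration, not the code from commit `f212673`: the `Probe` and `WorkerResult` classes and the helper names `baseline_is_infeasible`, `persist_stop`, and `should_resume` are hypothetical. Only the field names `status`, `best_request_rate`, `tuning_stop_reason`, `tuning_stop_diagnosis`, and the file name `state.json` come from this document.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Probe:
    # Hypothetical probe record; the real schema may differ.
    request_rate: float
    pass_rate: float
    feasible: bool


@dataclass
class WorkerResult:
    # Hypothetical container for the baseline trial's worker result.
    status: str
    best_request_rate: Optional[float]
    probes: List[Probe] = field(default_factory=list)


def baseline_is_infeasible(result: WorkerResult) -> bool:
    """True only when all four conditions from the list above hold."""
    return (
        result.status == "completed"
        and result.best_request_rate is None
        and len(result.probes) > 0
        and all(not p.feasible for p in result.probes)
    )


def persist_stop(state_path: Path, reason: str, diagnosis: str) -> None:
    """Merge the stop fields into state.json so a later run can see them."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state["tuning_stop_reason"] = reason
    state["tuning_stop_diagnosis"] = diagnosis
    state_path.write_text(json.dumps(state, indent=2))


def should_resume(state_path: Path) -> bool:
    """A persisted stop reason means the study must not resume tuning."""
    if not state_path.exists():
        return True
    return not json.loads(state_path.read_text()).get("tuning_stop_reason")


# Usage: an all-infeasible baseline stops the study, and a re-run stays stopped.
baseline = WorkerResult(
    status="completed",
    best_request_rate=None,
    probes=[Probe(0.895, 0.0, False), Probe(0.035, 0.142857, False)],
)
state_file = Path(tempfile.mkdtemp()) / "state.json"
if baseline_is_infeasible(baseline):
    persist_stop(state_file, "baseline_all_infeasible", "no feasible probe")
print(should_resume(state_file))  # prints False
```

Requiring at least one probe keeps a baseline that produced no probes at all from being treated as an SLO infeasibility; that case is more likely a launch or measurement failure.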

Validation:

```bash
python3 -m compileall -q src tests
PYTHONPATH=src python3 -m unittest tests.test_core_flow
```

Both commands passed locally and on `dash0`.

## Setup

Host: `dash0`

Remote repo: `/home/admin/cpfs/wjh/aituner/aituner`

Base spec: `configs/examples/dash0_qwen27b_tight_slo_run4_0_8k.json`

Model: `/home/admin/resource/model/464482ce/qwen3.5-27b/256k-0223-internal`

Workload: chat, 0-8k input window

SLO:

- TTFT: existing step rule from the base spec
- TPOT: fixed `40ms`
- target pass rate: `0.95`

Search:

- Direct AITuner command: `python3 -m aituner.cli study tune ... --max-trials 12`
- No manual proposal/state edits during either run.
- Both variants ran with `CUDA_VISIBLE_DEVICES=0,1,2,4,5,6,7` (GPU 3 skipped, seven GPUs visible).
- The two specs were verified equal after normalizing only `study_id` and `llm.use_harness`.

Specs:

- no-harness: `.aituner-tight/specs/dash0-qwen27b-chat-0-8k-tpot40-gpu3skip-12iter-noharness-20260507.json`
- harness: `.aituner-tight/specs/dash0-qwen27b-chat-0-8k-tpot40-gpu3skip-12iter-harness-20260507.json`

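One way to run the spec equality check mentioned above is to drop the two fields that are allowed to differ and compare the rest. The sketch below is illustrative only. It assumes `llm.use_harness` is a `use_harness` key nested under an `llm` object, and the helper names `load_spec`, `normalize`, and `specs_equivalent` are hypothetical. The inline specs are toy data, not the real files.

```python
import copy
import json
from pathlib import Path

# Key paths that are allowed to differ between the two variants.
IGNORED_PATHS = [("study_id",), ("llm", "use_harness")]


def load_spec(path: str) -> dict:
    """Read a JSON spec file from disk."""
    return json.loads(Path(path).read_text())


def normalize(spec: dict) -> dict:
    """Return a deep copy of the spec with the ignored key paths removed."""
    out = copy.deepcopy(spec)
    for path in IGNORED_PATHS:
        node = out
        for key in path[:-1]:
            node = node.get(key, {}) if isinstance(node, dict) else {}
        if isinstance(node, dict):
            node.pop(path[-1], None)
    return out


def specs_equivalent(a: dict, b: dict) -> bool:
    """True when the specs differ only in the ignored key paths."""
    return normalize(a) == normalize(b)


# Usage with toy specs; real runs would pass load_spec(<path>) for each file.
no_harness = {"study_id": "a", "llm": {"use_harness": False}, "slo": {"tpot_ms": 40}}
harness = {"study_id": "b", "llm": {"use_harness": True}, "slo": {"tpot_ms": 40}}
print(specs_equivalent(no_harness, harness))  # prints True
```

Comparing parsed dictionaries rather than file text makes the check independent of key order and whitespace.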
## Commands

No harness:

```bash
PYTHONPATH=src python3 -m aituner.cli study tune \
  --spec .aituner-tight/specs/dash0-qwen27b-chat-0-8k-tpot40-gpu3skip-12iter-noharness-20260507.json \
  --store-root .aituner-tight \
  --max-trials 12
```

Harness:

```bash
PYTHONPATH=src python3 -m aituner.cli study tune \
  --spec .aituner-tight/specs/dash0-qwen27b-chat-0-8k-tpot40-gpu3skip-12iter-harness-20260507.json \
  --store-root .aituner-tight \
  --max-trials 12
```

## Results

Both runs stopped after the baseline trial. No LLM/harness proposal was evaluated because the baseline had no feasible probe.

| Variant | Trials executed | Best request rate | Best request rate / GPU | Stop reason |
| --- | ---: | ---: | ---: | --- |
| no-harness | 1 | - | - | `baseline_all_infeasible` |
| harness | 1 | - | - | `baseline_all_infeasible` |

Baseline probe curve:

| sampling_u | request rate | pass rate | feasible | early stop reason |
| ---: | ---: | ---: | --- | --- |
| 0.03125 | 0.895 | 0.000000 | false | `slo_pass_rate_unrecoverable` |
| 0.015625 | 0.483333 | 0.137931 | false | `slo_pass_rate_unrecoverable` |
| 0.0078125 | 0.246667 | 0.236486 | false | `slo_pass_rate_unrecoverable` |
| 0.00390625 | 0.123333 | 0.189189 | false | `slo_pass_rate_unrecoverable` |
| 0.001953125 | 0.065000 | 0.205128 | false | `slo_pass_rate_unrecoverable` |
| 0.0009765625 | 0.035000 | 0.142857 | false | `slo_pass_rate_unrecoverable` |


Final diagnosis written by AITuner:

```text
Baseline configuration has no feasible probe under the current SLO. Stopping tuning because even the lowest sampled request rate did not meet the target pass rate. lowest_sampled_request_rate=0.035 lowest_sampling_u=0.000976562 lowest_probe_pass_rate=0.142857 early_stop_reason=slo_pass_rate_unrecoverable
```

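The key/value tail of the diagnosis can be reproduced from the probe table, which is a quick way to confirm that the message refers to the lowest-rate probe. The sketch below is not AITuner's formatting code. The tuple layout and the helper names `lowest_probe` and `diagnosis_fields` are hypothetical, and the use of Python's `g` format is an assumption that happens to match the logged values.

```python
# Probe rows copied from the baseline probe curve:
# (sampling_u, request_rate, pass_rate, feasible, early_stop_reason)
PROBES = [
    (0.03125, 0.895, 0.000000, False, "slo_pass_rate_unrecoverable"),
    (0.015625, 0.483333, 0.137931, False, "slo_pass_rate_unrecoverable"),
    (0.0078125, 0.246667, 0.236486, False, "slo_pass_rate_unrecoverable"),
    (0.00390625, 0.123333, 0.189189, False, "slo_pass_rate_unrecoverable"),
    (0.001953125, 0.065000, 0.205128, False, "slo_pass_rate_unrecoverable"),
    (0.0009765625, 0.035000, 0.142857, False, "slo_pass_rate_unrecoverable"),
]

TARGET_PASS_RATE = 0.95


def lowest_probe(probes):
    """Return the probe with the smallest request rate (the search floor)."""
    return min(probes, key=lambda p: p[1])


def diagnosis_fields(probes):
    """Format the lowest probe as key=value pairs using 6 significant digits."""
    u, rate, pass_rate, _feasible, reason = lowest_probe(probes)
    return (
        f"lowest_sampled_request_rate={rate:g} "
        f"lowest_sampling_u={u:g} "
        f"lowest_probe_pass_rate={pass_rate:g} "
        f"early_stop_reason={reason}"
    )


# Every probe misses the target, including the one at the search floor.
all_below_target = all(p[2] < TARGET_PASS_RATE for p in PROBES)
print(all_below_target)
print(diagnosis_fields(PROBES))
```

The `g` format keeps six significant digits, which is why `0.0009765625` appears as `0.000976562` in the logged diagnosis.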
## Interpretation

This run does not measure harness acceleration. It shows that the TPOT 40ms setup is infeasible for the current baseline at the search floor: even at an aggregate request rate of `0.035`, only `14.29%` of requests pass the SLO, far below the `95%` target. The pass rate also does not improve as the rate drops (it peaks at `23.65%` at request rate `0.246667`), which suggests that lowering the load further would not recover the target.

The correct behavior is to stop the study early and report SLO infeasibility instead of spending the remaining 11 trial slots. The harness cannot accelerate convergence when there is no feasible baseline point and therefore no incumbent for guided tuning.

For a Fig. 18-style convergence comparison, the next setup must first have at least one feasible baseline or feasible low-rate point under the same metric definitions.