Measure lower-range performance for infeasible trials

2026-05-10 14:30:34 +08:00
parent bf7c02e721
commit 14259fcec9
4 changed files with 157 additions and 22 deletions
--- a/docs/harness-ablation/qwen235b-thinking-prefill-ttft-20260510.md
+++ b/docs/harness-ablation/qwen235b-thinking-prefill-ttft-20260510.md
@@ -26,7 +26,9 @@ Both runs were launched through `python3 -m aituner.cli study tune`; no proposal

 The table below is the raw per-iteration performance for a Fig18-style plot. Use this table as `perf[i]`; do not replace missing points with `max(perf[:i+1])`.

-Metric: `best_request_rate_per_gpu` from that trial's own `result.json`. `NA` means the proposed config did not produce a feasible point under the SLO, either because the engine/probe failed or because every sampled probe was infeasible.
+Metric: `best_request_rate_per_gpu` from that trial's own `result.json`. `NA` means the proposed config did not produce a feasible point in the measured search range, either because the engine/probe failed or because every sampled probe was infeasible.
+
+Important caveat: these runs were produced before the lower-range fallback fix. For same-parallel-size runtime patches, AITuner inherited the incumbent `sampling_u` as the new search floor. If the config was infeasible above that floor, the old worker wrote `NA` without searching below it. The `NA` entries below are therefore not complete Fig18-quality raw performance points; they mean only "no feasible point above the inherited floor." A rerun with the fixed worker is required to fill in their true lower-load performance.

 | Variant | iter1 | iter2 | iter3 | iter4 | iter5 | iter6 | iter7 | iter8 | iter9 | iter10 | iter11 | iter12 |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
--- a/docs/harness-ablation/qwen27b-chat-0-8k-ttft4s-tpot25-20260510.md
+++ b/docs/harness-ablation/qwen27b-chat-0-8k-ttft4s-tpot25-20260510.md
@@ -26,7 +26,9 @@ The previous no-harness run was affected by the `dash0` migration and had many e

 The table below is the raw per-iteration performance for a Fig18-style plot. Use this table as `perf[i]`; do not replace missing points with `max(perf[:i+1])`.

-Metric: `best_request_rate_per_gpu` from that trial's own `result.json`. `NA` means the proposed config did not produce a feasible point under the SLO. `stop` means the harness stopped before launching another GPU trial.
+Metric: `best_request_rate_per_gpu` from that trial's own `result.json`. `NA` means the proposed config did not produce a feasible point in the measured search range. `stop` means the harness stopped before launching another GPU trial.
+
+Important caveat: these runs were produced before the lower-range fallback fix. For same-parallel-size runtime patches, AITuner inherited the incumbent `sampling_u` as the new search floor. If the config was infeasible above that floor, the old worker wrote `NA` without searching below it. The `NA` entries below are therefore not complete Fig18-quality raw performance points; they mean only "no feasible point above the inherited floor." A rerun with the fixed worker is required to fill in their true lower-load performance.

 | Variant | iter1 | iter2 | iter3 | iter4 | iter5 | iter6 | iter7 | iter8 | iter9 | iter10 | iter11 | iter12 |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |