Clarify qwen27b raw per-iteration performance

2026-05-10 14:24:10 +08:00
parent b0325ecfd9
commit bf7c02e721


@@ -24,6 +24,17 @@ The previous no-harness run was affected by the `dash0` migration and had many e
## Result
The table below lists the raw per-iteration performance for a Fig18-style plot. Use it directly as `perf[i]`; do not replace missing points with the running best `max(perf[:i+1])`.
Metric: `best_request_rate_per_gpu` from each trial's own `result.json`. `NA` means the proposed config did not produce a feasible point under the SLO. `stop` means the harness stopped before launching another GPU trial.
| Variant | iter1 | iter2 | iter3 | iter4 | iter5 | iter6 | iter7 | iter8 | iter9 | iter10 | iter11 | iter12 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| no-harness raw `perf[i]` | 0.0650 | 0.0617 | 0.0308 | NA | NA | NA | NA | NA | NA | 0.2025 | NA | NA |
| harness raw `perf[i]` | 0.0650 | 0.0617 | 0.2025 | NA | 0.1283 | NA | 0.2696 | 0.2742 | NA | NA | NA | stop |

The raw no-harness curve is not monotonic: iter2 and iter3 are worse than the baseline, iter4 through iter9 produce no feasible config, and iter11 and iter12 are also `NA`. The monotonic curve in the Incumbent Curve section is best-so-far (incumbent) tracking, not the measured performance of each proposal.
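The difference between the two curves can be made concrete with a small Python sketch. It assumes `None` stands in for an `NA` cell and that list index 0 is iter1. `incumbent_curve` is a hypothetical helper, not part of the tuning harness. It computes the running best over feasible trials, which is the `max(perf[:i+1])` transformation that must not replace the raw table.

```python
from typing import List, Optional


def incumbent_curve(perf: List[Optional[float]]) -> List[Optional[float]]:
    """Best-so-far (incumbent) value after each iteration.

    Infeasible trials (None) never raise the incumbent; the previous best is
    carried forward instead. The result is monotonic, unlike the raw perf[i].
    """
    best: Optional[float] = None
    out: List[Optional[float]] = []
    for p in perf:
        if p is not None and (best is None or p > best):
            best = p
        out.append(best)
    return out


# Raw no-harness perf[i] from the Result table (None = NA).
raw = [0.0650, 0.0617, 0.0308, None, None, None, None, None, None, 0.2025, None, None]
inc = incumbent_curve(raw)
```

For the no-harness row this yields 0.0650 for iter1 through iter9 and 0.2025 for iter10 through iter12, the same values as the no-harness incumbent row. The raw list itself dips at iter2 and iter3 and has gaps.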
| Variant | Best iter | Best request rate | Best request rate / GPU | Best config summary |
| --- | ---: | ---: | ---: | --- |
| no-harness rerun | 10 | 0.4050 | 0.2025 | `tensor-parallel-size=2`, `data-parallel-size=1`, `max-num-batched-tokens=12288` |
@@ -33,13 +44,15 @@ Harness reached a higher incumbent and did so earlier. Final best request rate p
## Incumbent Curve
Values are the incumbent best request rate per GPU after each tuning iteration. This table explains how the final best configuration was selected, but it must not be used as the Fig18 raw `perf[i]`.
| Variant | iter1 | iter2 | iter3 | iter4 | iter5 | iter6 | iter7 | iter8 | iter9 | iter10 | iter11 | iter12 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| no-harness rerun | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.2025 | 0.2025 | 0.2025 |
| harness | 0.0650 | 0.0650 | 0.2025 | 0.2025 | 0.2025 | 0.2025 | 0.2696 | 0.2742 | 0.2742 | 0.2742 | 0.2742 | stop |

For plotting raw `perf[i]`, leave `NA` points missing or render them as invalid trials. If the plotting script requires numeric values, use `0` only with an explicit label stating that it means "no feasible configuration under the configured SLO"; never forward-fill from the incumbent.
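A minimal Python sketch of these plotting rules follows. It assumes the table cells are read as strings. `parse_cell`, `to_plot_series`, and the `STOP` sentinel are hypothetical names. Truncating the series at the first `stop` cell is also this sketch's own assumption, not a rule stated above. By default `NA` becomes NaN, which matplotlib leaves as a gap in a line plot. Zero-filling happens only on request and always comes with the explanatory label.

```python
import math
from typing import List, Optional, Tuple, Union

# Hypothetical sentinel for a "stop" cell: the harness launched no trial there.
STOP = "stop"
Cell = Union[float, None, str]


def parse_cell(text: str) -> Cell:
    """Map one table cell to a raw perf[i] value.

    number -> float, "NA" -> None (trial ran, no feasible point under the SLO),
    "stop" -> STOP (no GPU trial was launched).
    """
    text = text.strip()
    if text == "NA":
        return None
    if text == STOP:
        return STOP
    return float(text)


def to_plot_series(
    cells: List[Cell], zero_fill: bool = False
) -> Tuple[List[float], Optional[str]]:
    """Build numeric y-values for a raw perf[i] plot.

    Stops at the first STOP cell (assumption: nothing was measured there).
    By default NA becomes NaN so the plot shows a gap; with zero_fill=True,
    NA becomes 0.0 and a label explaining what 0 means is returned.
    NA is never forward-filled from an earlier iteration.
    """
    ys: List[float] = []
    for c in cells:
        if c == STOP:
            break
        if c is None:
            ys.append(0.0 if zero_fill else math.nan)
        else:
            ys.append(c)
    label = "0 = no feasible configuration under the configured SLO" if zero_fill else None
    return ys, label


# Usage: the harness raw row from the Result table.
harness = [parse_cell(t) for t in "0.0650 0.0617 0.2025 NA 0.1283 NA 0.2696 0.2742 NA NA NA stop".split()]
ys, label = to_plot_series(harness)
```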
## Trial Details
No-harness rerun: