Clarify qwen27b raw per-iteration performance
@@ -24,6 +24,17 @@ The previous no-harness run was affected by the `dash0` migration and had many e
## Result

The table below is the raw per-iteration performance for a Fig18-style plot. Use this table as `perf[i]`; do not replace missing points with `max(perf[:i+1])`.

Metric: `best_request_rate_per_gpu` from that trial's own `result.json`. `NA` means the proposed config did not produce a feasible point under the SLO. `stop` means the harness stopped before launching another GPU trial.

| Variant | iter1 | iter2 | iter3 | iter4 | iter5 | iter6 | iter7 | iter8 | iter9 | iter10 | iter11 | iter12 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| no-harness raw `perf[i]` | 0.0650 | 0.0617 | 0.0308 | NA | NA | NA | NA | NA | NA | 0.2025 | NA | NA |
| harness raw `perf[i]` | 0.0650 | 0.0617 | 0.2025 | NA | 0.1283 | NA | 0.2696 | 0.2742 | NA | NA | NA | stop |

The raw no-harness curve is not monotonic: iter2 and iter3 are worse than the baseline, and iter4-9 do not produce feasible configs. The monotonic curve below is best-so-far/incumbent tracking, not the measured performance of each proposal.

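
To make the distinction between raw `perf[i]` and incumbent tracking concrete, here is a minimal sketch that derives a best-so-far curve from the raw values. The function name `incumbent_curve` and the use of `None` for `NA` are illustrative assumptions, not part of the tuning harness; the numbers are copied from the raw table.

```python
from typing import List, Optional

# Hypothetical helper (not from the harness): best-so-far value after each
# iteration. An infeasible trial (None, i.e. NA) leaves the incumbent unchanged.
def incumbent_curve(raw: List[Optional[float]]) -> List[Optional[float]]:
    best: Optional[float] = None
    curve: List[Optional[float]] = []
    for value in raw:
        if value is not None and (best is None or value > best):
            best = value
        curve.append(best)
    return curve

NA = None  # assumption: NA is represented as None in this sketch

# Raw per-iteration values copied from the table.
no_harness_raw = [0.0650, 0.0617, 0.0308, NA, NA, NA, NA, NA, NA, 0.2025, NA, NA]
# iter12 is `stop` for the harness (no trial was launched), so it is omitted.
harness_raw = [0.0650, 0.0617, 0.2025, NA, 0.1283, NA, 0.2696, 0.2742, NA, NA, NA]

no_harness_incumbent = incumbent_curve(no_harness_raw)
harness_incumbent = incumbent_curve(harness_raw)
```

The derived curves never decrease, which is exactly why they hide the regressions and infeasible proposals that are visible in the raw rows.
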

| Variant | Best iter | Best request rate | Best request rate / GPU | Best config summary |
| --- | ---: | ---: | ---: | --- |
| no-harness rerun | 10 | 0.4050 | 0.2025 | `tensor-parallel-size=2`, `data-parallel-size=1`, `max-num-batched-tokens=12288` |

@@ -33,13 +44,15 @@ Harness reached a higher incumbent and did so earlier. Final best request rate p
## Incumbent Curve

Values are incumbent best request rate per GPU after each tuning iteration. This table is useful for explaining final best selection, but it should not be used as Fig18 raw `perf[i]`.

| Variant | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| no-harness rerun | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.0650 | 0.2025 | 0.2025 | 0.2025 |
| harness | 0.0650 | 0.0650 | 0.2025 | 0.2025 | 0.2025 | 0.2025 | 0.2696 | 0.2742 | 0.2742 | 0.2742 | 0.2742 | stop |

For plotting raw `perf[i]`, keep `NA` points missing or render them as invalid trials. If a plotting script requires numeric values, use `0` only with an explicit label that this means "no feasible configuration under the configured SLO"; do not forward-fill from the incumbent.
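
One way to follow this guidance in a plotting script is to parse each raw table row into numeric values plus a per-iteration status, so that missing points stay missing. The sketch below is an assumption about how such a script might be organised; `parse_perf_row` and the status labels `ok`, `infeasible`, and `not_run` are hypothetical names, not harness output.

```python
import math
from typing import List, Tuple

# Hypothetical parser for one raw perf[i] table row. NA and stop both become
# NaN (never a forward-filled incumbent); the status list keeps them apart.
def parse_perf_row(cells: List[str]) -> Tuple[List[float], List[str]]:
    values: List[float] = []
    status: List[str] = []
    for cell in cells:
        token = cell.strip()
        if token == "NA":
            values.append(math.nan)   # no feasible point under the SLO
            status.append("infeasible")
        elif token == "stop":
            values.append(math.nan)   # harness stopped, no trial launched
            status.append("not_run")
        else:
            values.append(float(token))
            status.append("ok")
    return values, status

# Harness row copied from the raw per-iteration table.
harness_cells = (
    "0.0650 | 0.0617 | 0.2025 | NA | 0.1283 | NA | "
    "0.2696 | 0.2742 | NA | NA | NA | stop"
).split("|")
harness_perf, harness_status = parse_perf_row(harness_cells)
```

Plotting libraries such as matplotlib generally leave a gap where a value is NaN, and the status list can be used to draw infeasible trials with a separate marker instead of substituting `0`.
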
## Trial Details

No-harness rerun: