Flag Stop-B e2e per-GPU trajectory as non-benchmark (saturation + smoke regime)
The reported trajectory validates the Stop-B mechanics only. TP2-DP2/TP4 saturated the trace ceiling (best_sampling_u~0.98) so their per-GPU peak is underestimated, and the run used the smoke regime (scale=0.1 + 512 cap). The TP1>TP2 ordering may be real for the small-active MoE but this run cannot establish it; the 27B TP A/B is the valid follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -30,6 +30,25 @@ config, is the bound).
|
||||
|
||||
Incumbent: **trial-0001 (TP1), 2.90 req/s/GPU — never beaten.**
|
||||
|
||||
> **⚠️ The per-GPU trajectory above is NOT a valid benchmark — it validates only
|
||||
> the Stop-B *mechanics*.** Two confounds:
|
||||
> 1. **Trace-ceiling saturation.** TP2·DP2 and TP4 reached `best_sampling_u≈0.98`
|
||||
> (still feasible after consuming ~the whole window), so their *true* peak
|
||||
> per-GPU is higher than the 2.09 shown — we ran out of offered load to push
|
||||
> them to their boundary. Only TP1 (u=0.31), TP2 (u=0.48) and DP2 (u=0.48)
|
||||
> found real boundaries. The `sampling_u` axis maxes at the full trace, so any
|
||||
> config that sustains more than the window's offered rate cannot be measured.
|
||||
> 2. **Smoke regime.** This run inherited `replay_time_scale=0.1` +
|
||||
> `max_requests_per_probe=512` (README: convergence test, *not* a benchmark) —
|
||||
> compressed arrivals distort A and the 512 cap imposes a ~8.4 req/s ceiling.
|
||||
>
|
||||
> The below-ceiling TP1 (2.90) > TP2 (2.21) ordering *may* be real for this model
|
||||
> (Qwen3-30B-A3B is an MoE with ~3B active params → little compute per token → TP
|
||||
> adds all-reduce overhead with little benefit), which differs from the dense
|
||||
> Qwen3.5-27B where TP2 wins. But this run cannot establish it. A valid benchmark
|
||||
> needs `scale=1.0`, no cap, and enough offered-load headroom that strong configs
|
||||
> are not trace-saturated — see the 27B TP A/B follow-up.
|
||||
|
||||
## Phase-5 acceptance
|
||||
|
||||
- **No regression.** The primary metric `request_rate_per_gpu` stayed 2.90 the whole
|
||||
|
||||
Reference in New Issue
Block a user