Flag Stop-B e2e per-GPU trajectory as non-benchmark (saturation + smoke regime)

The reported trajectory validates only the Stop-B mechanics. TP2-DP2 and TP4 saturated
the trace ceiling (best_sampling_u~0.98), so their per-GPU peak is underestimated, and
the run used the smoke regime (scale=0.1 + 512 cap). The TP1>TP2 ordering may be real
for the small-active MoE, but this run cannot establish it; the 27B TP A/B is the valid
follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 18:40:38 +08:00
parent 4f45b546a1
commit 77af4ded2a

@@ -30,6 +30,25 @@ config, is the bound).
Incumbent: **trial-0001 (TP1), 2.90 req/s/GPU — never beaten.**
> **⚠️ The per-GPU trajectory above is NOT a valid benchmark — it validates only
> the Stop-B *mechanics*.** Two confounds:
> 1. **Trace-ceiling saturation.** TP2·DP2 and TP4 reached `best_sampling_u≈0.98`
> (still feasible after consuming nearly the whole window), so their *true*
> per-GPU peak is higher than the 2.09 shown: we ran out of offered load before
> pushing them to their boundary. Only TP1 (u=0.31), TP2 (u=0.48) and DP2 (u=0.48)
> found real boundaries. The `sampling_u` axis maxes out at the full trace, so for
> any config that sustains more than the window's offered rate, the search can only
> report a lower bound on its peak.
> 2. **Smoke regime.** This run inherited `replay_time_scale=0.1` +
> `max_requests_per_probe=512` (README: convergence test, *not* a benchmark):
> compressed arrivals distort A, and the 512 cap imposes a ~8.4 req/s ceiling.
>
> Among the below-ceiling configs, the TP1 (2.90) > TP2 (2.21) ordering *may* be real
> for this model: Qwen3-30B-A3B is an MoE with ~3B active params, so each token needs
> little compute and TP adds all-reduce overhead for little benefit. That would differ
> from the dense Qwen3.5-27B, where TP2 wins. But this run cannot establish it. A valid
> benchmark needs `replay_time_scale=1.0`, no cap, and enough offered-load headroom that
> strong configs are not trace-saturated; see the 27B TP A/B follow-up.
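
To make the two confounds concrete, a minimal sketch can tabulate the configs named above and flag which results are real boundaries and which are only lower bounds. It is an illustration, not part of the tuner: the `Trial` type, the `SATURATION_U = 0.95` cut-off, and two assumptions (a TPx·DPy config occupies x·y GPUs; the ~8.4 req/s cap ceiling applies to the aggregate request rate) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-off: treat a search as trace-saturated when it ended this
# close to the full trace. Not an actual tuner setting.
SATURATION_U = 0.95

# ~8.4 req/s ceiling imposed by max_requests_per_probe=512 (figure from the note).
# Assumption: it limits the aggregate request rate of the whole deployment.
CAP_CEILING_REQ_S = 8.4


@dataclass
class Trial:
    name: str
    tp: int
    dp: int
    best_sampling_u: float

    @property
    def gpus(self) -> int:
        # Assumption: a TPx·DPy config occupies x * y GPUs.
        return self.tp * self.dp


def classify(t: Trial) -> str:
    """A feasible result near the end of the trace is only a lower bound."""
    if t.best_sampling_u >= SATURATION_U:
        return "lower bound (trace-saturated)"
    return "boundary found"


def cap_ceiling_per_gpu(t: Trial) -> float:
    """Per-GPU request rate the probe cap allows, spread over the config's GPUs."""
    return CAP_CEILING_REQ_S / t.gpus


# best_sampling_u values as quoted in the note.
TRIALS = [
    Trial("TP1", tp=1, dp=1, best_sampling_u=0.31),
    Trial("TP2", tp=2, dp=1, best_sampling_u=0.48),
    Trial("DP2", tp=1, dp=2, best_sampling_u=0.48),
    Trial("TP2·DP2", tp=2, dp=2, best_sampling_u=0.98),
    Trial("TP4", tp=4, dp=1, best_sampling_u=0.98),
]

if __name__ == "__main__":
    for t in TRIALS:
        print(
            f"{t.name:8s} gpus={t.gpus} u={t.best_sampling_u:.2f} "
            f"cap_ceiling={cap_ceiling_per_gpu(t):.2f} req/s/GPU -> {classify(t)}"
        )
```

If those assumptions hold, the 512 cap alone limits any 4-GPU config to 8.4 / 4 = 2.1 req/s/GPU, essentially the 2.09 shown for TP2·DP2 and TP4, while TP1 (cap-implied 8.4) and the 2-GPU configs (4.2) stay well below their cap-implied ceilings.
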
## Phase-5 acceptance
- **No regression.** The primary metric `request_rate_per_gpu` stayed 2.90 the whole