docs: update qwen27b 7-day compare
@@ -15,10 +15,12 @@ qwen3.5-27b `chat` trace, `0~8k` input bucket, tuned-best vs baseline cross-day
 - Trace family: `chat`
 - Input bucket: `0 <= input_length <= 8192`
 - Time range scanned: `2026-03-11` to `2026-03-17`
-- Available windows in this slot: `5`
+- Available windows in this slot: `7`
 - `chat_w20260311_1000`
 - `chat_w20260312_1000`
 - `chat_w20260313_1000`
+- `chat_w20260314_1000`
+- `chat_w20260315_1000`
 - `chat_w20260316_1000`
 - `chat_w20260317_1000`
 - Window duration: `600s` (`10:00-10:10`)
@@ -43,12 +45,12 @@ qwen3.5-27b `chat` trace, `0~8k` input bucket, tuned-best vs baseline cross-day
 ## Aggregate result

-- Comparable wins: tuned `3`, baseline `0`
+- Comparable wins: tuned `5`, baseline `0`
 - Incomparable windows: `2`
-- Baseline mean request rate: `0.02888888888888889 req/s`
-- Tuned mean request rate: `0.47700000000000004 req/s`
-- Baseline mean request rate per GPU: `0.02888888888888889 req/s/gpu`
-- Tuned mean request rate per GPU: `0.23850000000000002 req/s/gpu`
+- Baseline mean request rate: `0.046 req/s`
+- Tuned mean request rate: `0.4723809523809524 req/s`
+- Baseline mean request rate per GPU: `0.046 req/s/gpu`
+- Tuned mean request rate per GPU: `0.2361904761904762 req/s/gpu`

 ## Per-window result
@@ -57,16 +59,19 @@ qwen3.5-27b `chat` trace, `0~8k` input bucket, tuned-best vs baseline cross-day
 | `chat_w20260311_1000` | `2026-03-11` | `0.035` | `0.21416666666666667` | `tuned` |
 | `chat_w20260312_1000` | `2026-03-12` | `None` | `0.28` | `incomparable` |
 | `chat_w20260313_1000` | `2026-03-13` | `0.03166666666666667` | `0.265` | `tuned` |
-| `chat_w20260316_1000` | `2026-03-16` | `0.02` | `0.23833333333333334` | `tuned` |
+| `chat_w20260314_1000` | `2026-03-14` | `0.021666666666666667` | `0.24083333333333334` | `tuned` |
+| `chat_w20260315_1000` | `2026-03-15` | `0.12166666666666667` | `0.23083333333333333` | `tuned` |
+| `chat_w20260316_1000` | `2026-03-16` | `0.02` | `0.2275` | `tuned` |
 | `chat_w20260317_1000` | `2026-03-17` | `None` | `0.195` | `incomparable` |

 ## Key insights

-- This compare does not support the conclusion that the tuned config lacks generalization. On the available days, tuned wins every directly comparable window.
+- This compare does not support the conclusion that the tuned config lacks generalization. Across the full 7-day slice, tuned wins every directly comparable window.
 - The two `incomparable` days are not execution failures. Baseline completed probing but never found a single feasible `sampling_u` under the target SLO, while tuned still found feasible operating points.
 - The tuned `TP=2, DP=1` shape is materially more robust than the `TP=1, DP=1` baseline for this `0~8k` chat bucket.
-- The throughput gap is large even after normalizing by GPU count, so this is not just a raw-card-count artifact.
+- The weekend windows do not break the result. `2026-03-14` is another clear tuned win, and even on `2026-03-15`, where baseline is relatively stronger than other days, tuned still wins by about `1.90x` on `req/s/gpu`.
+- The throughput gap remains large even after normalizing by GPU count, so this is not just a raw-card-count artifact.

 ## Recommendation

-For `qwen27b chat 0~8k`, keep using the tuned `TP=2, DP=1` serving shape as the default candidate over the `TP=1, DP=1` baseline, and treat cross-day robustness as confirmed on the currently available windows.
+For `qwen27b chat 0~8k`, keep using the tuned `TP=2, DP=1` serving shape as the default candidate over the `TP=1, DP=1` baseline, and treat cross-day robustness as confirmed on the full 7-day window set.
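
The updated aggregate figures can be cross-checked against the per-window rows. The sketch below is not part of the commit; it assumes that the two numeric table columns are baseline and tuned `req/s/gpu` (the table header is outside the hunk), that the baseline mean is taken over comparable windows only while the tuned mean is taken over all seven windows, and that `TP=2, DP=1` means 2 GPUs and `TP=1, DP=1` means 1 GPU. The names `ROWS`, `TUNED_GPUS`, `BASELINE_GPUS`, and `summarize` are hypothetical.

```python
from statistics import mean

# (window, baseline req/s/gpu or None, tuned req/s/gpu), copied from the
# updated per-window table. None marks a window where baseline found no
# feasible operating point ("incomparable").
ROWS = [
    ("chat_w20260311_1000", 0.035, 0.21416666666666667),
    ("chat_w20260312_1000", None, 0.28),
    ("chat_w20260313_1000", 0.03166666666666667, 0.265),
    ("chat_w20260314_1000", 0.021666666666666667, 0.24083333333333334),
    ("chat_w20260315_1000", 0.12166666666666667, 0.23083333333333333),
    ("chat_w20260316_1000", 0.02, 0.2275),
    ("chat_w20260317_1000", None, 0.195),
]

TUNED_GPUS = 2     # assumption: TP=2, DP=1 occupies 2 GPUs
BASELINE_GPUS = 1  # assumption: TP=1, DP=1 occupies 1 GPU


def summarize(rows):
    """Recompute the aggregate section from per-window per-GPU rates."""
    comparable = [(b, t) for _, b, t in rows if b is not None]
    baseline_per_gpu = mean(b for b, _ in comparable)  # comparable windows only
    tuned_per_gpu = mean(t for _, _, t in rows)        # all windows
    return {
        "tuned_wins": sum(t > b for b, t in comparable),
        "baseline_wins": sum(b > t for b, t in comparable),
        "incomparable": len(rows) - len(comparable),
        "baseline_mean_per_gpu": baseline_per_gpu,
        "tuned_mean_per_gpu": tuned_per_gpu,
        "baseline_mean": baseline_per_gpu * BASELINE_GPUS,
        "tuned_mean": tuned_per_gpu * TUNED_GPUS,
    }


summary = summarize(ROWS)

# Per-GPU advantage of tuned over baseline on 2026-03-15, the window where
# baseline is strongest.
_, b_0315, t_0315 = ROWS[4]
ratio_0315 = t_0315 / b_0315

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
    print(f"2026-03-15 tuned/baseline per GPU: {ratio_0315:.2f}x")
```

Under these assumptions the recomputed values agree with the reported `5` comparable wins, `2` incomparable windows, the `0.046` and `0.2361904761904762` per-GPU means, and the roughly `1.90x` ratio on `2026-03-15`. Because the baseline mean excludes the two windows where baseline found no feasible point, the aggregate gap likely understates how much worse baseline is on those days.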