trace: include weekend legacy windows

docs: add qwen27b chat 0-8k compare summary
2026-04-12 22:43:02 +08:00 · 2026-04-12 22:39:57 +08:00
3 changed files with 85 additions and 1 deletion
--- a/docs/qwen27b-chat-0-8k-7day-compare/README.md
+++ b/docs/qwen27b-chat-0-8k-7day-compare/README.md
@@ -0,0 +1,72 @@
+# qwen27b-chat-0-8k-7day-compare
+
+Cross-day comparison of the tuned-best config against the baseline for qwen3.5-27b on the `chat` trace, `0~8k` input bucket, served by internal vLLM (`/usr/local/bin/vllm`). Configs are ranked by `request_rate_per_gpu`.
+
+## Setup
+
+- Hardware: `dash1`, `8x H20`
+- Model: `/home/admin/resource/model/464482ce/qwen3.5-27b/256k-0223-internal`
+- Engine: internal vLLM
+- Baseline: empty patch over the study spec baseline, matching `~/run_qwen27b.sh` (`TP=1, DP=1`)
+- Tuned best source: `trial-0004` from `dash0-qwen27b-tight-slo-10min-run9-chat-0-8k-codex-topology`
+- Tuned best config:
+  - `tensor-parallel-size=2`
+  - `data-parallel-size=1`
+- Trace family: `chat`
+- Input bucket: `0 <= input_length <= 8192`
+- Time range scanned: `2026-03-11` to `2026-03-17`
+- Available windows in this slot: `5` (weekdays only; `2026-03-14` and `2026-03-15` fall on a weekend and are not included)
+  - `chat_w20260311_1000`
+  - `chat_w20260312_1000`
+  - `chat_w20260313_1000`
+  - `chat_w20260316_1000`
+  - `chat_w20260317_1000`
+- Window duration: `600s` (`10:00-10:10`)
+- Request mode: `chat`
+- SLO (only the first two TTFT tiers apply to this `0~8k` bucket):
+  - pass target: `95%`
+  - `TTFT <= 2000ms` for `input <= 4096` tokens
+  - `TTFT <= 4000ms` for `4096 < input <= 32768` tokens
+  - `TTFT <= 6000ms` for `input > 32768` tokens
+  - `TPOT <= 50ms`
+- Search:
+  - binary search on `sampling_u`
+  - `max_probes = 6`
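The SLO and search settings above can be sketched as follows. This is a minimal illustration, not aituner's implementation: `RequestResult`, `ttft_limit_ms`, `meets_slo`, and `search_sampling_u` are hypothetical names, `sampling_u` is assumed to lie in `[0, 1]`, and feasibility is assumed to be monotone (if a given `sampling_u` meets the SLO, every lower value does too).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

PASS_TARGET = 0.95      # pass target from the SLO list
TPOT_LIMIT_MS = 50.0    # TPOT <= 50ms


@dataclass
class RequestResult:
    """Hypothetical per-request measurement; field names are illustrative."""
    input_tokens: int
    ttft_ms: float
    tpot_ms: float


def ttft_limit_ms(input_tokens: int) -> float:
    """Tiered TTFT budget: the first tier whose bound covers the input wins."""
    if input_tokens <= 4096:
        return 2000.0
    if input_tokens <= 32768:
        return 4000.0
    return 6000.0


def meets_slo(results: List[RequestResult]) -> bool:
    """True if at least PASS_TARGET of requests meet both TTFT and TPOT limits."""
    if not results:
        return False
    passed = sum(
        1
        for r in results
        if r.ttft_ms <= ttft_limit_ms(r.input_tokens) and r.tpot_ms <= TPOT_LIMIT_MS
    )
    return passed / len(results) >= PASS_TARGET


def search_sampling_u(
    is_feasible: Callable[[float], bool],
    lo: float = 0.0,
    hi: float = 1.0,
    max_probes: int = 6,
) -> Optional[float]:
    """Binary search for the largest feasible sampling_u (assumes monotonicity).

    Returns None when no probe is feasible, i.e. no operating point meets the SLO.
    """
    best = None
    for _ in range(max_probes):
        mid = (lo + hi) / 2
        if is_feasible(mid):
            best, lo = mid, mid   # feasible: try a higher sampling_u
        else:
            hi = mid              # infeasible: back off
    return best
```

In a real run, `is_feasible(u)` would replay the window at `sampling_u = u` and apply something like `meets_slo` to the measured requests. With `6` probes the search narrows the boundary to within `1/64` of the initial range.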
+
+## Run assets
+
+- Compare root: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-compare/dash1-qwen27b-chat-0-8k-7days-compare`
+- Summary: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-compare/dash1-qwen27b-chat-0-8k-7days-compare/summary.json`
+- Report: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-compare/dash1-qwen27b-chat-0-8k-7days-compare/report.md`
+- Compare spec: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-compare/specs/qwen27b_chat_0_8k_compare_dash1.json`
+- Tuned study root: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-tight/dash0-qwen27b-tight-slo-10min-run9-chat-0-8k-codex-topology`
+
+## Aggregate result
+
+- Comparable wins: tuned `3`, baseline `0`
+- Incomparable windows: `2` (baseline found no feasible `sampling_u`)
+- Baseline mean request rate: `0.0289 req/s` (over its `3` feasible windows)
+- Tuned mean request rate: `0.477 req/s` (over all `5` windows)
+- Baseline mean request rate per GPU: `0.0289 req/s/gpu` (`1` GPU per instance)
+- Tuned mean request rate per GPU: `0.2385 req/s/gpu` (`2` GPUs per instance); `0.2392 req/s/gpu` over the `3` comparable windows only
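The aggregate figures can be recomputed from the per-window table in this README. The sketch below is plain Python with an illustrative dictionary layout (not the format of `summary.json`); it shows that the baseline mean averages only its feasible windows while the tuned mean averages all five, and that whole-instance rates equal the per-GPU rate times `TP * DP`.

```python
from statistics import mean

# Per-window req/s/GPU copied from the per-window table; None means the
# baseline found no feasible sampling_u in that window.
BASELINE = {
    "2026-03-11": 0.035,
    "2026-03-12": None,
    "2026-03-13": 0.03166666666666667,
    "2026-03-16": 0.02,
    "2026-03-17": None,
}
TUNED = {
    "2026-03-11": 0.21416666666666667,
    "2026-03-12": 0.28,
    "2026-03-13": 0.265,
    "2026-03-16": 0.23833333333333334,
    "2026-03-17": 0.195,
}
GPUS_PER_INSTANCE = {"baseline": 1 * 1, "tuned": 2 * 1}  # TP * DP


def feasible_mean(per_window):
    """Mean over windows that produced a value; None entries are skipped."""
    values = [v for v in per_window.values() if v is not None]
    return mean(values) if values else None


# Windows where both sides found a feasible operating point.
comparable = [d for d in BASELINE if BASELINE[d] is not None and TUNED[d] is not None]

baseline_per_gpu = feasible_mean(BASELINE)   # averages 3 windows
tuned_per_gpu = feasible_mean(TUNED)         # averages 5 windows
tuned_per_gpu_comparable = mean(TUNED[d] for d in comparable)

# Whole-instance rate = per-GPU rate * GPUs per instance.
baseline_req_s = baseline_per_gpu * GPUS_PER_INSTANCE["baseline"]
tuned_req_s = tuned_per_gpu * GPUS_PER_INSTANCE["tuned"]
```

Restricted to the comparable windows, the tuned mean is slightly higher (about `0.2392` vs `0.2385 req/s/gpu`), so averaging over different window sets does not inflate the tuned result.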
+
+## Per-window result
+
+| Window | Date | Baseline (req/s/GPU) | Tuned (req/s/GPU) | Winner |
+| --- | --- | ---: | ---: | --- |
+| `chat_w20260311_1000` | `2026-03-11` | `0.0350` | `0.2142` | `tuned` |
+| `chat_w20260312_1000` | `2026-03-12` | n/a | `0.2800` | `incomparable` |
+| `chat_w20260313_1000` | `2026-03-13` | `0.0317` | `0.2650` | `tuned` |
+| `chat_w20260316_1000` | `2026-03-16` | `0.0200` | `0.2383` | `tuned` |
+| `chat_w20260317_1000` | `2026-03-17` | n/a | `0.1950` | `incomparable` |
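One way to derive the `Winner` column, assuming a window is `incomparable` whenever either side has no feasible point (the exact rule used by the compare tool is not shown here). `classify_window` and `summarize` are hypothetical helpers; the sketch also reports the tuned/baseline per-GPU ratio for comparable windows.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (window, baseline req/s/GPU, tuned req/s/GPU) from the per-window table;
# None marks "no feasible sampling_u".
ROWS: List[Tuple[str, Optional[float], Optional[float]]] = [
    ("chat_w20260311_1000", 0.035, 0.21416666666666667),
    ("chat_w20260312_1000", None, 0.28),
    ("chat_w20260313_1000", 0.03166666666666667, 0.265),
    ("chat_w20260316_1000", 0.02, 0.23833333333333334),
    ("chat_w20260317_1000", None, 0.195),
]


def classify_window(baseline: Optional[float], tuned: Optional[float]) -> str:
    """Hypothetical winner rule: no feasible point on either side -> incomparable."""
    if baseline is None or tuned is None:
        return "incomparable"
    if tuned > baseline:
        return "tuned"
    if baseline > tuned:
        return "baseline"
    return "tie"


def summarize(rows):
    """Return per-window (window, verdict, tuned/baseline ratio) and verdict counts."""
    out = []
    for window, b, t in rows:
        verdict = classify_window(b, t)
        # Ratio only makes sense when both sides have a positive rate.
        ratio = t / b if verdict != "incomparable" and b > 0 else None
        out.append((window, verdict, ratio))
    return out, Counter(v for _, v, _ in out)


per_window, counts = summarize(ROWS)
for window, verdict, ratio in per_window:
    print(window, verdict, f"{ratio:.2f}x" if ratio is not None else "n/a")
```

On the table's values this yields `3` tuned wins and `2` incomparable windows, with per-GPU ratios of roughly `6.1x`, `8.4x`, and `11.9x` on the comparable days.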
+
+## Key insights
+
+- This comparison gives no evidence that the tuned config fails to generalize across days: tuned wins all `3` directly comparable windows.
+- The two `incomparable` windows (`2026-03-12`, `2026-03-17`) are not execution failures. Baseline completed probing but found no feasible `sampling_u` under the target SLO, while tuned still found feasible operating points.
+- The tuned `TP=2, DP=1` shape is more robust than the `TP=1, DP=1` baseline for this `0~8k` chat bucket: it meets the SLO on all `5` windows, versus `3` of `5` for the baseline.
+- The throughput gap holds after normalizing by GPU count (tuned reaches about `6.1x` to `11.9x` the baseline per GPU on the comparable windows), so it is not an artifact of the tuned config using twice as many GPUs.
+
+## Recommendation
+
+For `qwen27b chat 0~8k`, keep using the tuned `TP=2, DP=1` serving shape as the default candidate over the `TP=1, DP=1` baseline, and treat cross-day robustness as confirmed on the `5` currently available weekday windows.
--- a/scripts/prepare_trace_windows.py
+++ b/scripts/prepare_trace_windows.py
@@ -16,7 +16,15 @@ DEFAULT_THINKING_SOURCE = Path(
    "/home/admin/cpfs/wjh/bailian-trace/qwen-trace-260321-260327-formatted"
 )
 DEFAULT_OUTPUT_ROOT = REPO_ROOT / "trace_windows"
-LEGACY_TARGET_DATES = ["2026-03-11", "2026-03-12", "2026-03-13", "2026-03-16", "2026-03-17"]
+LEGACY_TARGET_DATES = [
+    "2026-03-11",
+    "2026-03-12",
+    "2026-03-13",
+    "2026-03-14",
+    "2026-03-15",
+    "2026-03-16",
+    "2026-03-17",
+]
 THINKING_WINDOWS = [
    ("2026-03-21", "1000"),
    ("2026-03-22", "1000"),
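The updated `LEGACY_TARGET_DATES` adds `2026-03-14` and `2026-03-15`, which are a Saturday and a Sunday. The sketch below checks that with the standard library and shows how a `chat` window ID could be formed from each date; `window_id` is a hypothetical helper that only mirrors the `chat_w20260311_1000` naming seen in the compare README, not the script's actual naming code.

```python
from datetime import date

# Same values as the updated LEGACY_TARGET_DATES in scripts/prepare_trace_windows.py.
LEGACY_TARGET_DATES = [
    "2026-03-11", "2026-03-12", "2026-03-13", "2026-03-14",
    "2026-03-15", "2026-03-16", "2026-03-17",
]


def weekend_dates(dates):
    """Dates that fall on Saturday (weekday 5) or Sunday (weekday 6)."""
    return [d for d in dates if date.fromisoformat(d).weekday() >= 5]


def window_id(iso_date: str, hhmm: str = "1000", family: str = "chat") -> str:
    """Hypothetical helper mirroring IDs such as chat_w20260311_1000."""
    return f"{family}_w{iso_date.replace('-', '')}_{hhmm}"


print(weekend_dates(LEGACY_TARGET_DATES))
print([window_id(d) for d in weekend_dates(LEGACY_TARGET_DATES)])
```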
--- a/tests/test_core_flow.py
+++ b/tests/test_core_flow.py
@@ -769,6 +769,10 @@ class CoreFlowTests(unittest.TestCase):
                "qwen_chat_blksz_64_031221-031223",
                "qwen_chat_blksz_64_031309-031311",
                "qwen_chat_blksz_64_031321-031323",
+                "qwen_chat_blksz_64_031409-031411",
+                "qwen_chat_blksz_64_031421-031423",
+                "qwen_chat_blksz_64_031509-031511",
+                "qwen_chat_blksz_64_031521-031523",
                "qwen_chat_blksz_64_031609-031611",
                "qwen_chat_blksz_64_031621-031623",
                "qwen_chat_blksz_64_031709-031711",
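The new expected names in `tests/test_core_flow.py` appear to use an `MMDDHH-MMDDHH` suffix with two blocks per legacy date (`09` to `11` and `21` to `23`). Assuming that pattern holds, the expected list could be generated from `LEGACY_TARGET_DATES` as below; `legacy_chat_trace_names` is a hypothetical helper, not part of the test suite.

```python
def legacy_chat_trace_names(dates, prefix="qwen_chat_blksz_64"):
    """Hypothetical: two MMDDHH-MMDDHH blocks per date, 09-11 and 21-23."""
    names = []
    for iso_date in dates:
        _, month, day = iso_date.split("-")
        for start_hour, end_hour in (("09", "11"), ("21", "23")):
            names.append(f"{prefix}_{month}{day}{start_hour}-{month}{day}{end_hour}")
    return names


new_weekend_names = legacy_chat_trace_names(["2026-03-14", "2026-03-15"])
print(new_weekend_names)
```

For the two weekend dates this produces the same four names added to the test.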
Author	SHA1	Message	Date
Gahow Wang	631a076498	trace: include weekend legacy windows	2026-04-12 22:43:02 +08:00
Gahow Wang	ade81b5549	docs: add qwen27b chat 0-8k compare summary	2026-04-12 22:39:57 +08:00