Add qwen27b and qwen235b tuning notes

2026-04-11 12:07:42 +08:00
parent 31dd44c54b
commit a0b2d7eab2
2 changed files with 103 additions and 0 deletions
--- a/docs/qwen235b-thinking-decode/README.md
+++ b/docs/qwen235b-thinking-decode/README.md
@@ -2,6 +2,26 @@

 qwen3-235b-a22b `thinking` trace, `decode_only` mode, internal vLLM (`/usr/local/bin/vllm`), SLO: `p95-equivalent pass target 95%`, `TPOT <= 40ms`, `TTFT` not enforced.

+## Setup
+
+- Hardware: `dash0`, `8x H20`
+- Model: `/home/admin/resource/model/464482ce.qwen3-235b-a22b/256k-0717`
+- Engine: internal vLLM, decode-only mode with `--kv-transfer-config '{"kv_connector":"DecodeBenchConnector","kv_role":"kv_both"}'`
+- Baseline topology: `TP=4, DP=2, EP=8`
+- Trace: `thinking_w20260327_1000`
+- Trace source: `trace_windows/traces/thinking_w20260327_1000.jsonl`
+- Window duration: `600s` (`10:00-10:10`, `2026-03-27`)
+- Request mode: `decode_only`
+- SLO:
+  - pass target: `95%` of requests
+  - `TPOT <= 40ms`
+  - `TTFT` not enforced
+- Search:
+  - `sampling_u in [0, 0.125]`
+  - `max_probes = 6`
+  - `12` trials total
+- Proposal model: `codex / gpt-5.4`
+
 ## Run assets

 - Study root: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-decode/dash0-qwen235b-decode-thinking-run5-tpot40-topology`