Add qwen27b and qwen235b tuning notes
@@ -2,6 +2,26 @@
qwen3-235b-a22b `thinking` trace, `decode_only` mode, internal vLLM (`/usr/local/bin/vllm`). SLO: pass target `95%` (p95-equivalent), `TPOT <= 40ms`, `TTFT` not enforced.
## Setup

- Hardware: `dash0`, `8x H20`
- Model: `/home/admin/resource/model/464482ce.qwen3-235b-a22b/256k-0717`
- Engine: internal vLLM, decode-only mode with `--kv-transfer-config {"kv_connector":"DecodeBenchConnector","kv_role":"kv_both"}`
- Baseline topology: `TP=4, DP=2, EP=8`
- Trace: `thinking_w20260327_1000`
- Trace source: `trace_windows/traces/thinking_w20260327_1000.jsonl`
- Window duration: `600s` (`10:00-10:10`, `2026-03-27`)
- Request mode: `decode_only`
- SLO:
  - pass target: `95%`
  - `TPOT <= 40ms`
  - `TTFT` not enforced
- Search:
  - `sampling_u in [0, 0.125]`
  - `max_probes = 6`
  - `12` trials total
- Proposal model: `codex / gpt-5.4`
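
To make the SLO and search bounds above concrete, here is a minimal sketch of how a pass-rate check and a bounded probe search could look. These notes do not say how the tuner actually computes the pass rate or picks probe points, so the sketch makes several assumptions: the pass rate is per-request, the search is a plain bisection over `sampling_u`, and SLO compliance is monotonic in `sampling_u`. The names `pass_rate`, `meets_slo`, `search_max_sampling_u`, and `probe` are hypothetical.

```python
from typing import Callable, Optional, Sequence

TPOT_LIMIT_MS = 40.0  # from the SLO: TPOT <= 40ms
PASS_TARGET = 0.95    # from the SLO: pass target 95%


def pass_rate(tpot_ms: Sequence[float], limit_ms: float = TPOT_LIMIT_MS) -> float:
    """Fraction of requests whose TPOT is at or below the limit.

    Assumption: one TPOT value per request; the real tuner may aggregate differently.
    """
    if not tpot_ms:
        raise ValueError("no requests to evaluate")
    return sum(1 for t in tpot_ms if t <= limit_ms) / len(tpot_ms)


def meets_slo(
    tpot_ms: Sequence[float],
    limit_ms: float = TPOT_LIMIT_MS,
    target: float = PASS_TARGET,
) -> bool:
    """True when the pass rate reaches the target. TTFT is deliberately ignored."""
    return pass_rate(tpot_ms, limit_ms) >= target


def search_max_sampling_u(
    probe: Callable[[float], bool],
    lo: float = 0.0,
    hi: float = 0.125,
    max_probes: int = 6,
) -> Optional[float]:
    """Bisect [lo, hi] for the largest probed sampling_u that passes.

    `probe(u)` is a hypothetical callback that replays the trace at sampling
    rate `u` and reports whether the SLO held. Assumes passing is monotonic
    in `u`. Returns None if no probe passed.
    """
    best: Optional[float] = None
    for _ in range(max_probes):
        mid = (lo + hi) / 2
        if probe(mid):
            best = mid
            lo = mid  # passed: try a higher sampling rate
        else:
            hi = mid  # failed: back off
    return best
```

With six probes over `[0, 0.125]`, this bisection narrows the bracket to a width of `0.125 / 64`, which gives a rough sense of the resolution a six-probe budget allows under these assumptions.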
## Run assets
- Study root: `/home/admin/cpfs/wjh/aituner/aituner/.aituner-decode/dash0-qwen235b-decode-thinking-run5-tpot40-topology`