obsidian/projects/auto-tuner/ali trace.md
## TODO
done:
- codex-chat
- codex-chat-5090: codex resume 019d4945-4991-7331-a848-1be6fd702e9f
- codex-coder
- scoot-chat
- scoot-thinking-prefill
- scoot-thinking-decode
- dash1: codex-thinking-decode
- dash2: codex-thinking-prefill
- dash3: scoot-coder
- dash5: scoot-chat-5090
```bash
# chat
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/custom_trace_windows/qwen_coder_next_internal_chat_day0_t0p002_fixedcount/sampled_traces/chat_w20260311_peak_1000.jsonl
# prefill-only
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/custom_trace_windows/qwen_coder_next_internal_thinking_day0_t0p04_fixedcount/sampled_traces/thinking_w20260323_peak_1000.jsonl
# coder
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/custom_trace_windows/qwen_coder_next_internal_coder_peak_7day_fixedcount/sampled_traces/coder_w20260311_peak_1000.jsonl
# decode-only
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/plans/dash0123_8gpu__qwen235b__internal__decode_only__thinking__legal11_thinking_decode_only_weekly0321_0327_peak_local8/traces/thinking_w20260321_peak_1000.jsonl
```
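The sampled traces above are JSONL files; before launching a run, a quick sanity check can confirm every line parses and report the request count (a hypothetical helper, not part of the workflow scripts; it shells out to python3 for JSON parsing):

```shell
# check_trace FILE: verify each line of a sampled trace is valid JSON and
# print the request count (filenames above end in "_1000", i.e. 1000
# requests per window). Hypothetical helper; assumes python3 is available.
check_trace() {
  local f="$1" n
  n=$(wc -l < "$f")
  python3 - "$f" <<'PY' || return 1
import json, sys
for line in open(sys.argv[1]):
    if line.strip():
        json.loads(line)
PY
  echo "$f: $n requests, all lines valid JSON"
}
```

Running it on one of the `*_peak_1000.jsonl` files above should report 1000 requests.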
So in the code I will make the internal profile's "must use chunked prefill" requirement a hard constraint, instead of continuing to let the tuner learn it from timeout failures.
Fig 7/8: add semi-real results matching the real trace.
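One way to express that hard constraint is a guard in the candidate generator that rejects configs up front instead of benchmarking them into a timeout. This is a hypothetical sketch, not existing tuner code; the flag name follows vllm's `--enable-chunked-prefill`:

```shell
# reject_invalid PROFILE ARGS: return non-zero when a candidate config for
# the internal profile is missing chunked prefill, so the tuner can skip
# it before launch rather than learn the failure from a timeout.
reject_invalid() {
  local profile="$1" args="$2"
  [ "$profile" = "internal" ] || return 0   # constraint applies only to internal
  case "$args" in
    *--enable-chunked-prefill*) return 0 ;;
    *) return 1 ;;
  esac
}
```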
ongoing:
- dash0: qwen235b decode-only test
- dash1/2: qwen235b thinking 30min test
- dash3: qwen-coder-next coder 30min test
- dash5: 5090 qwen27b chat-0-32k test
4.1
data proving that different workloads cannot be used to tune different clusters
✅4.2
✅ synthetic/semi-real/real performance comparison:
83.91, 98.19, 98.4
65.22, 86.03, 98.28
✅ synthetic/semi-real/real similarity comparison
tuned-best comparison under the chat/thinking/coder prefixes
similarity comparison under the chat/thinking/coder prefixes
4.3
agent harness summary
5
tuner vs baseline
✅ default config
```bash
# Qwen3.5-27B
# https://huggingface.co/Qwen/Qwen3.5-27B
vllm serve Qwen/Qwen3.5-27B \
--tensor-parallel-size 8 \
--max-model-len 262144 \
--reasoning-parser qwen3
# Qwen3-Coder-Next
# https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-Next.html#basic-multi-gpu-setup
vllm serve Qwen/Qwen3-Coder-Next-FP8 \
--tensor-parallel-size 4 \
--enable-prefix-caching
# Qwen3-235B-A22B-FP8
# https://huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-FP8
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
--tensor-parallel-size 4 \
--max-model-len 262144
# https://github.com/aliez-ren/vllm-qwen3.5-nvfp4-sm120?utm_source=chatgpt.com
vllm serve Kbenkhaled/Qwen3.5-27B-NVFP4 \
--max-model-len 234567 \
--gpu-memory-utilization 0.89 \
--max-num-seqs 4 \
--max-num-batched-tokens 4096
vllm serve Qwen/Qwen3.5-27B-FP8 \
--quantization fp8 \
--dtype auto \
--gpu-memory-utilization 0.85 \
--max-model-len 131072 \
--max-num-seqs 2 \
--max-num-batched-tokens 2048 \
--tensor-parallel-size 1
# https://huggingface.co/Qwen/Qwen3.5-27B-FP8?utm_source=chatgpt.com
vllm serve Qwen/Qwen3.5-27B-FP8 --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3
```
```bash
# run qwen27b batching
# running on dash2
bash /home/admin/cpfs/wjh/aituner/tuner-workload-principle/launch_qwen35_27b_tp2dp1_epoff_batching_chat0_32k_weekly_peak.sh
# run the evaluator comparison
# running on dash1
CASE_KIND=chat TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh
CASE_KIND=coder TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh
CASE_KIND=thinking_prefill TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh
CASE_KIND=thinking_decode TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh
# [x] for qwen35_27b: align with the online trace, then re-test after the 0-4k threshold search
./workflow threshold-search \
--hardware dash0123_8gpu \
--model qwen35_27b \
--engine internal \
--workload chat \
--phase prefill_decode \
--trace-type chat-0-4k \
--max-threshold 0.5
# qwen3-coder-next can't run with EP?
# [x] the dash3 jobs need to be re-run on dash2
cd /home/admin/cpfs/wjh/aituner/tuner-workload-principle
bash workflow_output/plans/dash0123_8gpu__qwen35_27b__internal__prefill_decode__chat__legal10_chat0_4k/run_results_v2_trace_dash3.sh --machine-label dash2
# [x] run qwen35_27b 0~32k
./launch_qwen35_27b_chat_0_32k_after_trace_prepare.sh
```
## Run status
qwen-235b ✅
qwen27b ✅
qwen-coder: needs switching to the matching vllm build
qwen-30b: needs switching to a container version with flash-infer support
```
# requires flashinfer >= 0.7
pip install -U flashinfer-python
# currently installed (too old):
wjh@ds-f74814b6-1-65cd484875-256zt:~$ pip list | grep flashinfer
flashinfer-cubin 0.6.4
flashinfer-jit-cache 0.6.4
flashinfer-python 0.6.4
```
## Online performance
qwen27b:
40 instances: Mean: 4.00 qps; Max: 5.67 qps
prefill: Mean: 193k tpm; Max: 339k tpm
decode: Mean: 72.4k tpm; Max: 119k tpm
first latency: Mean: 1.59 s; Max: 11.3 s
tail latency: Mean: 23.6 s; Max: 46.2 s
qwen30b-a3b:
Mean: 0.00267 qps; Max: 0.109 qps
## Models
- qwen3-235b-a22b, version 256k-0717
- qwen3-235b-a22b, version 0717-eagle-0820
- qwen3-30b-a3b, version 1m-instruct-0726-fp4
- qwen3-30b-a3b, version 1m-thinking-0728-fp4
- qwen3-coder-next, version 1m-20260129-re-mtp-fp8-torch-dtype
- qwen3-coder-next, version 1m-20260129-xml-tool-parser-fix
- qwen3.5-27b, version 256k-0223-internal
- qwen3.5-27b, version 256k-0223-internal-nvfp4-inputscale-fp8-attn
```
"cache_volume": {
"enabled": true,
"scope": "application"
},
"cpfs_file_system_id": "bmcpfs-290qtyip73f85z7zt9t"
```
## dashllm_cmd serving
```
[INFO] 2026-03-27 18:42:51,933869: {"message":"vllm engine_args: {'model': '/dev/shm/dashllm_model_2', 'device': 'cuda', 'dtype': 'bfloat16', 'tensor_parallel_size': 1, 'enforce_eager': False, 'gpu_memory_utilization': 0.8, 'block_size': 256, 'swap_space': 1, 'max_num_seqs': 256, 'max_num_batched_tokens': 4096, 'trust_remote_code': True, 'disable_custom_all_reduce': False, 'skip_tokenizer_init': False, 'quantization': None, 'max_model_len': 262144, 'compilation_config': {'use_inductor': False, 'custom_ops': ['all']}, 'enable_prefix_caching': True, 'distributed_executor_backend': 'mp', 'enable_chunked_prefill': True, 'max_seq_len_to_capture': 262144}","time":"2026-03-27 18:42:51.933"}
```
## Alibaba model env
qwen3.5-27b strictly requires BLADNN for its vl attn kernel
qwen3-30b/235b/coder can all start without BLADNN
for the already-FP8-quantized 235b/coder models, enabling BLADNN raises an error
- qwen3-coder
```
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN
```
- qwen3.5-27b
```
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN
```
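The prose rules above can be folded into a small dispatcher so launch scripts don't hand-copy flag strings. This is a sketch with assumed model-name keys, not existing tooling; note the raw env blocks in this section also record BLADNN-on variants that were tried against those rules:

```shell
# bladnn_env MODEL: print the BLADNN env assignments for a model, per the
# prose rules above: qwen3.5-27b needs BLADNN (with GDN BLADNN off) for
# its vl attn kernel; the FP8-quantized 235b/coder and the 30b start
# without BLADNN. Model-name keys are assumptions for illustration.
bladnn_env() {
  case "$1" in
    qwen3.5-27b)
      echo "VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=0" ;;
    qwen3-235b*|qwen3-coder*|qwen3-30b*)
      echo "" ;;   # starts without BLADNN
    *)
      echo "" ;;
  esac
}
```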
```bash
####################################
# Qwen3.5-27B
####################################
VLLM_DISABLE_COMPILE_CACHE=1 VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3.5-27B --tensor-parallel-size 1 --mamba_cache_mode light --max-num-seqs 64 --max-num-batched-tokens 1000000 --long-prefill-token-threshold 30000 --skip_mm_profiling --mm-processor-cache-gb 0
VLLM_DISABLE_COMPILE_CACHE=1 VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_FUSED_QKVZBA_KERNEL=0 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3.5-27B --tensor-parallel-size 1 --mamba_cache_mode light --max-num-seqs 64 --max-num-batched-tokens 40960 --long-prefill-token-threshold 30000 --skip_mm_profiling --mm-processor-cache-gb 0
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_FUSED_QKVZBA_KERNEL=0 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3.5-27B --tensor-parallel-size 1 --max-num-seqs 64 --max-num-batched-tokens 40960 #--long-prefill-token-threshold 30000 #--skip_mm_profiling --mm-processor-cache-gb 0
#--long_context_threshold 30000
#Qwen3_5ForConditionalGeneration
####################################
# Qwen3-Coder
####################################
VLLM_MOE_EXPERTS_OVERLAP=1 TORCH_CUDA_ARCH_LIST="9.0+PTX" VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-Next-FP8 --tensor-parallel-size 2
# ok
vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-Next-FP8 --tensor-parallel-size 2
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-Next-FP8 --tensor-parallel-size 2
####################################
# Qwen3-30B-A3B
####################################
# ok
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-30B-A3B --tensor-parallel-size 2
####################################
# Qwen3-235B-A22B
####################################
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_USE_DEEP_GEMM=0 vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-235B-A22B-FP8 --tensor-parallel-size 4
# Ok
vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-235B-A22B-FP8 --tensor-parallel-size 4
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_USE_DEEP_GEMM=0 vllm serve resource/model/464482ce.qwen3-235b-a22b/128k-0426/ --tensor-parallel-size 4
'{"gpu_memory_utilization": 0.9, "max_model_len": 262144, "enable_chunked_prefill": true, "enable_think": 1, "think_mode": "auto", "tensor_parallel_size": 1, "dtype": "bfloat16", "enforce_eager": false, "enable_prefix_caching": true, "mamba_cache_mode": "light", "distributed_executor_backend": "mp", "block_size": 64, "max_num_batched_tokens": 8192, "disable_cascade_attn": true, "speculative_config": {"method": "qwen3_next_vl_mtp", "num_speculative_tokens": 3}, "mm_processor_cache_gb": 0, "limit_mm_per_prompt": {"image": 256, "video": 64}, "compilation_config": {"cudagraph_mode": "FULL_AND_PIECEWISE", "use_inductor": false, "pass_config": {"fuse_norm_quant": false, "fuse_act_quant": false, "fuse_attn_quant": false}}, "mamba_cache_dtype": "float32", "skip_mm_profiling": true, "quantization": "fp8"}'
```
1 GPU: TP1DP1
2 GPU: (TP2DP1, TP1DP2) x (EPON, EPOFF)
4 GPU: (TP4DP1, TP2DP2, TP1DP4) x (EPON, EPOFF)
8 GPU: (TP8DP1, TP4DP2, TP2DP4, TP1DP8) x (EPON, EPOFF)
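The per-GPU-count search space above can be enumerated mechanically. A sketch (label format TPxDPy plus EPON/EPOFF, as in the notes; the 1-GPU case has no EP variant):

```shell
# enumerate_configs: print every (TP, DP, EP) combination listed above.
# TP*DP must equal the GPU count.
enumerate_configs() {
  local gpus tp dp ep
  for gpus in 1 2 4 8; do
    for tp in 1 2 4 8; do
      dp=$(( gpus / tp ))
      [ $(( tp * dp )) -eq "$gpus" ] || continue   # skip splits that don't cover all GPUs
      if [ "$gpus" -eq 1 ]; then
        echo "1GPU TP1DP1"
      else
        for ep in EPON EPOFF; do
          echo "${gpus}GPU TP${tp}DP${dp} ${ep}"
        done
      fi
    done
  done
}
```

`enumerate_configs | wc -l` gives the 19 combinations implied by the list (1 + 4 + 6 + 8).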
## E2E tests
【qwen3-coder】【0-30k】【kvs】【h20-96-d】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-nosparse-model/deployments/qwen3-coder-nosparse-model-ba4a
【qwen3-coder-flash】【0-30k】【kvs】【h20-96-d】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-flash-2025-07-28-nosparse-model/deployments/qwen3-coder-flash-2025-07-28-nosparse-model-1553
【qwen3-30b-a3b-instruct】【H20-96G-4】
1. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-30b-a3b-instruct-2507-model/deployments/qwen3-30b-a3b-instruct-2507-model-a06c
- 0.9.0
【qwen3-30b-a3b-thinking】【H20-96G-4】
1. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-30b-a3b-thinking-2507-model?spm=43a6e6f6.2e152c3f.0.0.6d4c103cudzmEy
- 0.10.1rc2.dev397+g312aa870b
【qwen3-235b-a22b-thinking】【P】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-235b-a22b-thinking-2507-qwenapp-crit/deployments/qwen3-235b-a22b-thinking-2507-qwenapp-crit-4945
- 0.11.2.dev1732+gd694e5c71.d20251208
【qwen3-235b-a22b-thinking】【D】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-235b-a22b-thinking-2507-qwenapp-crit-decode/deployments/qwen3-235b-a22b-thinking-2507-qwenapp-crit-decode-21fd
- 0.11.2.dev1732+gd694e5c71.d20251208
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen-plus-2025-07-28-model/deployments/qwen-plus-2025-07-28-model-85ed?spm=43a6e6f6.660e3d6f.0.0.622e103cCLyFsA
【qwen3.5-27b】【0-32k】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3.5-27b-text-model/deployments/qwen3.5-27b-text-model-e277
cuda128_cp312_test_vllm_87905ee0_20260222_202123
0.13.0rc2.dev2067+g486e99474.d20260222.cu128
【qwen3.5-27b】【0-32k】【5090-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3.5-27b-text-model/deployments/qwen3.5-27b-text-model-f462
cuda129_cp312_test_vllm_11606
0.13.0rc2.dev2111+gb44b43f43.d20260309
【qwen3-coder-next】【0-32k】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-next-model/deployments/qwen3-coder-next-model-e776?spm=43a6e6f6.5b0a3d6a.0.0.413d103cIwReWg
- 0.10.2rc2.dev168+g8f0fc60c9.d20251204
1. Hardware: 5090, H20
2. Model: Qwen3.5-27B, Qwen3-30B-A3B, Qwen3-235B-A22B-FP8, Qwen3-Coder-Next-FP8
3. Trace: Chat, Thinking, Coder
Test combinations:
Hardware experiments
- 【qwen3.5-27b + 5090】
- 【qwen3.5-27b + H20】
Model experiments
- 【qwen3.5-27b + H20】
- 【qwen3-30b-a3b + H20】
- 【qwen3-235b-a22b + H20】
Trace experiments
- 【qwen3-30b-a3b + H20 + Chat】
- 【qwen3-30b-a3b + H20 + Thinking】
- 【qwen3-235b-a22b + H20 + Chat】
- 【qwen3-235b-a22b + H20 + Thinking】
- 【qwen3-coder-next + H20 + Coder】
---
【qwen3-235b-a22b-instruct】【P】【8-32k】【H20-96G-8】
1. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen-plus-2025-07-28-model/deployments/qwen-plus-2025-07-28-model-5966?spm=43a6e6f6.660e3d6f.0.0.345a103cMNZMSV
2. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen-plus-2025-07-28-model/deployments/qwen-plus-2025-07-28-model-9f59?spm=43a6e6f6.660e3d6f.0.0.345a103cMNZMSV
- 0.13.0rc2.dev1948+g613d885a1.d20260108.cu128
## Deployments
【qwen3-max-2026-01-23-chat-aa8c】qwen3-max nonthinking
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-aa8c?spm=43a6e6f6.33db9dd0.0.0.6a49103cRBNW6W
【qwen3-max-2026-01-23-chat-9bf8】qwen3-max thinking
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-9bf8?spm=43a6e6f6.5a1b7ab3.0.0.7c9b103caxnySJ
first-token latency: Mean: 2.93 s; Max: 9.27 s
last-token latency: Mean: 1.60 min; Max: 2.57 min
[ 363e4d99 | v-6ffe2b5b | qwen3-max | cn-beijing ] - Mean: 10.7 qps - Max: 20.0 qps
[ 363e4d99 | v-6ffe2b5b | qwen3-max | cn-beijing ] - Mean: 1.73 s - Max: 8.83 s
[ 363e4d99 | v-6ffe2b5b | qwen3-max | cn-beijing ] - Mean: 1.42 min - Max: 2.22 min
【qwen3-max-qwenapp-crit-50e9】
【qwen3-max-qwenapp-crit-decode】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-qwenapp-crit-decode?spm=43a6e6f6.29ced41b.0.0.4efb103cjsq29c
Input: Mean: 9131 itpr ; Max: 10817 itpr
Output: Mean: 823 otpr ; Max: 987 otpr
weighted tps - Mean: 46.6 otpsr - Min: 44.5 otpsr - Max: 48.2 otpsr
- Mean: 28.3k tpm - Max: 34.8k tpm
tail: [ 2c3bc7a4 | cn-beijing ] - Mean: 35.0 s - Max: 1.40 min
【qwen3-max-2025-10-30-thinking-model】
Input: Mean: 6074 itpr ; Max: 16790 itpr
Output: Mean: 2062 otpr Max: 4153 otpr
1. Qwen3-Chat 【nonthinking】:
qwen3-max-2026-01-23-chat-aa8c-info【v1: includes input/timestamp etc.】
qwen3-max-2026-01-23-chat-aa8c-info【v2: can collect a full week in one pass; includes output_length】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-aa8c?spm=43a6e6f6.33db9dd0.0.0.6a49103cRBNW6W
2. Qwen3-Coder
qwen3-coder-next-model【includes input/timestamp etc.】
qwen3-coder-next-model-8130-info【includes output_length】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-next-model/deployments/qwen3-coder-next-model-8130
3. Qwen3-Chat 【thinking】
qwen3-max-2026-01-23-chat-9bf8
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-9bf8?spm=43a6e6f6.5a1b7ab3.0.0.7c9b103caxnySJ
0319~0324: https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-694a?spm=43a6e6f6.33db9dd0.0.0.297a43617e9ySH
0324~0326: https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-9bf8?spm=43a6e6f6.33db9dd0.0.0.297a43617e9ySH
0326+: https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-3201?spm=43a6e6f6.33db9dd0.0.0.297a43617e9ySH