## TODO

done:

- codex-chat
- codex-chat-5090: codex resume 019d4945-4991-7331-a848-1be6fd702e9f
- codex-coder

- scoot-chat
- scoot-thinking-prefill
- scoot-thinking-decode

dash1: codex-thinking-decode
dash2: codex-thinking-prefill
dash3: scoot-coder
dash5: scoot-chat-5090

```bash
# chat
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/custom_trace_windows/qwen_coder_next_internal_chat_day0_t0p002_fixedcount/sampled_traces/chat_w20260311_peak_1000.jsonl

# prefill-only
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/custom_trace_windows/qwen_coder_next_internal_thinking_day0_t0p04_fixedcount/sampled_traces/thinking_w20260323_peak_1000.jsonl

# coder
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/custom_trace_windows/qwen_coder_next_internal_coder_peak_7day_fixedcount/sampled_traces/coder_w20260311_peak_1000.jsonl

# decode-only
/home/admin/cpfs/wjh/aituner/tuner-workload-principle/workflow_output/plans/dash0123_8gpu__qwen235b__internal__decode_only__thinking__legal11_thinking_decode_only_weekly0321_0327_peak_local8/traces/thinking_w20260321_peak_1000.jsonl
```

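For a quick look inside one of these sampled-trace JSONL files, a small sketch helps; the field names (`timestamp`, `input_length`, `output_length`) are assumptions here, since the real schema depends on the trace sampler:

```python
import json
import tempfile
from pathlib import Path

def summarize_trace(path):
    """Count requests in a sampled-trace JSONL file and report the keys
    on the first record. Field names below are illustrative only."""
    records = [json.loads(line)
               for line in Path(path).read_text().splitlines()
               if line.strip()]
    return {"num_requests": len(records),
            "fields": sorted(records[0]) if records else []}

# Usage with a tiny synthetic file standing in for a real sampled trace:
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"timestamp": 0.0, "input_length": 128, "output_length": 64}\n')
    f.write('{"timestamp": 0.5, "input_length": 256, "output_length": 32}\n')

print(summarize_trace(f.name))
# → {'num_requests': 2, 'fields': ['input_length', 'output_length', 'timestamp']}
```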
So in the code I will make the internal profile's "chunked prefill required" rule a hard constraint directly, instead of continuing to learn it from timeout failures.

Fig 7/8: add the same semi-real variant as the real trace.

ongoing:

dash0: qwen235b decode-only test
dash1/2: qwen235b thinking 30-min test
dash3: qwen-coder-next coder 30-min test
dash5: 5090 qwen27b chat-0-32k test

4.1

Show that one workload's data cannot be used to tune a different cluster.

✅4.2

✅synthetic/semi-real/real performance comparison data:
83.91, 98.19, 98.4
65.22, 86.03, 98.28

✅synthetic/semi-real/real similarity comparison data

tuned-best comparison data for chat/thinking/coder under each prefix
similarity comparison data for chat/thinking/coder under each prefix

4.3

agent harness summary

5

tuner vs baseline

✅default config

```bash
# Qwen3.5-27B
# https://huggingface.co/Qwen/Qwen3.5-27B
vllm serve Qwen/Qwen3.5-27B \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --reasoning-parser qwen3

# Qwen3-Coder-Next
# https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3-Next.html#basic-multi-gpu-setup
vllm serve Qwen/Qwen3-Coder-Next-FP8 \
  --tensor-parallel-size 4 \
  --enable-prefix-caching

# Qwen3-235B-A22B-FP8
# https://huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-FP8
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
  --tensor-parallel-size 4 \
  --max-model-len 262144

# https://github.com/aliez-ren/vllm-qwen3.5-nvfp4-sm120?utm_source=chatgpt.com
vllm serve Kbenkhaled/Qwen3.5-27B-NVFP4 \
  --max-model-len 234567 \
  --gpu-memory-utilization 0.89 \
  --max-num-seqs 4 \
  --max-num-batched-tokens 4096

vllm serve Qwen/Qwen3.5-27B-FP8 \
  --quantization fp8 \
  --dtype auto \
  --gpu-memory-utilization 0.85 \
  --max-model-len 131072 \
  --max-num-seqs 2 \
  --max-num-batched-tokens 2048 \
  --tensor-parallel-size 1

# https://huggingface.co/Qwen/Qwen3.5-27B-FP8?utm_source=chatgpt.com
vllm serve Qwen/Qwen3.5-27B-FP8 --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser qwen3
```

```bash
# run qwen27b batching
# running on dash2
bash /home/admin/cpfs/wjh/aituner/tuner-workload-principle/launch_qwen35_27b_tp2dp1_epoff_batching_chat0_32k_weekly_peak.sh


# run the evaluator comparison
# running on dash1
CASE_KIND=chat TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh

CASE_KIND=coder TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh

CASE_KIND=thinking_prefill TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh

CASE_KIND=thinking_decode TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh


# [x] for qwen35_27b, align with the production trace, search the 0~4k threshold, then re-test
./workflow threshold-search \
  --hardware dash0123_8gpu \
  --model qwen35_27b \
  --engine internal \
  --workload chat \
  --phase prefill_decode \
  --trace-type chat-0-4k \
  --max-threshold 0.5


# qwen3-coder-next cannot run EP?


# [x] the jobs on dash3 need to be rerun on dash2
cd /home/admin/cpfs/wjh/aituner/tuner-workload-principle
bash workflow_output/plans/dash0123_8gpu__qwen35_27b__internal__prefill_decode__chat__legal10_chat0_4k/run_results_v2_trace_dash3.sh --machine-label dash2


# [x] run qwen35_27b 0~32k
./launch_qwen35_27b_chat_0_32k_after_trace_prepare.sh
```

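The four evaluator-comparison invocations above differ only in the case kind; a dry-run loop (printing the commands instead of executing the launch script, which only exists in the working repo) could cover them all:

```shell
# Dry-run sketch: print one evaluator-compare command per case kind.
for kind in chat coder thinking_prefill thinking_decode; do
  echo "CASE_KIND=$kind TRACE_SUITE=sourcekind bash ./launch_workload_evaluator_compare.sh"
done
```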
## Run status

qwen-235b ✅
qwen27b ✅
qwen-coder: needs a switch to the matching vllm build
qwen-30b: needs a switch to the container version that supports flash-infer

```
pip install -U flashinfer-python
# requires flashinfer >= 0.7

wjh@ds-f74814b6-1-65cd484875-256zt:~$ pip list | grep flashinfer
flashinfer-cubin      0.6.4
flashinfer-jit-cache  0.6.4
flashinfer-python     0.6.4
```

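The installed 0.6.4 build misses the >= 0.7 requirement above; a small check could catch this before launch. A sketch, assuming the `flashinfer-python` distribution name is the one whose version matters:

```python
from importlib.metadata import PackageNotFoundError, version

def version_ok(ver: str, required=(0, 7)) -> bool:
    """True when a 'major.minor[.patch]' version string meets the requirement."""
    parts = tuple(int(p) for p in ver.split(".")[:2])
    return parts >= required

def flashinfer_ok() -> bool:
    """Check the installed flashinfer-python against the >= 0.7 requirement."""
    try:
        return version_ok(version("flashinfer-python"))
    except PackageNotFoundError:
        return False

print(version_ok("0.6.4"), version_ok("0.7.0"))  # → False True
```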
## Production performance

qwen27b:
40 instances: Mean: 4.00 qps; Max: 5.67 qps

prefill: Mean: 193k tpm; Max: 339k tpm
decode: Mean: 72.4k tpm; Max: 119k tpm
first-token latency: Mean: 1.59 s; Max: 11.3 s
tail latency: Mean: 23.6 s; Max: 46.2 s

qwen30b-a3b:
Mean: 0.00267 qps; Max: 0.109 qps

## Models

name: qwen3-235b-a22b, version: 256k-0717
name: qwen3-235b-a22b, version: 0717-eagle-0820

name: qwen3-30b-a3b, version: 1m-instruct-0726-fp4
name: qwen3-30b-a3b, version: 1m-thinking-0728-fp4

name: qwen3-coder-next, version: 1m-20260129-re-mtp-fp8-torch-dtype
name: qwen3-coder-next, version: 1m-20260129-xml-tool-parser-fix

name: qwen3.5-27b, version: 256k-0223-internal
name: qwen3.5-27b, version: 256k-0223-internal-nvfp4-inputscale-fp8-attn

```
"cache_volume": {
  "enabled": true,
  "scope": "application"
},
"cpfs_file_system_id": "bmcpfs-290qtyip73f85z7zt9t"
```

## dashllm_cmd serving

```
[INFO] 2026-03-27 18:42:51,933869: {"message":"vllm engine_args: {'model': '/dev/shm/dashllm_model_2', 'device': 'cuda', 'dtype': 'bfloat16', 'tensor_parallel_size': 1, 'enforce_eager': False, 'gpu_memory_utilization': 0.8, 'block_size': 256, 'swap_space': 1, 'max_num_seqs': 256, 'max_num_batched_tokens': 4096, 'trust_remote_code': True, 'disable_custom_all_reduce': False, 'skip_tokenizer_init': False, 'quantization': None, 'max_model_len': 262144, 'compilation_config': {'use_inductor': False, 'custom_ops': ['all']}, 'enable_prefix_caching': True, 'distributed_executor_backend': 'mp', 'enable_chunked_prefill': True, 'max_seq_len_to_capture': 262144}","time":"2026-03-27 18:42:51.933"}
```

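The log line above embeds a Python-repr `engine_args` dict inside a JSON message; pulling it out programmatically is handy when diffing configs across deployments. A sketch (the `parse_engine_args` helper and the shortened sample line are illustrative, not part of dashllm):

```python
import ast
import json

# Shortened sample in the same shape as the dashllm_cmd log entry above.
line = ('[INFO] 2026-03-27 18:42:51,933869: '
        '{"message":"vllm engine_args: '
        "{'max_num_seqs': 256, 'max_num_batched_tokens': 4096}\","
        '"time":"2026-03-27 18:42:51.933"}')

def parse_engine_args(log_line: str) -> dict:
    """Extract the engine_args dict from a dashllm-style log line."""
    # The JSON payload starts at the first '{' after the log prefix.
    payload = json.loads(log_line[log_line.index("{"):])
    # The message embeds a Python-repr dict after "engine_args: ".
    dict_repr = payload["message"].split("engine_args: ", 1)[1]
    return ast.literal_eval(dict_repr)

args = parse_engine_args(line)
print(args["max_num_batched_tokens"])  # → 4096
```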
## Alibaba model env

qwen3.5-27b strictly requires BLADNN for the vl attn kernel
qwen3-30b/235b/coder can all start without BLADNN
for 235b/coder, which are already FP8-quantized, enabling BLADNN raises an error

- qwen3-coder

```
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN
```

- qwen3.5-27b

```
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN
```

```bash
####################################
# Qwen3.5-27B
####################################
VLLM_DISABLE_COMPILE_CACHE=1 VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3.5-27B --tensor-parallel-size 1 --mamba_cache_mode light --max-num-seqs 64 --max-num-batched-tokens 1000000 --long-prefill-token-threshold 30000 --skip_mm_profiling --mm-processor-cache-gb 0

VLLM_DISABLE_COMPILE_CACHE=1 VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_FUSED_QKVZBA_KERNEL=0 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3.5-27B --tensor-parallel-size 1 --mamba_cache_mode light --max-num-seqs 64 --max-num-batched-tokens 40960 --long-prefill-token-threshold 30000 --skip_mm_profiling --mm-processor-cache-gb 0

VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_FUSED_QKVZBA_KERNEL=0 VLLM_GDN_USE_BLADNN=0 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3.5-27B --tensor-parallel-size 1 --max-num-seqs 64 --max-num-batched-tokens 40960 #--long-prefill-token-threshold 30000 #--skip_mm_profiling --mm-processor-cache-gb 0

#--long_context_threshold 30000
#Qwen3_5ForConditionalGeneration


####################################
# Qwen3-Coder
####################################
VLLM_MOE_EXPERTS_OVERLAP=1 TORCH_CUDA_ARCH_LIST="9.0+PTX" VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-Next-FP8 --tensor-parallel-size 2
# ok
vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-Next-FP8 --tensor-parallel-size 2

VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_GDN_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-Next-FP8 --tensor-parallel-size 2


####################################
# Qwen3-30B-A3B
####################################
# ok
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-30B-A3B --tensor-parallel-size 2


####################################
# Qwen3-235B-A22B
####################################
VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_USE_DEEP_GEMM=0 vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-235B-A22B-FP8 --tensor-parallel-size 4
# ok
vllm serve /home/admin/cpfs/wjh/models/Qwen/Qwen3-235B-A22B-FP8 --tensor-parallel-size 4

VLLM_FP8_USE_BLADNN=1 VLLM_MOE_USE_BLADNN=1 VLLM_USE_V1=1 VLLM_IS_HYBRID_MODEL=1 VLLM_ENABLE_TORCH_COMPILE=1 VLLM_USE_DEEP_GEMM=0 vllm serve resource/model/464482ce.qwen3-235b-a22b/128k-0426/ --tensor-parallel-size 4


'{"gpu_memory_utilization": 0.9, "max_model_len": 262144, "enable_chunked_prefill": true, "enable_think": 1, "think_mode": "auto", "tensor_parallel_size": 1, "dtype": "bfloat16", "enforce_eager": false, "enable_prefix_caching": true, "mamba_cache_mode": "light", "distributed_executor_backend": "mp", "block_size": 64, "max_num_batched_tokens": 8192, "disable_cascade_attn": true, "speculative_config": {"method": "qwen3_next_vl_mtp", "num_speculative_tokens": 3}, "mm_processor_cache_gb": 0, "limit_mm_per_prompt": {"image": 256, "video": 64}, "compilation_config": {"cudagraph_mode": "FULL_AND_PIECEWISE", "use_inductor": false, "pass_config": {"fuse_norm_quant": false, "fuse_act_quant": false, "fuse_attn_quant": false}}, "mamba_cache_dtype": "float32", "skip_mm_profiling": true, "quantization": "fp8"}'
```

1 GPU: TP1DP1
2 GPU: (TP2DP1, TP1DP2) x (EPON, EPOFF)
4 GPU: (TP4DP1, TP2DP2, TP1DP4) x (EPON, EPOFF)
8 GPU: (TP8DP1, TP4DP2, TP2DP4, TP1DP8) x (EPON, EPOFF)

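The search space above can be enumerated mechanically; a small sketch (function and names are illustrative, not from the tuner's codebase):

```python
def parallelism_configs(num_gpus: int):
    """Enumerate (tp, dp, ep) combos whose tp*dp uses all GPUs.

    For a single GPU the EP axis is omitted, matching the list above.
    """
    configs = []
    for tp in (1, 2, 4, 8):
        if tp > num_gpus or num_gpus % tp:
            continue
        dp = num_gpus // tp
        if num_gpus == 1:
            configs.append((tp, dp, None))
        else:
            for ep in ("EPON", "EPOFF"):
                configs.append((tp, dp, ep))
    return configs

print(len(parallelism_configs(8)))  # → 8 (4 TP/DP splits x 2 EP modes)
```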
## E2E tests

【qwen3-coder】【0-30k】【kvs】【h20-96-d】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-nosparse-model/deployments/qwen3-coder-nosparse-model-ba4a

【qwen3-coder-flash】【0-30k】【kvs】【h20-96-d】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-flash-2025-07-28-nosparse-model/deployments/qwen3-coder-flash-2025-07-28-nosparse-model-1553

【qwen3-30b-a3b-instruct】【H20-96G-4】
1. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-30b-a3b-instruct-2507-model/deployments/qwen3-30b-a3b-instruct-2507-model-a06c
- 0.9.0

【qwen3-30b-a3b-thinking】【H20-96G-4】
1. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-30b-a3b-thinking-2507-model?spm=43a6e6f6.2e152c3f.0.0.6d4c103cudzmEy
- 0.10.1rc2.dev397+g312aa870b

【qwen3-235b-a22b-thinking】【P】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-235b-a22b-thinking-2507-qwenapp-crit/deployments/qwen3-235b-a22b-thinking-2507-qwenapp-crit-4945
- 0.11.2.dev1732+gd694e5c71.d20251208

【qwen3-235b-a22b-thinking】【D】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-235b-a22b-thinking-2507-qwenapp-crit-decode/deployments/qwen3-235b-a22b-thinking-2507-qwenapp-crit-decode-21fd
- 0.11.2.dev1732+gd694e5c71.d20251208

https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen-plus-2025-07-28-model/deployments/qwen-plus-2025-07-28-model-85ed?spm=43a6e6f6.660e3d6f.0.0.622e103cCLyFsA

【qwen3.5-27b】【0-32k】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3.5-27b-text-model/deployments/qwen3.5-27b-text-model-e277
cuda128_cp312_test_vllm_87905ee0_20260222_202123
0.13.0rc2.dev2067+g486e99474.d20260222.cu128

【qwen3.5-27b】【0-32k】【5090-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3.5-27b-text-model/deployments/qwen3.5-27b-text-model-f462
cuda129_cp312_test_vllm_11606
0.13.0rc2.dev2111+gb44b43f43.d20260309

【qwen3-coder-next】【0-32k】【H20-96G-8】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-next-model/deployments/qwen3-coder-next-model-e776?spm=43a6e6f6.5b0a3d6a.0.0.413d103cIwReWg
- 0.10.2rc2.dev168+g8f0fc60c9.d20251204

1. Hardware: 5090, H20
2. Model: Qwen3.5-27B, Qwen3-30B-A3B, Qwen3-235B-A22B-FP8, Qwen3-Coder-Next-FP8
3. Trace: Chat, Thinking, Coder

Test matrix:
Hardware experiments
- 【qwen3.5-27b + 5090】
- 【qwen3.5-27b + H20】
Model experiments
- 【qwen3.5-27b + H20】
- 【qwen3-30b-a3b + H20】
- 【qwen3-235b-a22b + H20】
Trace experiments
- 【qwen3-30b-a3b + H20 + Chat】
- 【qwen3-30b-a3b + H20 + Thinking】
- 【qwen3-235b-a22b + H20 + Chat】
- 【qwen3-235b-a22b + H20 + Thinking】
- 【qwen3-coder-next + H20 + Coder】

---
【qwen3-235b-a22b-instruct】【P】【8-32k】【H20-96G-8】
1. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen-plus-2025-07-28-model/deployments/qwen-plus-2025-07-28-model-5966?spm=43a6e6f6.660e3d6f.0.0.345a103cMNZMSV
2. https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen-plus-2025-07-28-model/deployments/qwen-plus-2025-07-28-model-9f59?spm=43a6e6f6.660e3d6f.0.0.345a103cMNZMSV
- 0.13.0rc2.dev1948+g613d885a1.d20260108.cu128

## Deployment

【qwen3-max-2026-01-23-chat-aa8c】qwen3-max nonthinking
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-aa8c?spm=43a6e6f6.33db9dd0.0.0.6a49103cRBNW6W

【qwen3-max-2026-01-23-chat-9bf8】qwen3-max thinking
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-9bf8?spm=43a6e6f6.5a1b7ab3.0.0.7c9b103caxnySJ

first-token latency: Mean: 2.93 s; Max: 9.27 s
last-token latency: Mean: 1.60 min; Max: 2.57 min
[ 363e4d99 | v-6ffe2b5b | qwen3-max | cn-beijing ] - Mean: 10.7 qps - Max: 20.0 qps
[ 363e4d99 | v-6ffe2b5b | qwen3-max | cn-beijing ] - Mean: 1.73 s - Max: 8.83 s
[ 363e4d99 | v-6ffe2b5b | qwen3-max | cn-beijing ] - Mean: 1.42 min - Max: 2.22 min

【qwen3-max-qwenapp-crit-50e9】

【qwen3-max-qwenapp-crit-decode】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-qwenapp-crit-decode?spm=43a6e6f6.29ced41b.0.0.4efb103cjsq29c

Input: Mean: 9131 itpr; Max: 10817 itpr
Output: Mean: 823 otpr; Max: 987 otpr

weighted tps - Mean: 46.6 otpsr - Min: 44.5 otpsr - Max: 48.2 otpsr
- Mean: 28.3k tpm - Max: 34.8k tpm

tail: [ 2c3bc7a4 | cn-beijing ] - Mean: 35.0 s - Max: 1.40 min

【qwen3-max-2025-10-30-thinking-model】
Input: Mean: 6074 itpr; Max: 16790 itpr
Output: Mean: 2062 otpr; Max: 4153 otpr

1. Qwen3-Chat 【nonthinking】:
qwen3-max-2026-01-23-chat-aa8c-info【v1: includes input/timestamp etc.】
qwen3-max-2026-01-23-chat-aa8c-info【v2: can collect a full week at once; includes output_length】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-aa8c?spm=43a6e6f6.33db9dd0.0.0.6a49103cRBNW6W
2. Qwen3-Coder:
qwen3-coder-next-model【includes input/timestamp etc.】
qwen3-coder-next-model-8130-info【includes output_length】
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-coder-next-model/deployments/qwen3-coder-next-model-8130
3. Qwen3-Chat 【thinking】:
qwen3-max-2026-01-23-chat-9bf8
https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-9bf8?spm=43a6e6f6.5a1b7ab3.0.0.7c9b103caxnySJ

0319~0324: https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-694a?spm=43a6e6f6.33db9dd0.0.0.297a43617e9ySH
0324~0326: https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-9bf8?spm=43a6e6f6.33db9dd0.0.0.297a43617e9ySH
0326+: https://dashscope-spectrum.alibaba-inc.com/console/workspaces/33e6d810/applications/qwen3-max-2026-01-23-chat/deployments/qwen3-max-2026-01-23-chat-3201?spm=43a6e6f6.33db9dd0.0.0.297a43617e9ySH