aituner

Files

Gahow Wang 93ce339d61 Document 27B TP sweep: per-GPU rises sharply with TP (dense), opposite of MoE

Under the length-aware TTFT SLO (4s + L_in/8k), per-GPU throughput for dense Qwen3.5-27B
rises with TP: TP1=0.065, TP2=0.2925 (4.5x TP1), TP4>=0.908 (>=14x TP1). TP4 is
ceiling-saturated, so its figure is a lower bound. TP1 is TPOT-bound: one H20 can't decode
a 27B model under 50ms/token once batched, and loosening the TTFT SLO didn't move TP1,
which confirms TPOT is the binding constraint. This is the opposite of MoE 30B-A3B, where
TP1 was best per-GPU. The TP1/TP2 results validate that the harness and the length-aware
SLO produce meaningful, non-saturated measurements.
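
A minimal sketch of the SLO check and the per-GPU ratios described above. The function and parameter names are hypothetical and not taken from the harness. It assumes "8k" means 8000 input tokens per extra second of TTFT budget (it could equally mean 8192), and it takes the 50ms/token TPOT limit from the message above.

```python
# Hypothetical helpers; not the harness's actual API.

def ttft_slo_seconds(l_in: int, base_s: float = 4.0,
                     tokens_per_extra_s: float = 8000.0) -> float:
    """Length-aware TTFT budget: 4s plus one second per 8k input tokens.

    Assumption: "8k" is read as 8000 tokens, not 8192.
    """
    return base_s + l_in / tokens_per_extra_s


def meets_slo(ttft_s: float, tpot_s: float, l_in: int,
              tpot_limit_s: float = 0.050) -> bool:
    """A request passes only if both the TTFT and TPOT limits hold."""
    return ttft_s <= ttft_slo_seconds(l_in) and tpot_s <= tpot_limit_s


# Per-GPU figures quoted in the commit message (TP4 is a lower bound).
per_gpu = {"TP1": 0.065, "TP2": 0.2925, "TP4": 0.908}

# Ratio of each configuration to TP1.
ratios = {tp: value / per_gpu["TP1"] for tp, value in per_gpu.items()}

if __name__ == "__main__":
    for tp, ratio in ratios.items():
        print(f"{tp}: {per_gpu[tp]} per GPU, {ratio:.1f}x TP1")
```

In this sketch, a request that misses the TPOT limit fails regardless of how loose the TTFT budget is, which matches the observation that loosening TTFT did not change the TP1 result.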

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-16 01:54:40 +08:00

harness-ablation

Document 27B TP sweep: per-GPU rises sharply with TP (dense), opposite of MoE

2026-06-16 01:54:40 +08:00

qwen27b-chat-0-8k-7day-compare

docs: expand qwen27b 0-8k compare summary

2026-04-17 20:45:24 +08:00

qwen27b-chat-pd-colocation

Add qwen27b and qwen235b tuning notes

2026-04-11 12:07:42 +08:00

qwen30b-community-vllm020

Add open source project metadata