Gahow Wang 42bcd31976 TP=2 DP=4 + hybrid routing: best TTFT at cost of TPOT
TP=2 DP=4 with hybrid routing achieves TTFT p50=0.611s (-43% vs TP=1),
the best TTFT across all tested configurations. But TPOT p90=0.109s
(+51% vs TP=1) due to cross-GPU all-reduce in decode.

Full comparison across 7 configurations shows two Pareto-optimal points:
  TP=1 DP=8 hybrid: best TPOT (0.072s), good TTFT (1.064s)
  TP=2 DP=4 hybrid: best TTFT (0.611s), acceptable TPOT (0.109s)

The choice depends on SLO:
  TTFT-sensitive (interactive) -> TP=2 DP=4
  TPOT-sensitive (streaming)   -> TP=1 DP=8

All PD-Sep configurations are strictly dominated by one of these two.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 10:35:18 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%