Gahow Wang 9dee25907b Add P/D ratio ablation: 6P+2D vs 4P+4D vs Combined
6P+2D gives more GPUs to prefill, fewer to decode:
- Decode util: 7.8% (4D) -> 19.0% (2D), less waste
- TTFT: 1.99s (4P) -> 1.48s (6P), -26% from less prefill queuing
- But Combined (30.5% util, TTFT 1.01s) still best overall

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 22:42:20 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%