Gahow Wang 1e8628581b Fair A/B: Elastic P2P wins on ALL metrics vs baseline (fresh restart)
Same-condition comparison (both fresh restart, same trace, same params):
  Baseline (combined):  TTFT=2.383/27.622  TPOT90=0.117  E2E=10.232
  Elastic P2P (cap=4):  TTFT=1.315/13.179  TPOT90=0.075  E2E=5.708
  Delta:                -45%  / -52%        -36%          -44%
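
The Delta row can be re-derived from the two rows above it. A minimal
sketch, assuming each delta is the relative change (new - base) / base
rounded to a whole percent; the key names are illustrative only, and the
two TTFT values are kept as "first" and "second" because the message
does not say which statistics they are:

```python
# Values copied verbatim from the comparison above.
baseline = {"ttft_first": 2.383, "ttft_second": 27.622, "tpot90": 0.117, "e2e": 10.232}
elastic = {"ttft_first": 1.315, "ttft_second": 13.179, "tpot90": 0.075, "e2e": 5.708}


def pct_delta(base: float, new: float) -> int:
    """Relative change of `new` versus `base`, in whole percent."""
    return round((new - base) / base * 100)


# One delta per metric, in the same order as the table columns.
deltas = {name: pct_delta(baseline[name], elastic[name]) for name in baseline}

for name, delta in deltas.items():
    print(f"{name:12s} {delta:+d}%")
```

Under that assumption the computed values match the reported row
(-45, -52, -36, -44).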

Key finding: TPOT p90 dropped 36%, confirming that heavy prefill DOES
disrupt decode in combined mode and that elastic offload effectively
isolates it. Previous comparisons missed this because their baselines
were run under different conditions (stale instances, different time_scale).

GPU util: elastic shows lower GPU utilization (15.8% vs 28.7%) yet
achieves better latency, i.e. higher efficiency, attributed to better
cache distribution.

APC: elastic has more balanced per-instance APC (36-38% prefix + 30-35%
external) vs baseline's skewed distribution (3.8% - 68.3%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 15:48:51 +08:00