agentic-kvc


Gahow Wang 1e8628581b Fair A/B: Elastic P2P wins on ALL metrics vs baseline (fresh restart)

Same-condition comparison (both fresh restart, same trace, same params):
  Baseline (combined):  TTFT=2.383/27.622  TPOT90=0.117  E2E=10.232
  Elastic P2P (cap=4):  TTFT=1.315/13.179  TPOT90=0.075  E2E=5.708
  Delta:                -45%  / -52%        -36%          -44%
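The Delta row is the relative change from baseline to Elastic P2P, rounded to the nearest whole percent. The minimal sketch below recomputes it from the numbers above. The helper names (`relative_delta`, `delta_row`) are hypothetical. The commit message does not say which percentiles the two TTFT values are, so they are labeled only `TTFT_1` and `TTFT_2`.

```python
# Hypothetical helper: recompute the "Delta" row from the raw values in the
# commit message. The values are copied verbatim. The two TTFT numbers keep
# neutral labels because the message does not name their percentiles.
BASELINE = {"TTFT_1": 2.383, "TTFT_2": 27.622, "TPOT90": 0.117, "E2E": 10.232}
ELASTIC = {"TTFT_1": 1.315, "TTFT_2": 13.179, "TPOT90": 0.075, "E2E": 5.708}


def relative_delta(before: float, after: float) -> float:
    """Percent change from `before` to `after` (negative means lower latency)."""
    return (after - before) / before * 100.0


def delta_row(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, int]:
    """Percent change per metric, rounded to the nearest whole percent."""
    return {k: round(relative_delta(baseline[k], candidate[k])) for k in baseline}


if __name__ == "__main__":
    for name, pct in delta_row(BASELINE, ELASTIC).items():
        print(f"{name}: {pct:+d}%")
```

Running it prints -45%, -52%, -36% and -44%, which matches the Delta row. The same formula applied to the TTFT p50 figures reported for `analysis` (1.080 to 0.551) gives -49%.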

Key finding: TPOT p90 dropped 36%. This confirms that heavy prefill DOES
disrupt decode in combined mode and that elastic offload effectively
isolates it. Previous comparisons missed this because their baselines
were run under different conditions (stale instances, different time_scale).

GPU util: elastic uses less GPU (15.8% vs 28.7%) yet achieves better
latency. It gets more out of the GPU through better cache distribution.

APC: elastic spreads APC more evenly across instances (36-38% prefix
+ 30-35% external per instance). The baseline is heavily skewed (3.8%-68.3%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-22 15:48:51 +08:00

analysis

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

patches

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

scripts

Fair A/B: Elastic P2P wins on ALL metrics vs baseline (fresh restart)