agentic-kvc/figs/mb5/mb5_summary.csv
Gahow Wang 8596135680 MB5 analysis: per-role KV split proves static-partition mismatch
aggregate_mb5.py:
- Split the cluster KV timeline by role (P-pool vs D-pool) using a
  PID->role map parsed from vllm_logs filenames. The cluster average
  hid the result: 6P+2D/4P+4D look ~45% utilized (steady_pool_frac
  0.42/0.46), but the decode pool peaks at 97-100% while the prefill
  pool idles at ~30% steady.
- Two-stage reduce/plot: --reduce-to (numpy-only, runs on the serving
  host over multi-GB snapshot dirs) dumps a compact JSON; --from-reduced
  (matplotlib) renders locally. matplotlib import is now lazy.
- New plot_role_split figure + p/d peak/steady columns in the CSV.
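The per-role split and the two-stage reduce described above can be sketched as follows. This is a minimal sketch under stated assumptions, not aggregate_mb5.py itself. The following are all hypothetical:

- the `prefill_<pid>.log` / `decode_<pid>.log` filename pattern;
- the snapshot shape (per-PID used/total KV blocks);
- the "steady = median of the middle half of the run" window;
- every function name.

The reduce stage here uses only the standard library, whereas the real script uses numpy. matplotlib is imported only inside the render function, mirroring the lazy import.

```python
import json
import re
import statistics
from collections import defaultdict

# Hypothetical filename pattern; the real vllm_logs naming may differ.
ROLE_RE = re.compile(r"(?P<role>prefill|decode)_(?P<pid>\d+)\.log$")


def pid_role_map(filenames):
    """Map PID -> 'P' or 'D' for log filenames matching ROLE_RE."""
    roles = {}
    for name in filenames:
        m = ROLE_RE.search(name)
        if m:
            roles[int(m.group("pid"))] = "P" if m.group("role") == "prefill" else "D"
    return roles


def role_timeline(snapshots, roles):
    """Collapse per-PID (used, total) KV blocks into one utilization series per role.

    snapshots: list of {pid: (used_blocks, total_blocks)}, one dict per sample time.
    Pool fraction = sum(used) / sum(total) over the PIDs assigned to that role.
    """
    series = defaultdict(list)
    for snap in snapshots:
        used, total = defaultdict(int), defaultdict(int)
        for pid, (u, t) in snap.items():
            role = roles.get(pid)
            if role is None:
                continue  # PID not found in any log filename; skip it
            used[role] += u
            total[role] += t
        for role, t in total.items():
            if t:  # guard against an empty pool
                series[role].append(used[role] / t)
    return dict(series)


def summarize(frac):
    """Peak = max; steady = median of the middle half of the run (assumed window)."""
    n = len(frac)
    middle = frac[n // 4 : n - n // 4] or frac
    return {"peak": max(frac), "steady": statistics.median(middle)}


def reduce_to(path, snapshots, roles):
    """Stage 1 (serving host): write a compact JSON; no plotting library needed."""
    series = role_timeline(snapshots, roles)
    out = {role: {"series": s, **summarize(s)} for role, s in series.items()}
    with open(path, "w") as f:
        json.dump(out, f)
    return out


def render_from_reduced(path, png):
    """Stage 2 (local): matplotlib is imported lazily, only when plotting."""
    import matplotlib.pyplot as plt

    with open(path) as f:
        data = json.load(f)
    for role, d in data.items():
        plt.plot(d["series"], label=f"{role}-pool")
    plt.ylabel("KV pool fraction")
    plt.legend()
    plt.savefig(png)
    plt.close()
```

Summing used and total blocks per role before dividing weights each instance by its pool size. Computing the fraction per role, rather than over all PIDs, is what separates a saturated D-pool from an idle P-pool.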

PD_DISAGG_RESULTS.md: consolidated writeup with figures inline.
Verdict: no static P:D ratio beats 8C colocation. The binding
constraint moves with the ratio (D-pool saturates at 6P+2D/4P+4D,
P-pool jams at 2P+6D -> 91% request loss); 8C's shared pool stays
elastic at 34% steady, 100% completion. PD wins TPOT (10-35x cleaner;
the MB1 phase-isolation benefit is real) but loses TTFT (p50 3.4-8.1x
8C's) and, at 2P+6D, sheds load. Round-robin P routing also zeroes
prefix-cache reuse (hit ratio 0.0 vs 0.19 under 8C); a session-affinity
re-run of 6P+2D is in flight to test the fix.
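A toy router shows why round-robin P routing can defeat prefix caching. Multi-turn requests from one agent session share a growing prefix, but round-robin can land consecutive turns on different P instances. This is a minimal sketch with the following assumptions:

- turn k reuses exactly the prefix prefilled at turn k-1;
- each instance keeps an unbounded cache;
- session affinity is a stable hash of the session id modulo the instance count.

The class and function names are hypothetical, not the router used in these runs.

```python
import hashlib
import itertools


class RoundRobin:
    """Send each request to the next P instance in turn, ignoring the session."""

    def __init__(self, n):
        self._cycle = itertools.cycle(range(n))

    def pick(self, session_id):
        return next(self._cycle)


class SessionAffinity:
    """Pin every request of a session to one P instance via a stable hash."""

    def __init__(self, n):
        self.n = n

    def pick(self, session_id):
        digest = hashlib.sha256(session_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.n


def prefix_hit_ratio(router, n_instances, sessions, turns):
    """Fraction of follow-up turns whose previous-turn prefix is cached on the chosen instance.

    Toy model: an instance holds a prefix only if it prefilled that
    (session, turn) itself; capacity and eviction are ignored.
    """
    cached = [set() for _ in range(n_instances)]  # (session, turn) pairs per instance
    hits = eligible = 0
    for turn in range(turns):
        for s in sessions:  # sessions interleave, as concurrent agents would
            inst = router.pick(s)
            if turn > 0:
                eligible += 1
                hits += (s, turn - 1) in cached[inst]
            cached[inst].add((s, turn))
    return hits / eligible
```

With 7 sessions over 6 P instances (as in 6P+2D), round-robin places every follow-up turn on a different instance than its predecessor, so the toy hit ratio is 0.0. Affinity gives 1.0.

The toy ignores cache capacity, eviction, and the load skew affinity can cause when a few sessions are much heavier than the rest. A stable digest is used instead of Python's built-in `hash()`, which is salted per process for strings and would not route consistently across router restarts.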

Figures (rep1): mb5_kv_timeline, mb5_role_split, mb5_peak_utilization,
mb5_latency_compare + mb5_summary.csv.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:05:17 +08:00

1.3 KiB

config,rep,n_requests,n_success,wall_clock_s,peak_pool_frac,steady_pool_frac,p_pool_peak_frac,p_pool_steady_frac,d_pool_peak_frac,d_pool_steady_frac,peak_waiting,latency_p50_s,latency_p90_s,latency_p99_s,ttft_p50_s,ttft_p90_s,ttft_p99_s,prefix_cache_hit_ratio
8C,1,1214,1214,2994.218414353032,0.7174957362137578,0.3439702956225128,,,,,29,10.82550932947197,83.34998885790122,194.10265863158946,6.967104309005663,53.12018221841427,114.12611859919207,0.1937163528742694
6P+2D,1,1214,1214,3419.065942236979,0.7726478112563957,0.42145750426378625,0.743272692817889,0.3082291074474133,0.9959636156907333,0.7434906196702672,128,44.48975181748392,91.82252187062406,147.70196208347772,40.95952733900049,86.68752026481089,142.84028979733685,0.0
4P+4D,1,1214,1214,4170.666486939997,0.6997939169982945,0.45876918703808983,0.6438459351904491,0.28540363843092664,0.9753411028993746,0.5977686185332576,152,59.52004547297838,157.08703426021387,224.03997302683115,56.419772224500775,153.07864206891392,219.73412787001706,0.0
2P+6D,1,1214,109,5761.816568834998,0.9698692438885731,0.9435119386014781,0.9969869243888573,0.9198408186469585,0.9620238772029562,0.9494504453287853,872,26.293884326005355,499.3484142678091,577.7122636228032,23.580788671970367,498.0334587502061,576.5306194114453,0.0
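The headline numbers in the commit message can be re-derived from this file. A minimal sketch, assuming only the column names in the header above. `load_summary` and `headline` are hypothetical helpers, not part of the repository.

```python
import csv
import io


def load_summary(text):
    """Parse mb5_summary.csv text into {config: row}, converting numeric fields.

    Empty cells (the P/D-pool columns for the colocated 8C run) become None.
    """
    rows = {}
    for rec in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, val in rec.items():
            if key == "config":
                row[key] = val
            elif val == "":
                row[key] = None
            else:
                row[key] = float(val)
        rows[rec["config"]] = row
    return rows


def headline(rows, baseline="8C"):
    """Per config: completion rate, request loss, TTFT p50 vs the baseline, role fractions."""
    base_ttft = rows[baseline]["ttft_p50_s"]
    out = {}
    for cfg, r in rows.items():
        done = r["n_success"] / r["n_requests"]
        out[cfg] = {
            "completion": done,
            "loss": 1.0 - done,
            "ttft_p50_vs_base": r["ttft_p50_s"] / base_ttft,
            "p_steady": r["p_pool_steady_frac"],  # None for 8C (no role split)
            "d_peak": r["d_pool_peak_frac"],
        }
    return out
```

For example, `headline(load_summary(open("mb5_summary.csv").read()))["2P+6D"]["loss"]` is 1 - 109/1214 ≈ 0.91, the 91% request loss quoted in the verdict. Keeping empty role cells as None rather than 0 avoids mistaking the colocated 8C run for an idle P/D pool.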