MB5 analysis: per-role KV split proves static-partition mismatch

aggregate_mb5.py:
- Split the cluster KV timeline by role (P-pool vs D-pool) using a
  PID->role map parsed from vllm_logs filenames. The cluster average
  hid the result: 6P+2D/4P+4D look ~42-46% utilized at steady state,
  but the decode pool peaks at ~98-100% while prefill idles at ~30%.
- Two-stage reduce/plot: --reduce-to (numpy-only, runs on the serving
  host over multi-GB snapshot dirs) dumps a compact JSON; --from-reduced
  (matplotlib) renders locally. matplotlib import is now lazy.
- New plot_role_split figure + p/d peak/steady columns in the CSV.
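
The role split in the first bullet can be sketched roughly as below.
This is a minimal illustration, not the code in aggregate_mb5.py: the
filename pattern, the sample tuple layout, the per-timestamp mean across
workers, and the use of the median as the "steady" value are all
assumptions made for the example.

```python
import re
from collections import defaultdict
from statistics import median

# Hypothetical filename pattern "<role>_<pid>.log", e.g. "prefill_4121.log".
# The real vllm_logs naming scheme is not shown in this commit.
_LOG_NAME = re.compile(r"^(?P<role>prefill|decode)_(?P<pid>\d+)\.log$")


def pid_role_map(filenames):
    """Map PID -> 'P' or 'D' from log filenames; unmatched names are skipped."""
    roles = {}
    for name in filenames:
        m = _LOG_NAME.match(name)
        if m:
            roles[int(m.group("pid"))] = "P" if m.group("role") == "prefill" else "D"
    return roles


def role_split(samples, roles):
    """Split (timestamp, pid, kv_usage_frac) samples into per-role series.

    Each timestamp's value is the mean over that role's workers (assumption).
    Returns {role: {"peak": max, "steady": median}}; median is a stand-in
    for whatever steady-state definition the real script uses.
    """
    per_role = defaultdict(lambda: defaultdict(list))
    for ts, pid, frac in samples:
        role = roles.get(pid)
        if role is not None:
            per_role[role][ts].append(frac)
    out = {}
    for role, by_ts in per_role.items():
        series = [sum(v) / len(v) for _, v in sorted(by_ts.items())]
        out[role] = {"peak": max(series), "steady": median(series)}
    return out
```

Averaging all workers together at each timestamp instead of grouping by
role gives the cluster-wide figure, which is how a saturated decode pool
can hide behind an idle prefill pool.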

PD_DISAGG_RESULTS.md: consolidated writeup with figures inline.
Verdict: no static P:D ratio beats 8C colocation. The binding
constraint moves with the ratio (D-pool saturates at 6P+2D/4P+4D,
P-pool jams at 2P+6D -> 91% request loss); 8C's shared pool stays
elastic at 34% steady, 100% completion. PD wins TPOT (10-35x cleaner,
the MB1 phase-isolation benefit is real) but loses TTFT (p50 41-56 s
at 6P+2D/4P+4D vs 7 s for 8C) and sheds load at 2P+6D. Round-robin P
routing also zeroes prefix-cache reuse (hit ratio 0.0 vs 0.19 for 8C);
a session-affinity re-run of 6P+2D is in flight to test the fix.
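
The request-loss and binding-constraint figures in the verdict can be
re-derived from mb5_summary.csv. The sketch below hard-codes rounded
rep-1 values from that file; the helper names and the rule "the pool
with the higher peak fraction is the binding one" are illustrative
assumptions, not part of aggregate_mb5.py.

```python
# Rounded rep-1 values copied from mb5_summary.csv.
# 8C has no role columns because its pool is shared.
ROWS = {
    "8C":    {"n": 1214, "ok": 1214, "p_peak": None,  "d_peak": None},
    "6P+2D": {"n": 1214, "ok": 1214, "p_peak": 0.743, "d_peak": 0.996},
    "4P+4D": {"n": 1214, "ok": 1214, "p_peak": 0.644, "d_peak": 0.975},
    "2P+6D": {"n": 1214, "ok": 109,  "p_peak": 0.997, "d_peak": 0.962},
}


def request_loss(row):
    """Fraction of requests that did not complete."""
    return 1.0 - row["ok"] / row["n"]


def binding_pool(row):
    """Pool with the higher peak KV fraction (illustrative rule only)."""
    if row["p_peak"] is None or row["d_peak"] is None:
        return "shared"
    return "P" if row["p_peak"] > row["d_peak"] else "D"


for name, row in ROWS.items():
    print(f"{name:6s} loss={request_loss(row):.0%} binding={binding_pool(row)}")
```

For 2P+6D this gives 1 - 109/1214, about 91% loss, with the P-pool as
the higher-peaking pool; for 6P+2D and 4P+4D the D-pool peaks higher.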

Figures (rep1): mb5_kv_timeline, mb5_role_split, mb5_peak_utilization,
mb5_latency_compare + mb5_summary.csv.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 12:05:17 +08:00
parent e8980ce957
commit 8596135680
8 changed files with 424 additions and 33 deletions

Binary file not shown. Size: 176 KiB

Binary file not shown. Size: 29 KiB

Binary file not shown. Size: 34 KiB

figs/mb5/mb5_role_split.png (new binary file, not shown). Size: 375 KiB

figs/mb5/mb5_summary.csv (new file, 5 lines added)

@@ -0,0 +1,5 @@
config,rep,n_requests,n_success,wall_clock_s,peak_pool_frac,steady_pool_frac,p_pool_peak_frac,p_pool_steady_frac,d_pool_peak_frac,d_pool_steady_frac,peak_waiting,latency_p50_s,latency_p90_s,latency_p99_s,ttft_p50_s,ttft_p90_s,ttft_p99_s,prefix_cache_hit_ratio
8C,1,1214,1214,2994.218414353032,0.7174957362137578,0.3439702956225128,,,,,29,10.82550932947197,83.34998885790122,194.10265863158946,6.967104309005663,53.12018221841427,114.12611859919207,0.1937163528742694
6P+2D,1,1214,1214,3419.065942236979,0.7726478112563957,0.42145750426378625,0.743272692817889,0.3082291074474133,0.9959636156907333,0.7434906196702672,128,44.48975181748392,91.82252187062406,147.70196208347772,40.95952733900049,86.68752026481089,142.84028979733685,0.0
4P+4D,1,1214,1214,4170.666486939997,0.6997939169982945,0.45876918703808983,0.6438459351904491,0.28540363843092664,0.9753411028993746,0.5977686185332576,152,59.52004547297838,157.08703426021387,224.03997302683115,56.419772224500775,153.07864206891392,219.73412787001706,0.0
2P+6D,1,1214,109,5761.816568834998,0.9698692438885731,0.9435119386014781,0.9969869243888573,0.9198408186469585,0.9620238772029562,0.9494504453287853,872,26.293884326005355,499.3484142678091,577.7122636228032,23.580788671970367,498.0334587502061,576.5306194114453,0.0
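
The 8C row leaves the four per-role columns empty, so a reader that
converts every field with float() will fail on it. A minimal loading
sketch, assuming only the column names in the header above; the
function name load_summary and the choice of None for empty fields are
illustrative, not taken from the repository.

```python
import csv
import io

# Columns that hold integer counts; every other non-config column is a float.
INT_COLS = {"rep", "n_requests", "n_success", "peak_waiting"}


def load_summary(text):
    """Parse mb5_summary.csv text into a list of dicts.

    Empty fields (the per-role columns of the colocated 8C row) become None.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, val in rec.items():
            if key == "config":
                row[key] = val
            elif val == "":
                row[key] = None
            elif key in INT_COLS:
                row[key] = int(val)
            else:
                row[key] = float(val)
        rows.append(row)
    return rows


# Shortened example with a subset of the real columns.
sample = (
    "config,rep,n_requests,n_success,p_pool_peak_frac,d_pool_peak_frac\n"
    "8C,1,1214,1214,,\n"
    "2P+6D,1,1214,109,0.9969869243888573,0.9620238772029562\n"
)
rows = load_summary(sample)
```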