aggregate_mb5.py:
- Split the cluster KV timeline by role (P-pool vs D-pool) using a PID->role map parsed from vllm_logs filenames. The cluster average hid the result: 6P+2D/4P+4D look ~42-46% utilized at steady state, but the decode pool actually peaks at ~98-100% (60-74% steady) while prefill idles at ~30%.
- Two-stage reduce/plot: --reduce-to (numpy-only, runs on the serving host over multi-GB snapshot dirs) dumps a compact JSON; --from-reduced (matplotlib) renders locally. The matplotlib import is now lazy.
- New plot_role_split figure + p/d peak/steady columns in the CSV.

PD_DISAGG_RESULTS.md: consolidated writeup with figures inline.

Verdict: no static P:D ratio beats 8C colocation. The binding constraint moves with the ratio (D-pool saturates at 6P+2D/4P+4D, P-pool jams at 2P+6D -> 91% request loss); 8C's shared pool stays elastic at 34% steady, 100% completion. PD wins TPOT (10-35x cleaner, the MB1 phase-isolation benefit is real) but loses TTFT and sheds load. Round-robin P routing also zeroes prefix-cache reuse; a session-affinity re-run of 6P+2D is in flight to test the fix.

Figures (rep1): mb5_kv_timeline, mb5_role_split, mb5_peak_utilization, mb5_latency_compare + mb5_summary.csv.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
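To illustrate why the cluster-wide average hides decode-pool saturation, here is a minimal sketch (not code from aggregate_mb5.py). It assumes every instance has the same KV capacity, so the cluster figure is an instance-weighted mean of the per-role fractions. The helper name `cluster_fraction` is hypothetical; the inputs are the rounded 6P+2D steady-state values from the summary CSV.

```python
def cluster_fraction(pools):
    """Instance-weighted mean KV utilization across role pools.

    pools: iterable of (n_instances, utilization_fraction) pairs.
    Assumes equal KV capacity per instance (an assumption, not confirmed
    by the commit).
    """
    pools = list(pools)
    total_instances = sum(n for n, _ in pools)
    return sum(n * frac for n, frac in pools) / total_instances


# 6P+2D, rep 1: rounded p_pool_steady_frac and d_pool_steady_frac.
p_steady, d_steady = 0.308, 0.743
blended = cluster_fraction([(6, p_steady), (2, d_steady)])

# Six lightly loaded prefill instances outweigh two busy decode instances.
print(f"P={p_steady:.1%}  D={d_steady:.1%}  blended={blended:.1%}")
```

Under that equal-capacity assumption the blended value lands close to the reported steady_pool_frac of about 0.42 for 6P+2D, which is why the per-role split is needed to see the decode pool's load.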
6 lines
1.3 KiB
CSV
config,rep,n_requests,n_success,wall_clock_s,peak_pool_frac,steady_pool_frac,p_pool_peak_frac,p_pool_steady_frac,d_pool_peak_frac,d_pool_steady_frac,peak_waiting,latency_p50_s,latency_p90_s,latency_p99_s,ttft_p50_s,ttft_p90_s,ttft_p99_s,prefix_cache_hit_ratio
8C,1,1214,1214,2994.218414353032,0.7174957362137578,0.3439702956225128,,,,,29,10.82550932947197,83.34998885790122,194.10265863158946,6.967104309005663,53.12018221841427,114.12611859919207,0.1937163528742694
6P+2D,1,1214,1214,3419.065942236979,0.7726478112563957,0.42145750426378625,0.743272692817889,0.3082291074474133,0.9959636156907333,0.7434906196702672,128,44.48975181748392,91.82252187062406,147.70196208347772,40.95952733900049,86.68752026481089,142.84028979733685,0.0
4P+4D,1,1214,1214,4170.666486939997,0.6997939169982945,0.45876918703808983,0.6438459351904491,0.28540363843092664,0.9753411028993746,0.5977686185332576,152,59.52004547297838,157.08703426021387,224.03997302683115,56.419772224500775,153.07864206891392,219.73412787001706,0.0
2P+6D,1,1214,109,5761.816568834998,0.9698692438885731,0.9435119386014781,0.9969869243888573,0.9198408186469585,0.9620238772029562,0.9494504453287853,872,26.293884326005355,499.3484142678091,577.7122636228032,23.580788671970367,498.0334587502061,576.5306194114453,0.0
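The headline numbers can be re-derived from these rows with the standard library alone. The sketch below is not part of the commit: it embeds a trimmed, rounded copy of four columns so that it runs on its own, and the helper names `to_float` and `summarize` are hypothetical.

```python
import csv
import io

# Trimmed, rounded copy of the rows above (only the columns used here).
SUMMARY = """\
config,n_requests,n_success,d_pool_peak_frac,ttft_p50_s
8C,1214,1214,,6.97
6P+2D,1214,1214,0.996,40.96
4P+4D,1214,1214,0.975,56.42
2P+6D,1214,109,0.962,23.58
"""


def to_float(cell):
    """Parse a numeric cell; empty cells (8C has no P/D split) become None."""
    return float(cell) if cell else None


def summarize(text):
    """Return per-config completion rate, decode-pool peak, and TTFT p50."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        n_req, n_ok = int(row["n_requests"]), int(row["n_success"])
        rows.append({
            "config": row["config"],
            "completion": n_ok / n_req,
            "d_pool_peak": to_float(row["d_pool_peak_frac"]),
            "ttft_p50_s": to_float(row["ttft_p50_s"]),
        })
    return rows


rows = summarize(SUMMARY)
for r in rows:
    d_peak = "n/a" if r["d_pool_peak"] is None else f"{r['d_pool_peak']:.1%}"
    print(f"{r['config']:>6}  completion={r['completion']:.1%}  "
          f"d_pool_peak={d_peak}  ttft_p50={r['ttft_p50_s']:.1f}s")
```

For 2P+6D this gives 109/1214, roughly 9% completion, which is the 91% request loss quoted in the commit message. The latency and TTFT percentiles for that row presumably cover only the requests that succeeded, so they should be compared with the other configs cautiously.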