aggregate_mb5.py:
- Split the cluster KV timeline by role (P-pool vs D-pool) using a PID->role map parsed from vllm_logs filenames. The cluster average hid the result: 6P+2D/4P+4D look ~45% utilized, but the decode pool is actually pegged at ~100% while prefill idles at ~30%.
- Two-stage reduce/plot: --reduce-to (numpy-only, runs on the serving host over multi-GB snapshot dirs) dumps a compact JSON; --from-reduced (matplotlib) renders locally. matplotlib import is now lazy.
- New plot_role_split figure + p/d peak/steady columns in the CSV.

PD_DISAGG_RESULTS.md: consolidated writeup with figures inline. Verdict: no static P:D ratio beats 8C colocation. The binding constraint moves with the ratio (D-pool saturates at 6P+2D/4P+4D, P-pool jams at 2P+6D -> 91% request loss); 8C's shared pool stays elastic at 34% steady, 100% completion. PD wins TPOT (10-35x cleaner, the MB1 phase-isolation benefit is real) but loses TTFT and sheds load. Round-robin P routing also zeroes prefix-cache reuse; a session-affinity re-run of 6P+2D is in flight to test the fix.

Figures (rep1): mb5_kv_timeline, mb5_role_split, mb5_peak_utilization, mb5_latency_compare + mb5_summary.csv.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
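
The role split described in the commit message can be illustrated with a minimal sketch. This is not the code from aggregate_mb5.py: the function name `split_by_role`, the input layout (one list of KV-usage fractions per PID), and the sample numbers are all assumptions made for illustration. The real script derives its PID->role map from vllm_logs filenames, whose format is not shown here, so the map is simply passed in as a dict.

```python
from statistics import fmean


def split_by_role(usage_by_pid, role_by_pid):
    """Average KV usage per time step, cluster-wide and per role.

    usage_by_pid: {pid: [usage fraction at step 0, step 1, ...]}
    role_by_pid:  {pid: "P" or "D"}  (hypothetical map, supplied by the caller)
    """
    def mean_timeline(pids):
        rows = [usage_by_pid[p] for p in pids]
        # zip(*rows) walks the timelines step by step across all workers.
        return [fmean(step) for step in zip(*rows)]

    out = {"cluster": mean_timeline(sorted(usage_by_pid))}
    for role in ("P", "D"):
        members = sorted(p for p in usage_by_pid if role_by_pid[p] == role)
        if members:
            out[role] = mean_timeline(members)
    return out


# Made-up 6P+2D example: six prefill workers at 30%, two decode workers full.
usage = {pid: [0.30, 0.30] for pid in range(6)}
usage.update({6: [1.0, 1.0], 7: [1.0, 1.0]})
roles = {pid: "P" for pid in range(6)}
roles.update({6: "D", 7: "D"})

timelines = split_by_role(usage, roles)
for name, series in timelines.items():
    print(name, [round(v, 3) for v in series])
```

With these made-up inputs the cluster-wide mean sits below 50% even though every decode worker is full, which is the masking effect the per-role timeline is meant to expose.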
| config | rep | n_requests | n_success | wall_clock_s | peak_pool_frac | steady_pool_frac | p_pool_peak_frac | p_pool_steady_frac | d_pool_peak_frac | d_pool_steady_frac | peak_waiting | latency_p50_s | latency_p90_s | latency_p99_s | ttft_p50_s | ttft_p90_s | ttft_p99_s | prefix_cache_hit_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8C | 1 | 1214 | 1214 | 2994.218414353032 | 0.7174957362137578 | 0.3439702956225128 | | | | | 29 | 10.82550932947197 | 83.34998885790122 | 194.10265863158946 | 6.967104309005663 | 53.12018221841427 | 114.12611859919207 | 0.1937163528742694 |
| 6P+2D | 1 | 1214 | 1214 | 3419.065942236979 | 0.7726478112563957 | 0.42145750426378625 | 0.743272692817889 | 0.3082291074474133 | 0.9959636156907333 | 0.7434906196702672 | 128 | 44.48975181748392 | 91.82252187062406 | 147.70196208347772 | 40.95952733900049 | 86.68752026481089 | 142.84028979733685 | 0.0 |
| 4P+4D | 1 | 1214 | 1214 | 4170.666486939997 | 0.6997939169982945 | 0.45876918703808983 | 0.6438459351904491 | 0.28540363843092664 | 0.9753411028993746 | 0.5977686185332576 | 152 | 59.52004547297838 | 157.08703426021387 | 224.03997302683115 | 56.419772224500775 | 153.07864206891392 | 219.73412787001706 | 0.0 |
| 2P+6D | 1 | 1214 | 109 | 5761.816568834998 | 0.9698692438885731 | 0.9435119386014781 | 0.9969869243888573 | 0.9198408186469585 | 0.9620238772029562 | 0.9494504453287853 | 872 | 26.293884326005355 | 499.3484142678091 | 577.7122636228032 | 23.580788671970367 | 498.0334587502061 | 576.5306194114453 | 0.0 |
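
The completion and loss figures quoted in the commit message follow directly from the `n_requests` and `n_success` columns. The sketch below recomputes them from a three-column excerpt of the rows above; the helper name `completion_rates` and the inline `SUMMARY_CSV` string are illustrative assumptions, not part of aggregate_mb5.py.

```python
import csv
import io

# Excerpt of the rep1 rows above: only the columns needed for this check.
SUMMARY_CSV = """config,n_requests,n_success
8C,1214,1214
6P+2D,1214,1214
4P+4D,1214,1214
2P+6D,1214,109
"""


def completion_rates(csv_text):
    """Return {config: fraction of requests that completed successfully}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["config"]: int(row["n_success"]) / int(row["n_requests"])
        for row in reader
    }


rates = completion_rates(SUMMARY_CSV)
for config, rate in rates.items():
    print(f"{config}: {rate:.1%} completed, {1 - rate:.1%} lost")
```

For 2P+6D, 109 of 1214 requests completed, so roughly 91% were lost, while the other three configurations completed every request.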