Anchor experiment for the clean-stack PD comparison using the canonical
cache-aware proxy with --policy lmetric (scripts/bench.sh harness). Two
traces x four arms = eight runs on dash1.

Headline: with the right routing baseline (LMetric), PD-colo holds 100%
completion on both traces while every static PD-disagg ratio fails
(14-65% completion), and the failure mode rotates with the split --
no static partition has a working operating point on this workload.
LMetric improves colo dramatically (TTFT p50 1.0s vs original §3 RR
7.0s; 7x) but does NOT rescue PD-disagg, confirming the bottleneck is
structural (D-pool admission + multi-turn KV accumulation), not routing.

Completion matrix:

                 first600s   full
    colo         100%        100%
    pd6 (6:2)    58.7%       65.3%    (decode-bound)
    pd4 (4:4)    43.1%       43.9%    (both bottlenecks)
    pd2 (2:6)    22.3%       13.9%    (prefill-bound)
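
The percentages in the matrix can be rechecked from the summary JSON. The
sketch below assumes "n" is the number of requests that completed and "req"
is the number issued (a reading inferred from the colo rows, where both are
equal). The function name completion_matrix is illustrative and not part of
the repo; the n/req pairs are copied from the summary data.

```python
from collections import defaultdict


def completion_matrix(records):
    """Return {arm: {trace: percent completed}} from summary records.

    Assumes each record has "arm", "trace", "n" (completed) and
    "req" (issued); that interpretation of n/req is an assumption.
    """
    matrix = defaultdict(dict)
    for r in records:
        matrix[r["arm"]][r["trace"]] = round(100.0 * r["n"] / r["req"], 1)
    return dict(matrix)


# n/req pairs copied from the eight runs in the summary JSON.
records = [
    {"arm": "colo", "trace": "first600s", "n": 807, "req": 807},
    {"arm": "colo", "trace": "full", "n": 1214, "req": 1214},
    {"arm": "6P+2D", "trace": "first600s", "n": 474, "req": 807},
    {"arm": "6P+2D", "trace": "full", "n": 793, "req": 1214},
    {"arm": "4P+4D", "trace": "first600s", "n": 348, "req": 807},
    {"arm": "4P+4D", "trace": "full", "n": 533, "req": 1214},
    {"arm": "2P+6D", "trace": "first600s", "n": 180, "req": 807},
    {"arm": "2P+6D", "trace": "full", "n": 169, "req": 1214},
]

matrix = completion_matrix(records)
for arm, row in matrix.items():
    print(f"{arm:6s} {row['first600s']:6.1f}% {row['full']:6.1f}%")
```
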
The original §3 RR "100% PD completion" appears to be a measurement
artifact of e13391e: producer-KV eviction acted as a relief valve,
letting more requests squeeze under the 600s timeout at the (uncosted)
price of cross-turn re-prefill. With the eviction off, PD-disagg is
worse than §3 advertised, not better.

Artifacts:

    analysis/v2/fig4l_lmetric.json                -- 8-run summary data
    analysis/v2/PD_DISAGG_LMETRIC.md              -- writeup + reproduce recipe
    figs/v2/fig4_lmetric_pd_vs_colo.png           -- 4-panel comparison figure
    microbench/fresh_setup/plot_fig4l_lmetric.py  -- plot script
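
For a quick look at the summary data without running the plot script,
something like the following would tabulate TTFT percentiles per run. It
assumes the file is a JSON list of records carrying "tag" and "ttft" fields
with values in seconds; the helper names ttft_rows and load_summary are
hypothetical and do not come from the repo.

```python
import json
from pathlib import Path


def load_summary(path):
    """Parse the summary file, assumed to be a JSON list of run records."""
    return json.loads(Path(path).read_text())


def ttft_rows(records):
    """Return (tag, ttft p50, ttft p99) tuples sorted by tag."""
    return sorted(
        (r["tag"], r["ttft"]["p50"], r["ttft"]["p99"]) for r in records
    )


if __name__ == "__main__":
    summary_path = Path("analysis/v2/fig4l_lmetric.json")
    # Only read the file when run from a checkout that contains it.
    if summary_path.exists():
        for tag, p50, p99 in ttft_rows(load_summary(summary_path)):
            print(f"{tag:32s} ttft p50={p50:8.2f}s p99={p99:8.2f}s")
```
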
[{"tag": "fig4l_lmetric_colo_first600s", "arm": "colo", "trace": "first600s", "n": 807, "req": 807, "e2e": {"count": 807.0, "mean": 11.066699584425269, "p50": 3.27055042097345, "p90": 28.745733462180937, "p99": 97.40008939541167}, "ttft": {"count": 807.0, "mean": 5.119651803458883, "p50": 1.2114678020589054, "p90": 14.777630288852365, "p99": 50.68302261995841}, "tpot": {"count": 807.0, "mean": 0.03004899278845205, "p50": 0.009643197803618922, "p90": 0.042092699501536976, "p99": 0.3919741264067197}, "wall": 1020.5351374909515, "tps": 226.12940164644368}, {"tag": "fig4l_lmetric_colo_full", "arm": "colo", "trace": "full", "n": 1214, "req": 1214, "e2e": {"count": 1214.0, "mean": 10.928977524270508, "p50": 3.1279119075043127, "p90": 30.011970606888667, "p99": 94.77313101590481}, "ttft": {"count": 1214.0, "mean": 5.533819193267678, "p50": 1.017395684029907, "p90": 17.36427243486981, "p99": 51.49416554694993}, "tpot": {"count": 1214.0, "mean": 0.02049970290344434, "p50": 0.009544484575988867, "p90": 0.032480608771520716, "p99": 0.26057810739537074}, "wall": 2993.276069591986, "tps": 125.38402448497122}, {"tag": "fig4l_lmetric_pd2_first600s", "arm": "2P+6D", "trace": "first600s", "n": 180, "req": 807, "e2e": {"count": 180.0, "mean": 380.2505690135715, "p50": 535.6594606440049, "p90": 579.5011055286858, "p99": 601.5567972306756}, "ttft": {"count": 180.0, "mean": 378.7133691522933, "p50": 534.4269686369807, "p90": 577.3534130641376, "p99": 596.404559875431}, "tpot": {"count": 180.0, "mean": 0.007975266077679418, "p50": 0.007166497974743372, "p90": 0.012511071875514153, "p99": 0.017508981961061446}, "wall": 19275.367093455978, "tps": 1.8895100582735462}, {"tag": "fig4l_lmetric_pd2_full", "arm": "2P+6D", "trace": "full", "n": 169, "req": 1214, "e2e": {"count": 169.0, "mean": 194.88523891245458, "p50": 6.817620265996084, "p90": 552.1569225640735, "p99": 595.3934216396092}, "ttft": {"count": 169.0, "mean": 193.4153314989016, "p50": 5.60239192598965, "p90": 549.3611521873856, 
"p99": 582.4436428000824}, "tpot": {"count": 169.0, "mean": 0.007747395842651413, "p50": 0.007691574401794991, "p90": 0.011201243427351017, "p99": 0.013311375577245894}, "wall": 33770.57413210906, "tps": 0.9869539045920406}, {"tag": "fig4l_lmetric_pd4_first600s", "arm": "4P+4D", "trace": "first600s", "n": 348, "req": 807, "e2e": {"count": 348.0, "mean": 202.63302869595395, "p50": 214.03008900902933, "p90": 477.40967412578175, "p99": 576.6393926549597}, "ttft": {"count": 348.0, "mean": 199.96385804087797, "p50": 213.50966987549327, "p90": 475.7766476540827, "p99": 559.6153268160638}, "tpot": {"count": 348.0, "mean": 0.008873619369764751, "p50": 0.007645836479973812, "p90": 0.013845969236959285, "p99": 0.02567216653158788}, "wall": 6850.181333696004, "tps": 15.00296050477674}, {"tag": "fig4l_lmetric_pd4_full", "arm": "4P+4D", "trace": "full", "n": 533, "req": 1214, "e2e": {"count": 533.0, "mean": 130.94711188977982, "p50": 8.219856544979848, "p90": 473.44134307731883, "p99": 533.2597587251009}, "ttft": {"count": 533.0, "mean": 127.83193208824007, "p50": 4.8246813879814, "p90": 467.54664219671395, "p99": 528.8304683346115}, "tpot": {"count": 533.0, "mean": 0.008886429490232585, "p50": 0.007981476340708988, "p90": 0.013570741891233497, "p99": 0.023050950961825044}, "wall": 13884.384965199977, "tps": 12.621372890425038}, {"tag": "fig4l_lmetric_pd6_first600s", "arm": "6P+2D", "trace": "first600s", "n": 474, "req": 807, "e2e": {"count": 474.0, "mean": 83.15809065495806, "p50": 6.7270191764691845, "p90": 391.6558471220078, "p99": 544.7372293809171}, "ttft": {"count": 474.0, "mean": 80.70155321074382, "p50": 4.1273433425230905, "p90": 390.00296151017517, "p99": 539.0574236416071}, "tpot": {"count": 474.0, "mean": 0.008519881756330928, "p50": 0.00803907146806204, "p90": 0.012583933303093976, "p99": 0.018606097790947705}, "wall": 3325.2749515309697, "tps": 39.705588838364164}, {"tag": "fig4l_lmetric_pd6_full", "arm": "6P+2D", "trace": "full", "n": 793, "req": 1214, "e2e": 
{"count": 793.0, "mean": 61.907526705667, "p50": 3.69814173609484, "p90": 308.2633092067672, "p99": 477.48038318102715}, "ttft": {"count": 793.0, "mean": 59.25069201986225, "p50": 1.402295546955429, "p90": 302.5604081378088, "p99": 475.3738951798529}, "tpot": {"count": 793.0, "mean": 0.009137289999448822, "p50": 0.008635683270933276, "p90": 0.013065757584108427, "p99": 0.01816783740464599}, "wall": 5662.029295974993, "tps": 39.24494000021532}]