Follow-up to the LMetric sweep: rerun with `--policy linear` (cache-aware load + sticky session affinity, the cache_aware_proxy default), capping each PD-disagg arm at 2x the colo bench wall. Once the cap is exceeded, bench.sh gets SIGTERM and the cleanup trap clears vLLM and the proxy. Capped runs therefore lack `metrics.summary.json`, so the analysis script computes their numbers from the raw `metrics.jsonl`.

Headline: the success-rate ceiling is policy-invariant.

| arm (P:D) | linear (capped at 2x) | lmetric (uncapped) |
|---|---|---|
| colo | 807/807 = 100%, 964s | 807/807 = 100%, 1021s |
| pd6 (6:2) | 472/807 = 58%, 2280s ⊗ | 474/807 = 59%, 3325s |
| pd4 (4:4) | 349/807 = 43%, 2281s ⊗ | 348/807 = 43%, 6850s |
| pd2 (2:6) | 176/807 = 22%, 2280s ⊗ | 180/807 = 22%, 19275s |

⊗ = run stopped by the wall cap.

Routing affects only how much wall is wasted timing out unreachable requests at 600s each. Linear hits the same ceiling in ~2280s that LMetric takes 3325-19275s to reach. This *strengthens* the §5 D-pool capacity-ceiling thesis: the ceiling is structural, not a routing artifact.

Artifacts:
- `analysis/v2/fig4r_linear.json` -- 4-arm linear summary
- `analysis/v2/PD_DISAGG_LMETRIC.md` -- extended with wall-cap section
- `figs/v2/fig4_linear_vs_lmetric.png` -- 3-panel side-by-side comparison
- `microbench/fresh_setup/plot_fig4_linear_vs_lmetric.py`
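The wall-cap step (SIGTERM bench.sh once a PD-disagg arm runs past 2x the colo wall) can be sketched as a small Python watchdog. This is a hypothetical illustration, not the harness actually used: `run_with_wall_cap`, its `factor`/`grace_s` parameters, and the idea of launching bench.sh from Python are all assumptions.

```python
import signal
import subprocess
from typing import Sequence


def run_with_wall_cap(cmd: Sequence[str], colo_wall_s: float,
                      factor: float = 2.0, grace_s: float = 30.0) -> bool:
    """Run cmd; SIGTERM it once it exceeds factor * colo_wall_s.

    Hypothetical helper. Returns True if the run was capped,
    False if it finished on its own.
    """
    cap_s = factor * colo_wall_s
    proc = subprocess.Popen(list(cmd))
    try:
        proc.wait(timeout=cap_s)
        return False
    except subprocess.TimeoutExpired:
        # SIGTERM, not SIGKILL: SIGKILL cannot be trapped, so a cleanup
        # trap in the benchmark script would never get to run.
        proc.send_signal(signal.SIGTERM)
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # Last resort if cleanup hangs past the grace period.
            proc.kill()
            proc.wait()
        return True
```

For example, `run_with_wall_cap(["bash", "bench.sh"], colo_wall_s=<measured colo wall>)` would signal only the top-level bash process, relying on the script's own trap to tear down vLLM and the proxy. Note that bash defers a trap while it is waiting on a foreground command, so the grace period plus `kill()` fallback keeps a stuck cleanup from hanging the sweep.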
[
{"tag": "fig4r_linear_colo_first600s", "arm": "colo", "trace": "first600s", "policy": "linear", "n": 807, "req": 807, "dispatched": 807, "e2e": {"count": 807.0, "mean": 8.436370009274967, "p50": 2.5224755640374497, "p90": 22.65510415879542, "p99": 75.54369598095519}, "ttft": {"count": 807.0, "mean": 4.2332503390957195, "p50": 0.8872958200518042, "p90": 11.684667797433207, "p99": 44.98891795879462}, "tpot": {"count": 807.0, "mean": 0.020958194728517718, "p50": 0.00851320761584622, "p90": 0.026440129078245465, "p99": 0.30344440533287176}, "wall": 963.6191155100241, "tps": 239.4857016486815, "capped": false},
{"tag": "fig4r_linear_pd2_first600s", "arm": "2P+6D", "trace": "first600s", "policy": "linear", "n": 176, "req": 807, "dispatched": 521, "e2e": {"count": 176, "mean": 378.5561210460834, "p50": 536.7719694490079, "p90": 583.832092280034, "p99": 601.3415494390065}, "ttft": {"count": 176, "mean": 377.12570991374446, "p50": 536.1157373189926, "p90": 580.3465002350276, "p99": 598.0943597999867}, "tpot": {"count": 176, "mean": 0.007864906140929698, "p50": 0.007212154543958604, "p90": 0.011962352272927423, "p99": 0.017870794738764347}, "wall": 2280, "tps": 14.419736842105262, "capped": true},
{"tag": "fig4r_linear_pd4_first600s", "arm": "4P+4D", "trace": "first600s", "policy": "linear", "n": 349, "req": 807, "dispatched": 577, "e2e": {"count": 349, "mean": 264.8537863784421, "p50": 306.6853819829412, "p90": 488.64622142596636, "p99": 596.5830293919425}, "ttft": {"count": 349, "mean": 262.3163347712099, "p50": 299.75751709297765, "p90": 485.475125996978, "p99": 596.4081599479541}, "tpot": {"count": 349, "mean": 0.010442244895290958, "p50": 0.008213572105774598, "p90": 0.019443845545703716, "p99": 0.028178529054794}, "wall": 2281, "tps": 38.306882946076286, "capped": true},
{"tag": "fig4r_linear_pd6_first600s", "arm": "6P+2D", "trace": "first600s", "policy": "linear", "n": 472, "req": 807, "dispatched": 706, "e2e": {"count": 472, "mean": 118.632779156234, "p50": 12.702161715948023, "p90": 458.1609142010566, "p99": 526.5488834320568}, "ttft": {"count": 472, "mean": 115.80202843308507, "p50": 9.745031949947588, "p90": 455.81679951993283, "p99": 516.5850186559837}, "tpot": {"count": 472, "mean": 0.00950947083585719, "p50": 0.008435572332624966, "p90": 0.015233499645638644, "p99": 0.023447183093280886}, "wall": 2280, "tps": 61.69210526315789, "capped": true}
]
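The capped rows in the JSON above come from raw `metrics.jsonl` rather than `metrics.summary.json`. A minimal sketch of that recomputation follows, assuming (hypothetically) that each `metrics.jsonl` line is one completed request with `e2e`, `ttft`, `tpot` in seconds plus an `output_tokens` count; the real schema, field names, and percentile convention of the analysis script may differ. In every row, `tps` × `wall` comes out to a whole number (e.g. 14.419736842105262 × 2280 = 32877 for 2P+6D), so `tps` appears to be a token total divided by wall; the sketch assumes that total is output tokens.

```python
import json
import math
from typing import Dict, Iterable, List


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile (numpy's default method); NaN if empty."""
    if not sorted_vals:
        return math.nan
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def describe(vals: Iterable[float]) -> Dict[str, float]:
    """count/mean/p50/p90/p99, mirroring the per-metric blocks in the JSON."""
    xs = sorted(vals)
    n = len(xs)
    return {
        "count": n,
        "mean": sum(xs) / n if n else math.nan,
        "p50": percentile(xs, 50),
        "p90": percentile(xs, 90),
        "p99": percentile(xs, 99),
    }


def summarize_jsonl(path: str, wall_s: float, req: int, capped: bool) -> dict:
    """Rebuild a summary record from raw per-request lines (hypothetical schema)."""
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                # A SIGTERM'd run can leave a half-written last line; skip it.
                continue
    return {
        "n": len(rows),
        "req": req,
        "e2e": describe(r["e2e"] for r in rows),
        "ttft": describe(r["ttft"] for r in rows),
        "tpot": describe(r["tpot"] for r in rows),
        "wall": wall_s,
        # Assumed definition: total output tokens over wall (see lead-in).
        "tps": sum(r["output_tokens"] for r in rows) / wall_s,
        "capped": capped,
    }
```

The success rate in the table is then `n / req`, e.g. 176/807 ≈ 22% for 2P+6D. Skipping an unparseable line is a defensive choice for killed runs; it silently drops that request, which is acceptable only if the last line is the sole casualty.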