diff --git a/analysis/mb5_pd_ablation/README.md b/analysis/mb5_pd_ablation/README.md
new file mode 100644
index 0000000..96abab9
--- /dev/null
+++ b/analysis/mb5_pd_ablation/README.md
@@ -0,0 +1,107 @@
+# PD-disagg vs colocation — controlled reuse & concurrency axes (v2)
+
+Self-contained results for the **controlled-variable** redo of the MB5 PD-vs-colo
+ablation. Supersedes the confounded first cut (held input fixed and sliced the
+prefix, so "more reuse" was entangled with "less prefill"). All arms route through
+the proxy at fair **APC parity** (session-routed producers reach the same prefix-cache
+hit rate as colo), so PD loses on *structure*, not on broken cache.
+
+- **Config arms:** `colo` = 8×kv_both (8C-proxy, session-affinity); PD = `6P+2D / 4P+4D / 2P+6D`.
+- **Driver:** closed-loop N (`REPLAY_MAX_INFLIGHT`) + fixed think-time; `gen_synthetic_trace.py --mode regular`.
+- **PD-arm wall-cap:** collapsed PD arms drain pathologically slowly, so PD arms run with a
+  wall-deadline (`REPLAY_MAX_DURATION`; un-run turns counted as failures → honest completion%);
+  **colo is uncapped** so the reference is always fully measured.
+- **Hardware:** run on **dash2** (8×H20). dash0's RDMA NICs were faulted for Mooncake during
+  this work (could not init the transfer engine; needs an admin reset — no sudo); dash2's NICs
+  are healthy. cpfs/venv/data are shared across the boxes.
+
+---
+
+## 1. Reuse / APC axis — fixed real prefill, vary cached prefix
+
+N=8. Hold the **real new-prefill work per turn constant** (`--delta-len`) and grow the cached
+prefix → reuse = prefix/(prefix+delta). Three shapes isolate output vs delta:
+
+| | delta (real prefill/turn) | output | role |
+|---|---|---|---|
+| **A** | 2048 | 256 | original |
+| **C** | 2048 | 128 | A vs C = pure **output** 256→128 |
+| **B** | 1024 | 128 | C vs B = pure **delta** 2048→1024 |
+
+**PD-best advantage** = colo E2E-p90 / best-PD E2E-p90 (>1 ⇒ PD wins):
+
+| reuse% | A d2048/o256 | C d2048/o128 | B d1024/o128 |
+|---|---|---|---|
+| 20 | 1.34 | 1.41 | — |
+| 50 | 1.36 | 1.37 | — |
+| 67 | **1.47** | **1.49** | **1.27** |
+| 80 | 1.31 | 1.23 | 1.25 |
+| 90 | 1.15 | 1.01 | — |
+| 95 | 0.87 | 0.89 | 0.89 |
+
+![reuse 3-way](../../figs/mb5_pd_ablation/reuse_compare_ABC.png)
+
+**Findings:**
+1. **Output length is ~negligible.** A and C (same delta) track each other across the whole
+   range — halving output barely moves PD's advantage.
+2. **Delta (real prefill/turn) is the dominant shape factor.** B (delta=1024) sits clearly
+   below A/C at mid reuse (67%: 1.27 vs ~1.48). More real prefill per turn → bigger PD win,
+   because PD-disagg's benefit is isolating prefill from decode — more prefill to isolate.
+3. **Crossover to colo at reuse ~90–95% is robust** across all three shapes: PD always loses
+   the high-reuse / large-resident-context corner (it must KV-transfer the whole resident
+   context every turn for a few hundred new tokens; colo keeps it local).
+
+*Caveat:* the clean, uncapped, 100%-completion comparison region is reuse **20–80%** (carries
+findings 1–2). At reuse 90/95% the PD arms collapse and C's points are capped-completion, while
+A/B are full-drain — comparable in direction, not in exact PD completion%.
+
+Data: `fig1_reuse_fixed.json` (A), `fig1_reuse_d2048_o128.json` (C), `fig1_reuse_d1024_o128.json` (B).
+
+---
+
+## 2. Concurrency axis — agentic corner, sweep N
+
+in=32768 (prefix 32256 + delta 512, **reuse 0.984**), out=128; closed-loop N ∈ {8,16,32,48,64,96,128};
+PD arms capped 600s, colo uncapped.
+
+| N | **colo** completion · E2E-mean · TPS | best PD-arm completion |
+|---|---|---|
+| 8 | **256/256** · 2.4s · 326 | 6P+2D 256/256 |
+| 16 | **512/512** · 3.5s · 462 | 6P+2D 439/512 (86%) |
+| 32 | **1024/1024** · 13.3s · 190 | all PD **<27%** |
+| 48 | **1536/1536** · 24.9s · 168 | all PD <32% |
+| 64 | **2048/2048** · 38.4s · 166 | all PD <32% |
+| 96 | **3072/3072** · 60.0s · 171 | PD **2–7%** |
+| 128 | **4096/4096** · 80.8s · 181 | 4P+4D 6%, 2P+6D <1% |
+
+![concurrency](../../figs/mb5_pd_ablation/fig3_concurrency_axis.png)
+
+**Finding:** **colo completes 100% of requests at every concurrency level** — it degrades
+*gracefully* (latency rises 2.4s→81s, nothing dropped). **Every static PD split collapses, and
+progressively earlier as N rises**: PD is viable only at N≤8–16; by N≥32 it drops 68–99% of
+requests while its prefix-cache hit rate craters (to ~0% in the worst arms). colo's elastic pool absorbs the
+time-varying P/D demand; the static partition + per-turn 32k KV-transfer cannot. (Latency
+percentiles count successes only, so they *understate* PD latency; read them with the completion column.)
+
+Data: `fig3_conc32k.json`.
+
+*Caveat:* N=128 6P+2D is missing (one transient vLLM/Mooncake startup flake at the end); this does
+not change the picture (all PD arms are already collapsed by N=128). The SLO auto-stop in the
+driver is a no-op (a stdout-capture bug), so the full grid ran — more points, not fewer.
+
+---
+
+## 3. Reproduce
+
+```bash
+# on a box with healthy Mooncake/RDMA NICs (dash2), cpfs mounted:
+R=/home/admin/cpfs/wjh/agentic-kv-fresh
+# reuse axis (three shapes): DELTA/OL pick the shape; tag carries _o${OL}
+ssh dash2 "cd $R && DELTA=2048 OL=256 bash microbench/fresh_setup/run_reuse_fixed.sh"
+ssh dash2 "cd $R && DELTA=2048 OL=128 bash microbench/fresh_setup/run_reuse_fixed.sh"
+ssh dash2 "cd $R && DELTA=1024 OL=128 bash microbench/fresh_setup/run_reuse_fixed.sh"
+# concurrency axis (capped):
+ssh dash2 "cd $R && NLIST='8 16 32 48 64 96 128' CONC_PD_MAXDUR=600 bash microbench/fresh_setup/run_conc.sh"
+# render (reads the *.json in this dir):
+python microbench/fresh_setup/plot_pd_crossover.py
+```
diff --git a/analysis/mb5_pd_ablation/fig3_conc32k.json b/analysis/mb5_pd_ablation/fig3_conc32k.json
new file mode 100644
index 0000000..1de3ca3
--- /dev/null
+++ b/analysis/mb5_pd_ablation/fig3_conc32k.json
@@ -0,0 +1 @@
+[{"name": "conc32k_N128_2P+6D_rep1", "arm": "2P+6D", "n": 32, "req": 4096, "e2e_p50": 514.259894067538, "e2e_p90": 568.6411112621427, "e2e_p99": 580.8886191064375, "e2e_mean": 452.58676574456695, "ttft_p90": 567.7902540716576, "tpot_p99": 0.009655409648814643, "tps": 6.825781772920619, "wall": 600.0777839469956, "pu": 19.680851063829788, "du": 1.00354609929078, "apc": 0.0}, {"name": "conc32k_N128_4P+4D_rep1", "arm": "4P+4D", "n": 235, "req": 4096, "e2e_p50": 135.7035841710167, "e2e_p90": 471.346485194657, "e2e_p99": 532.970156197506, "e2e_mean": 240.51798818460318, "ttft_p90": 470.4102183676557, "tpot_p99": 0.009764630051172066, "tps": 50.1298101847112, "wall": 600.042168305954, "pu": 46.15658362989324, "du": 7.930604982206406, "apc": 0.01922607421875}, {"name": "conc32k_N128_8C-proxy_rep1", "arm": "colo", "n": 4096, "req": 4096, "e2e_p50": 93.11914577853167, "e2e_p90": 140.26620708603878, "e2e_p99": 159.15196072253164, "e2e_mean": 80.84483524090638, "ttft_p90": 104.72363437898457, "tpot_p99": 0.2904040704634386, "tps": 180.8413276471139, "wall":
2899.160312641994, "pu": 71.35606060606061, "du": null, "apc": 0.22432943976802242}, {"name": "conc32k_N16_2P+6D_rep1", "arm": "2P+6D", "n": 270, "req": 512, "e2e_p50": 16.88699725206243, "e2e_p90": 71.16618712131167, "e2e_p99": 72.94499705124298, "e2e_mean": 33.01289418954877, "ttft_p90": 70.31587210793514, "tpot_p99": 0.009110894146270858, "tps": 57.59736818329287, "wall": 600.027416010038, "pu": 59.895390070921984, "du": 6.963356973995272, "apc": 0.4347946113074205}, {"name": "conc32k_N16_4P+4D_rep1", "arm": "4P+4D", "n": 342, "req": 512, "e2e_p50": 1.6035146280191839, "e2e_p90": 23.060664943722102, "e2e_p99": 456.527012144829, "e2e_mean": 25.18196889393376, "ttft_p90": 22.03004272416003, "tpot_p99": 0.01195068457993444, "tps": 72.95691299529491, "wall": 600.0253876260249, "pu": 13.563167259786477, "du": 12.34608540925267, "apc": 0.7985404125354107}, {"name": "conc32k_N16_6P+2D_rep1", "arm": "6P+2D", "n": 439, "req": 512, "e2e_p50": 3.5824058459838852, "e2e_p90": 9.515727914776651, "e2e_p99": 454.5892039508978, "e2e_mean": 14.37662458394024, "ttft_p90": 8.290924134664236, "tpot_p99": 0.025736230430541573, "tps": 93.64927236343019, "wall": 600.0260181620251, "pu": 10.203309692671395, "du": 27.79255319148936, "apc": 0.5737022957067341}, {"name": "conc32k_N16_8C-proxy_rep1", "arm": "colo", "n": 512, "req": 512, "e2e_p50": 1.9904886874137446, "e2e_p90": 7.553310088452418, "e2e_p99": 20.79787442037952, "e2e_mean": 3.548424774579871, "ttft_p90": 4.572699708072468, "tpot_p99": 0.08979923607100454, "tps": 461.8961028774644, "wall": 141.88472167600412, "pu": 59.110074626865675, "du": null, "apc": 0.861328125}, {"name": "conc32k_N32_2P+6D_rep1", "arm": "2P+6D", "n": 266, "req": 1024, "e2e_p50": 67.7868700629333, "e2e_p90": 75.71576275100233, "e2e_p99": 89.38174203918904, "e2e_mean": 67.24167772074128, "ttft_p90": 74.86772483045934, "tpot_p99": 0.006747005557216058, "tps": 56.74409952681553, "wall": 600.0271443889942, "pu": 99.64539007092199, "du": 6.6040189125295505, 
"apc": 0.0}, {"name": "conc32k_N32_4P+4D_rep1", "arm": "4P+4D", "n": 158, "req": 1024, "e2e_p50": 42.75656270503532, "e2e_p90": 56.365707339381345, "e2e_p99": 113.03851705471868, "e2e_mean": 36.28278346921397, "ttft_p90": 55.51712800206152, "tpot_p99": 0.009327373837824823, "tps": 33.70412646428578, "wall": 600.0452206180198, "pu": 22.92232142857143, "du": 5.202678571428572, "apc": 0.47630828067951025}, {"name": "conc32k_N32_6P+2D_rep1", "arm": "6P+2D", "n": 207, "req": 1024, "e2e_p50": 16.343308847979642, "e2e_p90": 391.5856816125801, "e2e_p99": 461.5894148506666, "e2e_mean": 63.58167548530065, "ttft_p90": 390.7401910103858, "tpot_p99": 0.022317821956789635, "tps": 44.15799184960015, "wall": 600.027285892982, "pu": 9.313167259786477, "du": 12.209964412811388, "apc": 0.5886122604675192}, {"name": "conc32k_N32_8C-proxy_rep1", "arm": "colo", "n": 1024, "req": 1024, "e2e_p50": 3.112916806479916, "e2e_p90": 56.95651195792017, "e2e_p99": 100.27527027199393, "e2e_mean": 13.337265807642666, "ttft_p90": 22.546092133712957, "tpot_p99": 0.2838215297627387, "tps": 189.99521306562647, "wall": 689.870012434083, "pu": 32.91743827160494, "du": null, "apc": 0.7271785835597826}, {"name": "conc32k_N48_2P+6D_rep1", "arm": "2P+6D", "n": 265, "req": 1536, "e2e_p50": 102.76888889505062, "e2e_p90": 113.42054960383102, "e2e_p99": 114.77465962313582, "e2e_mean": 98.40494008093, "ttft_p90": 112.57273413538933, "tpot_p99": 0.009059556105731453, "tps": 56.530743141043914, "wall": 600.0274915079353, "pu": 99.29078014184397, "du": 6.3132387706855795, "apc": 0.0}, {"name": "conc32k_N48_4P+4D_rep1", "arm": "4P+4D", "n": 487, "req": 1536, "e2e_p50": 61.71183440799359, "e2e_p90": 75.25955151061062, "e2e_p99": 96.9825204003416, "e2e_mean": 52.89479321273985, "ttft_p90": 74.39665991906077, "tpot_p99": 0.010239832782109569, "tps": 103.88828114198009, "wall": 600.0291786020389, "pu": 80.13434163701068, "du": 17.176156583629894, "apc": 0.1690824917116896}, {"name": "conc32k_N48_6P+2D_rep1", "arm": 
"6P+2D", "n": 407, "req": 1536, "e2e_p50": 43.92707859107759, "e2e_p90": 59.944055985216984, "e2e_p99": 112.62553774632742, "e2e_mean": 42.093486530729166, "ttft_p90": 59.08382808978203, "tpot_p99": 0.01777347762958975, "tps": 86.82154292440741, "wall": 600.035408785101, "pu": 32.310201660735466, "du": 28.923487544483987, "apc": 0.002410323472803945}, {"name": "conc32k_N48_8C-proxy_rep1", "arm": "colo", "n": 1536, "req": 1536, "e2e_p50": 4.488386315992102, "e2e_p90": 103.49955359648447, "e2e_p99": 149.24535391716637, "e2e_mean": 24.882035963421306, "ttft_p90": 68.04615733394166, "tpot_p99": 0.2868353361025543, "tps": 167.6337216646859, "wall": 1172.842779171071, "pu": 33.92440801457195, "du": null, "apc": 0.6802594597440629}, {"name": "conc32k_N64_2P+6D_rep1", "arm": "2P+6D", "n": 265, "req": 2048, "e2e_p50": 141.83604138100054, "e2e_p90": 145.0251924246084, "e2e_p99": 146.31120821716263, "e2e_mean": 127.1679677689593, "ttft_p90": 144.17453655162825, "tpot_p99": 0.007076157499618067, "tps": 56.53043222301936, "wall": 600.0307916660095, "pu": 99.28825622775801, "du": 6.48220640569395, "apc": 0.0}, {"name": "conc32k_N64_4P+4D_rep1", "arm": "4P+4D", "n": 503, "req": 2048, "e2e_p50": 71.65615673502907, "e2e_p90": 88.91418252603617, "e2e_p99": 111.395891777589, "e2e_mean": 69.88181402112305, "ttft_p90": 88.00009718760847, "tpot_p99": 0.009134947498783644, "tps": 107.30060072388244, "wall": 600.0339193410473, "pu": 93.92857142857143, "du": 18.433035714285715, "apc": 0.017169331395348836}, {"name": "conc32k_N64_6P+2D_rep1", "arm": "6P+2D", "n": 640, "req": 2048, "e2e_p50": 50.54000817600172, "e2e_p90": 80.58664564599749, "e2e_p99": 119.38929161281675, "e2e_mean": 54.368219223681805, "ttft_p90": 79.69296148774447, "tpot_p99": 0.011780964897039871, "tps": 136.5256755990702, "wall": 600.0336540400749, "pu": 61.211743772241995, "du": 41.670818505338076, "apc": 0.5339694106787966}, {"name": "conc32k_N64_8C-proxy_rep1", "arm": "colo", "n": 2048, "req": 2048, "e2e_p50": 
7.880336284462828, "e2e_p90": 99.34212617237353, "e2e_p99": 154.2842405023507, "e2e_mean": 38.38007544943264, "ttft_p90": 64.50713637162698, "tpot_p99": 0.28651403933409675, "tps": 166.4444099684129, "wall": 1574.9642781619914, "pu": 47.90200407608695, "du": null, "apc": 0.5034716568788178}, {"name": "conc32k_N8_2P+6D_rep1", "arm": "2P+6D", "n": 166, "req": 256, "e2e_p50": 3.161887424532324, "e2e_p90": 8.796273799962364, "e2e_p99": 19.35086066399235, "e2e_mean": 4.329762689510489, "ttft_p90": 7.939646945509594, "tpot_p99": 0.00889104103374361, "tps": 35.4116019325134, "wall": 600.0293361620279, "pu": 13.129432624113475, "du": 4.155437352245863, "apc": 0.8383327503109452}, {"name": "conc32k_N8_4P+4D_rep1", "arm": "4P+4D", "n": 212, "req": 256, "e2e_p50": 1.3527575409971178, "e2e_p90": 8.36480576953618, "e2e_p99": 12.45318060620571, "e2e_mean": 3.1336761365561165, "ttft_p90": 7.346757155098023, "tpot_p99": 0.011452607204523447, "tps": 68.2172500099223, "wall": 397.787949471036, "pu": 10.775401069518717, "du": 12.131016042780749, "apc": 0.861328125}, {"name": "conc32k_N8_6P+2D_rep1", "arm": "6P+2D", "n": 256, "req": 256, "e2e_p50": 1.6609621004317887, "e2e_p90": 6.754639569961, "e2e_p99": 13.541372917423683, "e2e_mean": 3.0033252389107474, "ttft_p90": 5.267110070504714, "tpot_p99": 0.020384013706150894, "tps": 272.4142479365925, "wall": 120.28739409998525, "pu": 23.584795321637426, "du": 66.5701754385965, "apc": 0.861328125}, {"name": "conc32k_N8_8C-proxy_rep1", "arm": "colo", "n": 256, "req": 256, "e2e_p50": 1.5159486784832552, "e2e_p90": 5.598836069984827, "e2e_p99": 11.732619991357206, "e2e_mean": 2.38795049598275, "ttft_p90": 4.561962092004251, "tpot_p99": 0.05528094475262972, "tps": 326.4167482129997, "wall": 100.38700581202284, "pu": 45.268229166666664, "du": null, "apc": 0.861328125}, {"name": "conc32k_N96_2P+6D_rep1", "arm": "2P+6D", "n": 63, "req": 3072, "e2e_p50": 132.1833890659036, "e2e_p90": 207.02887548061554, "e2e_p99": 502.58203177167337, "e2e_mean": 
154.06696159096384, "ttft_p90": 206.1777482896112, "tpot_p99": 0.006724634236384799, "tps": 13.438881724737026, "wall": 600.0499271570006, "pu": 30.48049645390071, "du": 1.7287234042553192, "apc": 0.0}, {"name": "conc32k_N96_4P+4D_rep1", "arm": "4P+4D", "n": 144, "req": 3072, "e2e_p50": 107.3917985354783, "e2e_p90": 492.41011438380235, "e2e_p99": 524.7689861402963, "e2e_mean": 197.25430914041902, "ttft_p90": 491.56264113695363, "tpot_p99": 0.009930134840029114, "tps": 30.71556961557379, "wall": 600.0865434269654, "pu": 34.68594306049822, "du": 5.641459074733096, "apc": 0.03815818319515306}, {"name": "conc32k_N96_6P+2D_rep1", "arm": "6P+2D", "n": 204, "req": 3072, "e2e_p50": 75.94057278643595, "e2e_p90": 499.43273953085304, "e2e_p99": 547.6024436797773, "e2e_mean": 217.41871207330877, "ttft_p90": 498.3524705420947, "tpot_p99": 0.017267738996700595, "tps": 43.51722686849308, "wall": 600.0382349479478, "pu": 30.847568208778174, "du": 13.325622775800712, "apc": 0.0}, {"name": "conc32k_N96_8C-proxy_rep1", "arm": "colo", "n": 3072, "req": 3072, "e2e_p50": 68.16499787406065, "e2e_p90": 123.85599779536714, "e2e_p99": 153.7248946602491, "e2e_mean": 60.02239005504138, "ttft_p90": 88.56828404997941, "tpot_p99": 0.2902358833879533, "tps": 171.3173598136646, "wall": 2295.249007033999, "pu": 60.944327731092436, "du": null, "apc": 0.3287494643970084}]
diff --git a/figs/mb5_pd_ablation/fig3_concurrency_axis.png b/figs/mb5_pd_ablation/fig3_concurrency_axis.png
index 97dcb69..37c1da1 100644
Binary files a/figs/mb5_pd_ablation/fig3_concurrency_axis.png and b/figs/mb5_pd_ablation/fig3_concurrency_axis.png differ
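
Review note: the completion column of the concurrency table can be cross-checked against `fig3_conc32k.json` with a few lines of Python. The sketch below is not part of the commit. It assumes that `n` is the number of completed requests and `req` the number issued (inferred from the README's `439/512` entry for N=16 6P+2D), and that the concurrency level can be parsed from the `_N<k>_` token in `name`. The helper names (`concurrency`, `completion_table`, `best_pd`) are hypothetical, and `RECORDS` is a hand-copied subset of the file trimmed to the fields used.

```python
import re
from collections import defaultdict

# Subset of records copied from fig3_conc32k.json, trimmed to the fields used here.
# Assumption: "n" = completed requests, "req" = issued requests.
RECORDS = [
    {"name": "conc32k_N16_2P+6D_rep1", "arm": "2P+6D", "n": 270, "req": 512},
    {"name": "conc32k_N16_4P+4D_rep1", "arm": "4P+4D", "n": 342, "req": 512},
    {"name": "conc32k_N16_6P+2D_rep1", "arm": "6P+2D", "n": 439, "req": 512},
    {"name": "conc32k_N16_8C-proxy_rep1", "arm": "colo", "n": 512, "req": 512},
    {"name": "conc32k_N64_2P+6D_rep1", "arm": "2P+6D", "n": 265, "req": 2048},
    {"name": "conc32k_N64_4P+4D_rep1", "arm": "4P+4D", "n": 503, "req": 2048},
    {"name": "conc32k_N64_6P+2D_rep1", "arm": "6P+2D", "n": 640, "req": 2048},
    {"name": "conc32k_N64_8C-proxy_rep1", "arm": "colo", "n": 2048, "req": 2048},
]


def concurrency(name):
    """Parse the closed-loop N from a record name such as 'conc32k_N16_6P+2D_rep1'."""
    match = re.search(r"_N(\d+)_", name)
    if match is None:
        raise ValueError(f"no concurrency token in {name!r}")
    return int(match.group(1))


def completion_table(records):
    """Return {N: {arm: completion percent}} computed as 100 * n / req."""
    table = defaultdict(dict)
    for rec in records:
        table[concurrency(rec["name"])][rec["arm"]] = 100.0 * rec["n"] / rec["req"]
    return dict(table)


def best_pd(arms):
    """Return (arm, completion percent) for the best non-colo arm."""
    pd_arms = {arm: pct for arm, pct in arms.items() if arm != "colo"}
    return max(pd_arms.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for n, arms in sorted(completion_table(RECORDS).items()):
        arm, pct = best_pd(arms)
        print(f"N={n}: colo {arms['colo']:.1f}%, best PD {arm} {pct:.1f}%")
```

To run this over the full grid, `RECORDS` could be replaced with the list returned by `json.load` on the JSON file. Under the stated assumptions, the best PD arm is 6P+2D at both N=16 (439/512) and N=64 (640/2048).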