Three-axis controlled ablation of PD-colo vs PD-disagg on synthetic regular
traces (closed-loop, controlled reuse via REPLAY_NO_REALIZED_PREFIX) on the
clean stack (e13391e gated off).
Axis 1 (Fig 1) -- reuse 6%->94% at N=8, in8192/out256
Axis 2 (Fig 2) -- shape in2048/out2048 -> in32768/out64 at N=8, reuse~70%
Axis 3 (Fig 3) -- concurrency N=8/16/32/64 at reuse~71%, in8192/out256
Findings:
* APC parity: colo = PD at every reuse level (5.5/22/44/66/77/82%) --
  contamination fix validated.
* PD edge erodes 1.57x->1.10x as reuse rises; prefill GPUs strand 26%->9%.
* Shape: PD-best peaks mid-sweep (1.34x at in8192/out512); a wrong PD ratio
  is catastrophic at the prefill-heavy extreme (in32768/out64 pd2 = 378/400,
  p99 432s).
* Concurrency: PD wins at N<=32 (1.23-1.29x) but tips at N=64 -- pd2/pd4
  crater (APC 71%->46%/1.4%) and the best PD arm (pd6) trails colo by 30%
  in TPS, while colo scales cleanly.
Infrastructure:
* replayer: --max-inflight-sessions, --inter-turn-think, --no-realized-prefix
  (env-defaulted via REPLAY_MAX_INFLIGHT, REPLAY_INTER_TURN_THINK_S,
  REPLAY_NO_REALIZED_PREFIX).
* mb5_run.sh: writes bench_config.json + gpu_util.csv + run_window.json +
  instance_apc.txt + metrics.jsonl for bench_report/fig_agg ingest.
* fig_agg.py: per-arm GPU role split + producer-side APC; --json mode.
* gpu_util_report.py: companion per-GPU util report from gpu_util.csv.
* partial_summary.py: stats from in-flight replay_metrics.jsonl
  (works before metrics.summary.json exists).
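
For illustration, env-defaulted flags of this kind could be wired roughly as follows. This is a minimal sketch, not the replayer's actual code: the flag and REPLAY_* names come from the list above, while `build_parser`, `_truthy`, the argument types, the fallback values, and the truthy-string convention are all assumptions.

```python
import argparse
import os


def _truthy(value: str) -> bool:
    # Assumed convention: 1/true/yes/on (case-insensitive) enable the flag.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def build_parser(env=os.environ) -> argparse.ArgumentParser:
    """Hypothetical parser: each flag defaults to its REPLAY_* variable."""
    parser = argparse.ArgumentParser(prog="replayer")
    parser.add_argument(
        "--max-inflight-sessions",
        type=int,
        # Fallback of 8 is an assumption, not a documented default.
        default=int(env.get("REPLAY_MAX_INFLIGHT", "8")),
        help="cap on concurrently active sessions",
    )
    parser.add_argument(
        "--inter-turn-think",
        type=float,
        # Unit (seconds) is inferred from the _S suffix of the variable name.
        default=float(env.get("REPLAY_INTER_TURN_THINK_S", "0")),
        help="think time between turns, in seconds",
    )
    parser.add_argument(
        "--no-realized-prefix",
        action="store_true",
        # An explicit flag on the command line still forces True.
        default=_truthy(env.get("REPLAY_NO_REALIZED_PREFIX", "")),
        help="semantics assumed from the flag name",
    )
    return parser


# The environment supplies one default; the command line overrides another.
args = build_parser({"REPLAY_MAX_INFLIGHT": "32"}).parse_args(
    ["--inter-turn-think", "1.5"]
)
print(args.max_inflight_sessions, args.inter_turn_think, args.no_realized_prefix)
# -> 32 1.5 False
```

Resolving the environment inside `default=` keeps the precedence simple: command line first, then the REPLAY_* variable, then the built-in fallback.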
Data: analysis/mb5_pd_ablation/fig{1,2,3}.json (24 + 20 + 16 rows).
Figures: figs/mb5_pd_ablation/fig{1_reuse,2_shape,3_concurrency}_axis.png.
[{"name": "fig3_N16_colo_8C-proxy_rep1", "arm": "colo", "n": 720, "req": 720, "e2e_p50": 2.273988057495444, "e2e_p90": 3.22202166619245, "e2e_p99": 4.154007889915082, "e2e_mean": 2.349281024678405, "ttft_p90": 0.5880337386857718, "tpot_p99": 0.013491011632263985, "tps": 1007.0977376198009, "wall": 183.02096521001658, "pu": 53.47674418604651, "du": null, "apc": 0.7109375}, {"name": "fig3_N16_pd2_2P+6D_rep1", "arm": "2P+6D", "n": 720, "req": 720, "e2e_p50": 1.8571838534990093, "e2e_p90": 2.6877894366974946, "e2e_p99": 5.491437585417586, "e2e_mean": 2.106012088546211, "ttft_p90": 1.098052769700007, "tpot_p99": 0.00736965422303468, "tps": 1083.2982957028535, "wall": 170.14704142999835, "pu": 43.6625, "du": 80.82708333333333, "apc": 0.7109375}, {"name": "fig3_N16_pd4_4P+4D_rep1", "arm": "4P+4D", "n": 720, "req": 720, "e2e_p50": 1.9961085925024236, "e2e_p90": 2.6387570307022545, "e2e_p99": 4.1306983676511875, "e2e_mean": 2.1104672683014645, "ttft_p90": 0.751794431997405, "tpot_p99": 0.008509515762943705, "tps": 1093.4268194900712, "wall": 168.57095208800456, "pu": 19.946202531645568, "du": 98.21518987341773, "apc": 0.7109375}, {"name": "fig3_N16_pd6_6P+2D_rep1", "arm": "6P+2D", "n": 720, "req": 720, "e2e_p50": 3.0437641629832797, "e2e_p90": 3.5130674026062483, "e2e_p99": 5.287108179212371, "e2e_mean": 3.041667996072081, "ttft_p90": 0.7129690356960056, "tpot_p99": 0.013359745218868221, "tps": 871.970575501767, "wall": 211.38327964098426, "pu": 10.943333333333333, "du": 96.86, "apc": 0.7109375}, {"name": "fig3_N32_colo_8C-proxy_rep1", "arm": "colo", "n": 1320, "req": 1320, "e2e_p50": 3.270167972994386, "e2e_p90": 4.661326845278381, "e2e_p99": 6.208903694198525, "e2e_mean": 3.2551948325128417, "ttft_p90": 0.9038233671861248, "tpot_p99": 0.01838023195033048, "tps": 1580.8633971808533, "wall": 213.7566095860093, "pu": 66.3625, "du": null, "apc": 0.7109375}, {"name": "fig3_N32_pd2_2P+6D_rep1", "arm": "2P+6D", "n": 1320, "req": 1320, "e2e_p50": 2.5704439695036854, "e2e_p90": 
6.883691897706018, "e2e_p99": 17.488955901044665, "e2e_mean": 3.761722041763344, "ttft_p90": 5.035864923497138, "tpot_p99": 0.010349134326673354, "tps": 1479.6196585494329, "wall": 228.38301589699404, "pu": 56.61682242990654, "du": 82.69626168224299, "apc": 0.7109375}, {"name": "fig3_N32_pd4_4P+4D_rep1", "arm": "4P+4D", "n": 1320, "req": 1320, "e2e_p50": 3.1077146044990513, "e2e_p90": 3.770394044389833, "e2e_p99": 6.062736103993954, "e2e_mean": 3.2164792430455265, "ttft_p90": 1.0083121383970153, "tpot_p99": 0.011962187868884226, "tps": 1608.9998823250762, "wall": 210.01866048100055, "pu": 29.68877551020408, "du": 94.66836734693878, "apc": 0.7109375}, {"name": "fig3_N32_pd6_6P+2D_rep1", "arm": "6P+2D", "n": 1320, "req": 1320, "e2e_p50": 4.650343854504172, "e2e_p90": 5.231803922989639, "e2e_p99": 7.026731992097026, "e2e_mean": 4.642735430796385, "ttft_p90": 0.8066709822014674, "tpot_p99": 0.018365715299701022, "tps": 1244.6646057403061, "wall": 271.4948255470081, "pu": 17.44750656167979, "du": 97.92125984251969, "apc": 0.7109375}, {"name": "fig3_N64_colo_8C-proxy_rep1", "arm": "colo", "n": 2640, "req": 2640, "e2e_p50": 4.616785284990328, "e2e_p90": 6.662268486898392, "e2e_p99": 9.11107949850848, "e2e_mean": 4.8815010888681805, "ttft_p90": 1.4007563413004391, "tpot_p99": 0.028896959955475372, "tps": 2431.5635136762567, "wall": 277.9446213100164, "pu": 80.68076923076923, "du": null, "apc": 0.7109375}, {"name": "fig3_N64_pd2_2P+6D_rep1", "arm": "2P+6D", "n": 2639, "req": 2640, "e2e_p50": 11.69906226900639, "e2e_p90": 31.074856758594986, "e2e_p99": 33.94995162280335, "e2e_mean": 14.142758539245058, "ttft_p90": 29.560715823207286, "tpot_p99": 0.013875843108832534, "tps": 698.1370406736987, "wall": 967.6953959469975, "pu": 43.86363636363637, "du": 45.14781966001478, "apc": 0.4577210235884805}, {"name": "fig3_N64_pd4_4P+4D_rep1", "arm": "4P+4D", "n": 2601, "req": 2640, "e2e_p50": 4.077710573998047, "e2e_p90": 16.441288907983107, "e2e_p99": 385.4163983319886, "e2e_mean": 
13.444935590261034, "ttft_p90": 14.423547562997555, "tpot_p99": 0.0182731644510675, "tps": 864.9622710798936, "wall": 769.8092995070037, "pu": 19.74375, "du": 51.03541666666667, "apc": 0.014043434389389917}, {"name": "fig3_N64_pd6_6P+2D_rep1", "arm": "6P+2D", "n": 2640, "req": 2640, "e2e_p50": 7.523401059006574, "e2e_p90": 7.9631150633882495, "e2e_p99": 11.943453508106181, "e2e_mean": 7.532413133775613, "ttft_p90": 0.9084304597956361, "tpot_p99": 0.028826642419567665, "tps": 1712.6176644681352, "wall": 394.62398060099804, "pu": 21.95225225225225, "du": 98.26216216216216, "apc": 0.7109375}, {"name": "fig3_N8_colo_8C-proxy_rep1", "arm": "colo", "n": 360, "req": 360, "e2e_p50": 1.9761100795149105, "e2e_p90": 2.687890137603972, "e2e_p99": 3.689032165001845, "e2e_mean": 2.0329397837324197, "ttft_p90": 0.5798680250212783, "tpot_p99": 0.012779506407645046, "tps": 564.0943793942257, "wall": 163.3769159319927, "pu": 31.852272727272727, "du": null, "apc": 0.7109375}, {"name": "fig3_N8_pd2_2P+6D_rep1", "arm": "2P+6D", "n": 360, "req": 360, "e2e_p50": 1.6348335455040797, "e2e_p90": 2.1373069930952626, "e2e_p99": 3.27345111219359, "e2e_mean": 1.7528110698889374, "ttft_p90": 0.7096388279009263, "tpot_p99": 0.005832891406421778, "tps": 607.517244709636, "wall": 151.699397511009, "pu": 24.97222222222222, "du": 53.789351851851855, "apc": 0.7109375}, {"name": "fig3_N8_pd4_4P+4D_rep1", "arm": "4P+4D", "n": 360, "req": 360, "e2e_p50": 1.6989141649974044, "e2e_p90": 2.1767679123018753, "e2e_p99": 3.204218823980048, "e2e_mean": 1.7972881239612535, "ttft_p90": 0.671729751705425, "tpot_p99": 0.006599093441914785, "tps": 601.1982226696963, "wall": 153.2938663570094, "pu": 13.072916666666666, "du": 68.96527777777777, "apc": 0.7109375}, {"name": "fig3_N8_pd6_6P+2D_rep1", "arm": "6P+2D", "n": 360, "req": 360, "e2e_p50": 2.1166427994903643, "e2e_p90": 2.5305729087849618, "e2e_p99": 3.9926339721458506, "e2e_mean": 2.1701972734549297, "ttft_p90": 0.698665179402451, "tpot_p99": 
0.009245334605164794, "tps": 539.9759996392945, "wall": 170.67425230299705, "pu": 7.295833333333333, "du": 92.01875, "apc": 0.7109375}]
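
Rows like these can be compared per concurrency level with a few lines of Python. The sketch below assumes the file is a JSON list of objects carrying the `name`, `arm`, `tps`, and `apc` fields shown above, and that N can be parsed from `name`. `best_pd_vs_colo` is a hypothetical helper, not part of fig_agg.py, and the embedded sample is the four N=64 rows with values rounded for brevity.

```python
import json
import re
from collections import defaultdict

# Four N=64 rows from the data above; tps and apc are rounded.
SAMPLE = json.loads("""[
 {"name": "fig3_N64_colo_8C-proxy_rep1", "arm": "colo", "tps": 2431.56, "apc": 0.7109},
 {"name": "fig3_N64_pd2_2P+6D_rep1", "arm": "2P+6D", "tps": 698.14, "apc": 0.4577},
 {"name": "fig3_N64_pd4_4P+4D_rep1", "arm": "4P+4D", "tps": 864.96, "apc": 0.0140},
 {"name": "fig3_N64_pd6_6P+2D_rep1", "arm": "6P+2D", "tps": 1712.62, "apc": 0.7109}
]""")


def best_pd_vs_colo(rows):
    """Return {N: (best PD arm, best-PD TPS / colo TPS)} for each N."""
    by_n = defaultdict(dict)
    for row in rows:
        match = re.search(r"_N(\d+)_", row["name"])
        if match:
            by_n[int(match.group(1))][row["arm"]] = row
    result = {}
    for n, arms in sorted(by_n.items()):
        colo = arms.get("colo")
        pd_rows = [r for arm, r in arms.items() if arm != "colo"]
        if colo is None or not pd_rows:
            continue  # need both a colo baseline and at least one PD arm
        best = max(pd_rows, key=lambda r: r["tps"])
        result[n] = (best["arm"], best["tps"] / colo["tps"])
    return result


for n, (arm, ratio) in best_pd_vs_colo(SAMPLE).items():
    print(f"N={n}: best PD arm {arm}, TPS ratio vs colo {ratio:.2f}")
# -> N=64: best PD arm 6P+2D, TPS ratio vs colo 0.70
```

To run it on the full data, load analysis/mb5_pd_ablation/fig3.json with `json.load` in place of `SAMPLE`.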