Single-GPU bench on dash1 GPU 0 (vanilla vLLM 0.18.1, chunked-prefill on,
no kv_connector). 3 decode batch sizes × 5 prefill sizes × 3 reps.
Method recap (driver: microbench/interference/driver.py, repurposed):
- Pin D streaming decode requests at constant max_tokens
- Inject one prefill-only request (max_tokens=1) of varying input length
- Bin decode-stream token timestamps into "during prefill" vs baseline
- Headline metric: effective per-stream TPOT during the prefill burst,
= prefill_ttft / (num_tokens_during_prefill / D). This is the average
time each decode stream takes to produce one token during the burst.
The p50 of inter-token intervals is deceptive (with chunked prefill most
intervals look normal); the burst average captures the true cost.
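The binning and headline metric described above can be sketched roughly as follows. This is a minimal illustration, not the code in driver.py: the function names are hypothetical, and returning infinity when no decode tokens land inside the burst is an assumption made here for the sketch.

```python
from typing import Sequence


def count_tokens_during(token_ts: Sequence[float], start: float, end: float) -> int:
    """Count decode-token timestamps that fall inside [start, end)."""
    return sum(1 for t in token_ts if start <= t < end)


def effective_tpot_ms(prefill_ttft_ms: float,
                      num_tokens_during_prefill: float,
                      d: int) -> float:
    """prefill_ttft / (num_tokens_during_prefill / D), in ms per token per stream."""
    if num_tokens_during_prefill <= 0:
        # Assumption: treat "no decode progress observed" as an unbounded TPOT.
        return float("inf")
    return prefill_ttft_ms / (num_tokens_during_prefill / d)


# D=8, P=131072, repetition 0 from the sweep:
# prefill_ttft is about 57010.67 ms, with 335 decode tokens across 8 streams.
print(f"{effective_tpot_ms(57010.67, 335, 8):.1f} ms per token per stream")
```

Dividing the token count by D first turns the burst-wide count into tokens per stream, so the result is comparable with the per-stream baseline TPOT.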
Results (D=8 row, the most agentic-realistic case):
P (tokens) | prefill_ttft | per-stream TPOT during prefill | penalty
      2048 |       143 ms |                          32 ms |      4×
      8192 |       583 ms |                         114 ms |     15×
     32768 |      4520 ms |                         388 ms |     52×
     65536 |     15615 ms |                         757 ms |     99×
    131072 |     56991 ms |                        1419 ms |    183×
Baseline TPOT at D=8: ~7.7 ms. So during a 131k-token prefill burst
each ongoing decode is running ~183× slower (i.e. essentially halted)
for ~57 seconds.
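As a quick arithmetic check, the penalty column can be approximated by dividing each per-stream TPOT by the rounded 7.7 ms baseline. This is only a sketch: the raw sweep rows show D=8 baselines between roughly 7.3 and 7.8 ms, so ratios computed from the single rounded baseline land within a few percent of the table rather than matching it exactly.

```python
# (P tokens, per-stream TPOT during prefill in ms), taken from the D=8 table.
ROWS = [(2048, 32), (8192, 114), (32768, 388), (65536, 757), (131072, 1419)]

# Approximate D=8 baseline quoted in the text; per-run baselines vary slightly.
BASELINE_TPOT_MS = 7.7


def penalty(tpot_during_ms: float, baseline_ms: float = BASELINE_TPOT_MS) -> float:
    """Slowdown factor of a decode stream relative to its baseline TPOT."""
    return tpot_during_ms / baseline_ms


for p, tpot in ROWS:
    print(f"P={p:>6}: ~{penalty(tpot):.0f}x slower than baseline")
```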
§3.2 implication: PD-disagg's promised phase-isolation benefit per
agentic request is bounded by the decode duration, which is 50–200 ms
for tool-call output. MB2 says the KV-transfer cost of PD-disagg
is 300 ms – 10 s for agentic-size requests. Cost > benefit for every
KV size above ~80 MiB (well below trace mean 192 MiB).
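The comparison can be made concrete with the two ranges quoted above. This sketch uses only those endpoints (it does not reproduce the MB2 cost curve), and the helper name is hypothetical.

```python
# Benefit ceiling: decode duration for tool-call output, in ms.
BENEFIT_CEILING_MS = (50.0, 200.0)
# MB2 KV-transfer cost range for agentic-size requests, in ms.
TRANSFER_COST_MS = (300.0, 10_000.0)


def net_benefit_ms(benefit_ms: float, cost_ms: float) -> float:
    """Positive means PD-disagg pays off; negative means the transfer costs more."""
    return benefit_ms - cost_ms


# Most favourable pairing: largest benefit against cheapest transfer.
best_case = net_benefit_ms(BENEFIT_CEILING_MS[1], TRANSFER_COST_MS[0])
# Least favourable pairing: smallest benefit against most expensive transfer.
worst_case = net_benefit_ms(BENEFIT_CEILING_MS[0], TRANSFER_COST_MS[1])

print(f"best case:  {best_case:+.0f} ms")
print(f"worst case: {worst_case:+.0f} ms")
```

Even the most favourable pairing is negative, which is what the quoted ranges imply for agentic-size requests.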
The new figs/pd_cost_vs_benefit.png overlays MB1 benefit ceiling
(50–200 ms band, capped by decode) onto MB2 transfer cost curve and
marks the agentic-distribution waypoints (trace mean, p90, p95, p99)
on the x-axis. Across the entire agentic distribution, the cost curve
sits above the benefit band.
Adds:
- microbench/fresh_setup/mb1_launch.sh: single-GPU vLLM launcher (no
kv_connector, default chunked_prefill=on, max_num_batched_tokens=8192)
- microbench/fresh_setup/mb1_driver.py: copy of the existing
microbench/interference/driver.py for cpfs deployment
- microbench/fresh_setup/analyze_mb1.py: aggregator emitting
per-(D, P) effective-TPOT-during + max PD-disagg-benefit table
- microbench/fresh_setup/plot_mb1.py: mb1 standalone +
pd_cost_vs_benefit headline figure
- analysis/mb1/summary.csv: 45 raw rows from the sweep
- analysis/mb1/breakdown.json: per-(D, P) aggregate
- analysis/mb1/README.md: persistent doc
- figs/mb1_interference.png: effective TPOT during prefill, one line per D
- figs/pd_cost_vs_benefit.png: §3.2 headline (cost > benefit across the agentic distribution)
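For orientation, a per-(D, P) aggregation over summary.csv might look like the sketch below. It is not the contents of analyze_mb1.py: the column names come from the CSV header, the sample rows are a rounded excerpt of the D=8 runs, and averaging across repetitions before taking the ratio is an assumption (one that reproduces the D=8 headline numbers).

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Rounded excerpt of the D=8 rows; only the columns needed here are kept.
SAMPLE_CSV = """decode_batch_size,new_prefill_tokens,tpot_baseline_p50_ms,prefill_ttft_ms,num_tokens_during_prefill
8,131072,7.738,57010.667,335
8,131072,7.745,56970.409,310
8,131072,7.740,56993.024,319
8,2048,7.741,141.979,30
8,2048,7.728,144.158,38
8,2048,7.662,143.280,39
"""


def aggregate(csv_text: str) -> dict:
    """Group rows by (D, P) and compute mean TTFT, effective TPOT, and penalty."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["decode_batch_size"]), int(row["new_prefill_tokens"]))
        groups[key].append(row)

    out = {}
    for (d, p), rows in groups.items():
        ttft = mean(float(r["prefill_ttft_ms"]) for r in rows)
        tokens = mean(float(r["num_tokens_during_prefill"]) for r in rows)
        baseline = mean(float(r["tpot_baseline_p50_ms"]) for r in rows)
        # Guard against runs where no decode tokens were binned into the burst.
        eff = ttft / (tokens / d) if tokens > 0 else float("inf")
        out[(d, p)] = {
            "prefill_ttft_ms": ttft,
            "effective_tpot_ms": eff,
            "penalty": eff / baseline,
        }
    return out


for (d, p), agg in sorted(aggregate(SAMPLE_CSV).items()):
    print(f"D={d} P={p}: TPOT during prefill {agg['effective_tpot_ms']:.0f} ms, "
          f"penalty {agg['penalty']:.0f}x")
```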
Caveats noted in README:
- chunk_tokens=8192 only; Sarathi-Serve's smaller chunks would
interleave decode more aggressively. Chunk-size sensitivity is
flagged for the next run.
- D ≤ 8; higher D may saturate or shrink the penalty further.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| line | chunk_size | decode_batch_size | new_prefill_tokens | repetition | tpot_baseline_p50_ms | tpot_baseline_p90_ms | tpot_during_prefill_p50_ms | tpot_during_prefill_p90_ms | tpot_after_prefill_p50_ms | prefill_ttft_ms | num_tokens_during_prefill | tpot_penalty_p50_ms | tpot_penalty_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 8192 | 1 | 131072 | 0 | 4.777565016411245 | 4.900234832894057 | 4.701301048044115 | 4.948397364933044 | 0.0 | 56719.25117995124 | 7 | -0.07626396836712956 | 0.9840370632099913 |
| 3 | 8192 | 1 | 131072 | 1 | 4.779465030878782 | 4.883405601140112 | 4.707481013610959 | 4.85471700085327 | 0.0 | 56696.089847013354 | 5 | -0.07198401726782322 | 0.9849388965495606 |
| 4 | 8192 | 1 | 131072 | 2 | 4.790953011251986 | 4.880544205661863 | 4.728371975943446 | 4.907831805758178 | 0.0 | 56880.19039196661 | 5 | -0.06258103530853987 | 0.9869376645603573 |
| 5 | 8192 | 1 | 2048 | 0 | 4.77885699365288 | 4.894876398611814 | 41.434570477576926 | 88.97331730695441 | 0.0 | 183.2046649651602 | 4 | 36.655713483924046 | 8.670393471202205 |
| 6 | 8192 | 1 | 2048 | 1 | 4.788161953911185 | 4.949774022679776 | 41.68213551747613 | 83.5143867880106 | 0.0 | 175.55483896285295 | 4 | 36.89397356356494 | 8.705247633369687 |
| 7 | 8192 | 1 | 2048 | 2 | 4.7893429873511195 | 4.874200583435595 | 23.186982492916286 | 67.25202781381086 | 0.0 | 131.23180496040732 | 4 | 18.397639505565166 | 4.841370215946989 |
| 8 | 8192 | 1 | 32768 | 0 | 4.789774015080184 | 4.870833398308605 | 4.738486022688448 | 4.886626999359578 | 0.0 | 4500.839321000967 | 5 | -0.051287992391735315 | 0.9892921895207875 |
| 9 | 8192 | 1 | 32768 | 1 | 4.776834975928068 | 4.891659819986671 | 4.729953012429178 | 4.9245511763729155 | 0.0 | 4496.073378017172 | 5 | -0.0468819634988904 | 0.9901855593221991 |
| 10 | 8192 | 1 | 32768 | 2 | 4.784431017469615 | 4.866032593417913 | 4.782894975505769 | 4.8977664206177 | 0.0 | 4549.013931944501 | 5 | -0.0015360419638454914 | 0.9996789499193871 |
| 11 | 8192 | 1 | 65536 | 0 | 4.778854956384748 | 4.9255444086156785 | 4.633405013009906 | 4.895579582080245 | 0.0 | 15530.37424501963 | 5 | -0.1454499433748424 | 0.9695638506080803 |
| 12 | 8192 | 1 | 65536 | 1 | 4.784283053595573 | 4.8808404128067195 | 4.754905996378511 | 4.985795798711479 | 0.0 | 15584.887631004676 | 5 | -0.02937705721706152 | 0.99385967408534 |
| 13 | 8192 | 1 | 65536 | 2 | 4.787993966601789 | 4.9004736240021884 | 4.6836750116199255 | 5.0271204963792115 | 0.0 | 15587.390075030271 | 6 | -0.1043189549818635 | 0.9782123879625725 |
| 14 | 8192 | 1 | 8192 | 0 | 4.785028984770179 | 4.878618801012635 | 7.490115996915847 | 324.06569679733366 | 0.0 | 573.2795029762201 | 5 | 2.7050870121456683 | 1.565323014919123 |
| 15 | 8192 | 1 | 8192 | 1 | 4.778591974172741 | 4.899543372448534 | 5.9131429879926145 | 336.8099076091312 | 0.0 | 606.6823820001446 | 5 | 1.1345510138198733 | 1.237423705550061 |
| 16 | 8192 | 1 | 8192 | 2 | 4.78826800826937 | 4.90188361145556 | 6.276679981965572 | 324.8370993998833 | 0.0 | 571.7499859747477 | 5 | 1.488411973696202 | 1.310845585736994 |
| 17 | 8192 | 4 | 131072 | 0 | 6.113810988608748 | 6.309205386787653 | 0.0 | 0.0 | 0.0 | 56702.702289039735 | 0 | -6.113810988608748 | 0.0 |
| 18 | 8192 | 4 | 131072 | 1 | 6.630807969486341 | 7.086459483252838 | 6.2820459716022015 | 4400.500871409893 | 0.0 | 56807.70832300186 | 150 | -0.3487619978841394 | 0.9474027902045915 |
| 19 | 8192 | 4 | 131072 | 2 | 6.073819473385811 | 6.344516028184444 | 6.326125003397465 | 4409.856556192978 | 0.0 | 56580.784838995896 | 149 | 0.2523055300116539 | 1.0415398467335428 |
| 20 | 8192 | 4 | 2048 | 0 | 5.402160517405719 | 5.543816485442221 | 6.210724503034726 | 84.62208869168535 | 6.125201500253752 | 140.3041940066032 | 18 | 0.8085639856290072 | 1.1496741873966574 |
| 21 | 8192 | 4 | 2048 | 1 | 6.067108013667166 | 6.381415005307645 | 0.0 | 0.0 | 0.0 | 140.06177097326145 | 0 | -6.067108013667166 | 0.0 |
| 22 | 8192 | 4 | 2048 | 2 | 5.400336522143334 | 5.536347016459331 | 38.15686801681295 | 85.07051098858938 | 5.25214200024493 | 134.67552902875468 | 13 | 32.756531494669616 | 7.065646346363043 |
| 23 | 8192 | 4 | 32768 | 0 | 6.115561525803059 | 6.369604001520202 | 7.216634490760043 | 1314.6978712815326 | 5.17624247004278 | 4522.433568025008 | 50 | 1.101072964956984 | 1.1800444587649532 |
| 24 | 8192 | 4 | 32768 | 1 | 6.070095987524837 | 6.3612310332246125 | 0.0 | 0.0 | 0.0 | 4508.074064040557 | 0 | -6.070095987524837 | 0.0 |
| 25 | 8192 | 4 | 32768 | 2 | 6.0734800063073635 | 6.312666402664036 | 12.442811043001711 | 1315.0411327951588 | 4.754714027512819 | 4556.892123946454 | 45 | 6.369331036694348 | 2.0487119460473635 |
| 26 | 8192 | 4 | 65536 | 0 | 5.406292999396101 | 5.540905491216108 | 0.0 | 0.0 | 0.0 | 15581.590663990937 | 0 | -5.406292999396101 | 0.0 |
| 27 | 8192 | 4 | 65536 | 1 | 6.076910009142011 | 6.315114628523588 | 0.0 | 0.0 | 0.0 | 15574.196094006766 | 0 | -6.076910009142011 | 0.0 |
| 28 | 8192 | 4 | 65536 | 2 | 6.060379033442587 | 6.384042033459991 | 6.411670008674264 | 2077.4700703914277 | 4.8022730043157935 | 15603.720718005206 | 79 | 0.3512909752316773 | 1.0579651822589267 |
| 29 | 8192 | 4 | 8192 | 0 | 6.110575021011755 | 6.416070973500609 | 8.451583969872445 | 515.3855616226792 | 5.358011490898207 | 574.6672929963097 | 18 | 2.34100894886069 | 1.3831077993169092 |
| 30 | 8192 | 4 | 8192 | 1 | 6.051429023500532 | 6.398122606333345 | 0.0 | 0.0 | 0.0 | 573.6081749782898 | 0 | -6.051429023500532 | 0.0 |
| 31 | 8192 | 4 | 8192 | 2 | 6.064729997888207 | 6.366449000779539 | 0.0 | 0.0 | 0.0 | 574.1707819979638 | 0 | -6.064729997888207 | 0.0 |
| 32 | 8192 | 8 | 131072 | 0 | 7.737616979284212 | 7.99839201499708 | 10.740376019384712 | 4742.438135773409 | 7.792441989295185 | 57010.66731195897 | 335 | 3.0027590401005 | 1.388072845701685 |
| 33 | 8192 | 8 | 131072 | 1 | 7.744895527139306 | 8.013638522243127 | 8.647068490972742 | 5123.228083999129 | 7.672236970392987 | 56970.40947602363 | 310 | 0.9021729638334364 | 1.116486137310966 |
| 34 | 8192 | 8 | 131072 | 2 | 7.740180502878502 | 8.016240986762568 | 15.140031988266855 | 4820.136589207682 | 7.68946303287521 | 56993.02393599646 | 319 | 7.3998514853883535 | 1.9560308680962177 |
| 35 | 8192 | 8 | 2048 | 0 | 7.741285488009453 | 8.022559515666217 | 8.103576023131609 | 124.87094267853536 | 7.6825070136692375 | 141.97922096354887 | 30 | 0.36229053512215614 | 1.046799789993963 |
| 36 | 8192 | 8 | 2048 | 1 | 7.728310010861605 | 8.021069981623441 | 8.17067950265482 | 84.82906777062453 | 7.745136506855488 | 144.1582590341568 | 38 | 0.4423694917932153 | 1.0572401328584768 |
| 37 | 8192 | 8 | 2048 | 2 | 7.662211020942777 | 8.034424972720444 | 8.87883099494502 | 87.23540699575096 | 7.592331967316568 | 143.27958395006135 | 39 | 1.216619974002242 | 1.1587818412566437 |
| 38 | 8192 | 8 | 32768 | 0 | 7.295333489309996 | 7.422819995554164 | 11.429400008637458 | 1315.43214758276 | 7.8034960315562785 | 4523.641717038117 | 94 | 4.134066519327462 | 1.5666727265292526 |
| 39 | 8192 | 8 | 32768 | 1 | 7.278127042809501 | 7.490781514206901 | 12.640403030673042 | 1315.491412486881 | 7.821676495950669 | 4519.993302994408 | 90 | 5.362275987863541 | 1.736765922925357 |
| 40 | 8192 | 8 | 32768 | 2 | 7.684049021918327 | 8.047712198458612 | 10.752685484476388 | 1315.5166705255397 | 7.80402502277866 | 4517.200137954205 | 96 | 3.068636462558061 | 1.3993514947399404 |
| 41 | 8192 | 8 | 65536 | 0 | 7.708174001891166 | 8.017168991500512 | 26.662671996746212 | 2496.8427699001018 | 7.768569514155388 | 15603.601168957539 | 160 | 18.954497994855046 | 3.459012729889679 |
| 42 | 8192 | 8 | 65536 | 1 | 7.594842027174309 | 7.9874323040712625 | 13.054963492322713 | 2459.1690181812737 | 7.54699349636212 | 15620.474929979537 | 174 | 5.460121465148404 | 1.7189249553331216 |
| 43 | 8192 | 8 | 65536 | 2 | 7.693717983784154 | 7.933055714238435 | 17.5579380011186 | 2458.176895044744 | 7.808708498487249 | 15622.32490995666 | 161 | 9.864220017334446 | 2.2821135422594123 |
| 44 | 8192 | 8 | 8192 | 0 | 7.636573514901102 | 7.904737605713308 | 10.151655005756766 | 514.8188057704829 | 7.7977380133233964 | 575.7745200535282 | 37 | 2.515081490855664 | 1.3293468577167538 |
| 45 | 8192 | 8 | 8192 | 1 | 7.687711506150663 | 7.965393498307094 | 9.002390026580542 | 524.0793236298487 | 7.753994490485638 | 592.1044679707848 | 45 | 1.3146785204298794 | 1.1710103870804793 |
| 46 | 8192 | 8 | 8192 | 2 | 7.756220467854291 | 8.035426988499239 | 8.864110975991935 | 518.9726910321042 | 7.770269992761314 | 581.98908099439 | 41 | 1.1078905081376433 | 1.1428389655411813 |