agentic-kvc/analysis/mb1/summary.csv
Gahow Wang 029821c1b6 MB1: prefill-decode interference under chunked-prefill default; §3.2 headline
Single-GPU bench on dash1 GPU 0 (vanilla vLLM 0.18.1, chunked-prefill on,
no kv_connector). 3 decode batch sizes × 5 prefill sizes × 3 reps.

Method recap (driver: microbench/interference/driver.py, repurposed):
- Pin D streaming decode requests at constant max_tokens
- Inject one prefill-only request (max_tokens=1) of varying input length
- Bin decode-stream token timestamps into "during prefill" vs baseline
- Headline metric: effective per-stream TPOT during the prefill burst,
  = prefill_ttft / (num_tokens_during_prefill / D). This is the average
  time each decode stream takes per token during the burst.
  The p50 of inter-token intervals is deceptive (chunked prefill leaves
  most intervals looking normal); the burst average gives the true cost.
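
The binning step and the headline metric above can be sketched as
follows. This is a minimal illustration, not the code in driver.py:
the function names and the synthetic timestamps are assumptions, and
only the formula prefill_ttft / (num_tokens_during_prefill / D) comes
from the method recap.

```python
from statistics import median


def bin_timestamps(token_times, prefill_start, prefill_end):
    """Split one stream's token timestamps (seconds) into three bins."""
    baseline = [t for t in token_times if t < prefill_start]
    during = [t for t in token_times if prefill_start <= t <= prefill_end]
    after = [t for t in token_times if t > prefill_end]
    return baseline, during, after


def effective_tpot_ms(prefill_ttft_ms, num_tokens_during_prefill, d):
    """Burst-average per-stream TPOT: prefill_ttft / (tokens_during / D)."""
    if num_tokens_during_prefill == 0:
        return float("inf")  # no decode token was emitted during the burst
    return prefill_ttft_ms / (num_tokens_during_prefill / d)


def p50_interval_ms(times):
    """Median gap between consecutive token timestamps, in milliseconds."""
    gaps = [(b - a) * 1000.0 for a, b in zip(times, times[1:])]
    return median(gaps)


# Synthetic single stream (D=1): a 1 s prefill burst from t=1.0 to t=2.0.
# Tokens arrive in two tight clusters separated by one long stall.
times = [0.0, 0.5,
         1.000, 1.008, 1.016, 1.024, 1.500, 1.508, 1.516, 1.524,
         2.5]
baseline, during, after = bin_timestamps(times, 1.0, 2.0)

p50 = p50_interval_ms(during)                     # dominated by the 8 ms gaps
burst_avg = effective_tpot_ms(1000.0, len(during), d=1)
print(f"p50 gap: {p50:.1f} ms, burst-average TPOT: {burst_avg:.1f} ms")
```

In this synthetic case the median gap stays near the normal decode
interval while the burst average is much larger, which is the effect
the recap describes.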

Results (D=8 rows, the most agentic-realistic case):
  P (tokens) | prefill_ttft | per-stream TPOT during prefill | penalty
        2048 |       143 ms |                          32 ms |      4×
        8192 |       583 ms |                         114 ms |     15×
       32768 |      4520 ms |                         388 ms |     52×
       65536 |     15615 ms |                         757 ms |     99×
      131072 |     56991 ms |                        1419 ms |    183×

Baseline TPOT at D=8: ~7.7 ms. So during a 131k-token prefill burst
each ongoing decode is running ~183× slower (i.e. essentially halted)
for ~57 seconds.
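
As a quick check of the penalty column, the ratio can be recomputed
from the rounded figures quoted above. This is a sketch under one
assumption: a single 7.7 ms baseline is applied to every row. The
results land within a few percent of the table, presumably because the
table was built from unrounded per-run baselines.

```python
# Rounded values copied from the D=8 results table.
BASELINE_TPOT_MS = 7.7
TPOT_DURING_MS = {2048: 32, 8192: 114, 32768: 388, 65536: 757, 131072: 1419}


def penalty_ratio(tpot_during_ms, baseline_ms=BASELINE_TPOT_MS):
    """Slowdown factor of a decode stream relative to its baseline TPOT."""
    return tpot_during_ms / baseline_ms


ratios = {p: penalty_ratio(t) for p, t in TPOT_DURING_MS.items()}
for p, r in ratios.items():
    print(f"P={p:>6}: {r:.1f}x")
```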

§3.2 implication: PD-disagg's promised phase-isolation benefit per
agentic request is bounded by the decode duration, which is 50–200 ms
for tool-call output. MB2 puts the KV-transfer cost of PD-disagg at
300 ms – 10 s for agentic-size requests. Cost exceeds benefit for every
KV size above ~80 MiB (well below the trace mean of 192 MiB).
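
The comparison reduces to simple arithmetic on the two ranges quoted
above. The sketch below uses only those endpoints; the helper name is
hypothetical and no shape is assumed for the MB2 cost curve.

```python
# Ranges quoted in the commit message, in milliseconds.
BENEFIT_CEILING_MS = (50.0, 200.0)     # decode duration for tool-call output
TRANSFER_COST_MS = (300.0, 10_000.0)   # MB2 KV-transfer cost, agentic sizes


def net_benefit_ms(benefit_ms, cost_ms):
    """Positive means PD-disagg pays off; negative means it costs more."""
    return benefit_ms - cost_ms


# Most favourable pairing: largest benefit against the smallest cost.
best_case = net_benefit_ms(BENEFIT_CEILING_MS[1], TRANSFER_COST_MS[0])
# Least favourable pairing: smallest benefit against the largest cost.
worst_case = net_benefit_ms(BENEFIT_CEILING_MS[0], TRANSFER_COST_MS[1])
print(best_case, worst_case)
```

Even the most favourable pairing is negative, so within these ranges
the transfer cost is never recovered by the isolation benefit.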

The new figs/pd_cost_vs_benefit.png overlays the MB1 benefit ceiling
(50–200 ms band, capped by decode) onto the MB2 transfer-cost curve and
marks the agentic-distribution waypoints (trace mean, p90, p95, p99)
on the x-axis. Across the entire agentic distribution, the cost curve
sits above the benefit band.

Adds:
- microbench/fresh_setup/mb1_launch.sh: single-GPU vLLM launcher (no
  kv_connector, default chunked_prefill=on, max_num_batched_tokens=8192)
- microbench/fresh_setup/mb1_driver.py: copy of the existing
  microbench/interference/driver.py for cpfs deployment
- microbench/fresh_setup/analyze_mb1.py: aggregator emitting
  per-(D, P) effective-TPOT-during + max PD-disagg-benefit table
- microbench/fresh_setup/plot_mb1.py: mb1 standalone +
  pd_cost_vs_benefit headline figure
- analysis/mb1/summary.csv: 45 raw rows from the sweep
- analysis/mb1/breakdown.json: per-(D, P) aggregate
- analysis/mb1/README.md: persistent doc
- figs/mb1_interference.png: effective TPOT during prefill, one line per D
- figs/pd_cost_vs_benefit.png: §3.2 headline (cost > benefit across
  the agentic distribution)
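
For orientation, a per-(D, P) aggregation over summary.csv could look
like the sketch below. It is not the contents of analyze_mb1.py: the
function name, the choice to average repetitions before dividing, and
the two sample rows are assumptions. Only the column names are taken
from the CSV header.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def aggregate(csv_file):
    """Return {(D, P): effective per-stream TPOT during prefill, in ms}."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (int(row["decode_batch_size"]), int(row["new_prefill_tokens"]))
        groups[key].append((
            float(row["prefill_ttft_ms"]),
            float(row["num_tokens_during_prefill"]),
        ))

    result = {}
    for (d, p), reps in sorted(groups.items()):
        ttft_ms = mean(t for t, _ in reps)   # mean over repetitions
        tokens = mean(n for _, n in reps)    # mean over repetitions
        # Guard against cells where no decode token landed in the burst.
        result[(d, p)] = ttft_ms / (tokens / d) if tokens else float("inf")
    return result


# Two made-up repetitions for one (D, P) cell, for illustration only.
SAMPLE = (
    "decode_batch_size,new_prefill_tokens,prefill_ttft_ms,"
    "num_tokens_during_prefill\n"
    "8,2048,100.0,20\n"
    "8,2048,140.0,28\n"
)
table = aggregate(io.StringIO(SAMPLE))
print(table)
```

With the real file, the call would be aggregate(open(path, newline=""))
on a comma-separated copy of summary.csv.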

Caveats noted in README:
- chunk_tokens=8192 only; Sarathi-Serve's smaller chunks would
  interleave decode more aggressively. Chunk-size sensitivity is
  flagged for the next run.
- D ≤ 8; higher D may saturate or shrink the penalty further.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:25:09 +08:00

6.9 KiB

chunk_size,decode_batch_size,new_prefill_tokens,repetition,tpot_baseline_p50_ms,tpot_baseline_p90_ms,tpot_during_prefill_p50_ms,tpot_during_prefill_p90_ms,tpot_after_prefill_p50_ms,prefill_ttft_ms,num_tokens_during_prefill,tpot_penalty_p50_ms,tpot_penalty_ratio
28192113107204.7775650164112454.9002348328940574.7013010480441154.9483973649330440.056719.251179951247-0.076263968367129560.9840370632099913
38192113107214.7794650308787824.8834056011401124.7074810136109594.854717000853270.056696.0898470133545-0.071984017267823220.9849388965495606
48192113107224.7909530112519864.8805442056618634.7283719759434464.9078318057581780.056880.190391966615-0.062581035308539870.9869376645603573
581921204804.778856993652884.89487639861181441.43457047757692688.973317306954410.0183.2046649651602436.6557134839240468.670393471202205
681921204814.7881619539111854.94977402267977641.6821355174761383.51438678801060.0175.55483896285295436.893973563564948.705247633369687
781921204824.78934298735111954.87420058343559523.18698249291628667.252027813810860.0131.23180496040732418.3976395055651664.841370215946989
8819213276804.7897740150801844.8708333983086054.7384860226884484.8866269993595780.04500.8393210009675-0.0512879923917353150.9892921895207875
9819213276814.7768349759280684.8916598199866714.7299530124291784.92455117637291550.04496.0733780171725-0.04688196349889040.9901855593221991
10819213276824.7844310174696154.8660325934179134.7828949755057694.89776642061770.04549.0139319445015-0.00153604196384549140.9996789499193871
11819216553604.7788549563847484.92554440861567854.6334050130099064.8955795820802450.015530.374245019635-0.14544994337484240.9695638506080803
12819216553614.7842830535955734.88084041280671954.7549059963785114.9857957987114790.015584.8876310046765-0.029377057217061520.99385967408534
13819216553624.7879939666017894.90047362400218844.68367501161992555.02712049637921150.015587.3900750302716-0.10431895498186350.9782123879625725
1481921819204.7850289847701794.8786188010126357.490115996915847324.065696797333660.0573.279502976220152.70508701214566831.565323014919123
1581921819214.7785919741727414.8995433724485345.9131429879926145336.80990760913120.0606.682382000144651.13455101381987331.237423705550061
1681921819224.788268008269374.901883611455566.276679981965572324.83709939988330.0571.749985974747751.4884119736962021.310845585736994
178192413107206.1138109886087486.3092053867876530.00.00.056702.7022890397350-6.1138109886087480.0
188192413107216.6308079694863417.0864594832528386.28204597160220154400.5008714098930.056807.70832300186150-0.34876199788413940.9474027902045915
198192413107226.0738194733858116.3445160281844446.3261250033974654409.8565561929780.056580.7848389958961490.25230553001165391.0415398467335428
2081924204805.4021605174057195.5438164854422216.21072450303472684.622088691685356.125201500253752140.3041940066032180.80856398562900721.1496741873966574
2181924204816.0671080136671666.3814150053076450.00.00.0140.061770973261450-6.0671080136671660.0
2281924204825.4003365221433345.53634701645933138.1568680168129585.070510988589385.25214200024493134.675529028754681332.7565314946696167.065646346363043
23819243276806.1155615258030596.3696040015202027.2166344907600431314.69787128153265.176242470042784522.433568025008501.1010729649569841.1800444587649532
24819243276816.0700959875248376.36123103322461250.00.00.04508.0740640405570-6.0700959875248370.0
25819243276826.07348000630736356.31266640266403612.4428110430017111315.04113279515884.7547140275128194556.892123946454456.3693310366943482.0487119460473635
26819246553605.4062929993961015.5409054912161080.00.00.015581.5906639909370-5.4062929993961010.0
27819246553616.0769100091420116.3151146285235880.00.00.015574.1960940067660-6.0769100091420110.0
28819246553626.0603790334425876.3840420334599916.4116700086742642077.47007039142774.802273004315793515603.720718005206790.35129097523167731.0579651822589267
2981924819206.1105750210117556.4160709735006098.451583969872445515.38556162267925.358011490898207574.6672929963097182.341008948860691.3831077993169092
3081924819216.0514290235005326.3981226063333450.00.00.0573.60817497828980-6.0514290235005320.0
3181924819226.0647299978882076.3664490007795390.00.00.0574.17078199796380-6.0647299978882070.0
328192813107207.7376169792842127.9983920149970810.7403760193847124742.4381357734097.79244198929518557010.667311958973353.00275904010051.388072845701685
338192813107217.7448955271393068.0136385222431278.6470684909727425123.2280839991297.67223697039298756970.409476023633100.90217296383343641.116486137310966
348192813107227.7401805028785028.01624098676256815.1400319882668554820.1365892076827.6894630328752156993.023935996463197.39985148538835351.9560308680962177
3581928204807.7412854880094538.0225595156662178.103576023131609124.870942678535367.6825070136692375141.97922096354887300.362290535122156141.046799789993963
3681928204817.7283100108616058.0210699816234418.1706795026548284.829067770624537.745136506855488144.1582590341568380.44236949179321531.0572401328584768
3781928204827.6622110209427778.0344249727204448.8788309949450287.235406995750967.592331967316568143.27958395006135391.2166199740022421.1587818412566437
38819283276807.2953334893099967.42281999555416411.4294000086374581315.432147582767.80349603155627854523.641717038117944.1340665193274621.5666727265292526
39819283276817.2781270428095017.49078151420690112.6404030306730421315.4914124868817.8216764959506694519.993302994408905.3622759878635411.736765922925357
40819283276827.6840490219183278.04771219845861210.7526854844763881315.51667052553977.804025022778664517.200137954205963.0686364625580611.3993514947399404
41819286553607.7081740018911668.01716899150051226.6626719967462122496.84276990010187.76856951415538815603.60116895753916018.9544979948550463.459012729889679
42819286553617.5948420271743097.987432304071262513.0549634923227132459.16901818127377.5469934963621215620.4749299795371745.4601214651484041.7189249553331216
43819286553627.6937179837841547.93305571423843517.55793800111862458.1768950447447.80870849848724915622.324909956661619.8642200173344462.2821135422594123
4481928819207.6365735149011027.90473760571330810.151655005756766514.81880577048297.7977380133233964575.7745200535282372.5150814908556641.3293468577167538
4581928819217.6877115061506637.9653934983070949.002390026580542524.07932362984877.753994490485638592.1044679707848451.31467852042987941.1710103870804793
4681928819227.7562204678542918.0354269884992398.864110975991935518.97269103210427.770269992761314581.98908099439411.10789050813764331.1428389655411813