Instrumentation patches (microbench/patches/):
- pd_profile.py: shared event emitter (VLLM_PD_PROFILE_LOG env var)
- apply_patches.py: idempotent patch installer for mooncake_connector.py
  and scheduler.py; marks insertions with # PD_PROFILE_PATCH
- analyze_events.py: joins per-process JSONL event logs by transfer_id
  into per-request phase durations
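
A minimal sketch of how a marker-based installer can stay idempotent. This is not the code in apply_patches.py: the function name apply_patch, the anchor-string matching, and the example snippet are all hypothetical, and only the # PD_PROFILE_PATCH marker comes from this change.

```python
MARKER = "# PD_PROFILE_PATCH"


def apply_patch(source: str, anchor: str, snippet: str) -> str:
    """Insert `snippet` after the first line containing `anchor`.

    Hypothetical helper: the inserted line is tagged with MARKER, and a
    second call with the same snippet returns the source unchanged.
    """
    tagged = f"{snippet}  {MARKER}"
    if tagged in source:
        return source  # already installed, nothing to do

    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if anchor in line:
            # Reuse the anchor line's indentation for the inserted line.
            indent = line[: len(line) - len(line.lstrip())]
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, f"{indent}{tagged}\n")
            return "".join(lines)
    raise ValueError(f"anchor not found: {anchor!r}")


original = "def send(req):\n    prepare(req)\n    transfer(req)\n"
once = apply_patch(original, "prepare(req)", "emit('P_rdma_start', req)")
twice = apply_patch(once, "prepare(req)", "emit('P_rdma_start', req)")
```

Checking for the exact tagged line, rather than for the marker alone, lets several distinct insertions coexist in one file while each remains safe to re-apply.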
Seven events captured per request:
  D_get_num_matched → P_zmq_received → P_prefill_done →
  P_rdma_start → P_rdma_end → D_recv_complete → D_request_promoted
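
The join by transfer_id can be sketched roughly as follows. The record fields ("transfer_id", "event", "ts" in seconds) and both function names are assumptions made for illustration, not the actual schema or API of analyze_events.py. The sketch also assumes that timestamps from the P and D processes are on comparable clocks.

```python
import json
from collections import defaultdict

# Event order as listed in this commit message.
EVENT_ORDER = [
    "D_get_num_matched", "P_zmq_received", "P_prefill_done",
    "P_rdma_start", "P_rdma_end", "D_recv_complete", "D_request_promoted",
]


def join_events(jsonl_lines):
    """Group events from any number of per-process logs by transfer_id."""
    by_transfer = defaultdict(dict)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        by_transfer[rec["transfer_id"]][rec["event"]] = rec["ts"]
    return dict(by_transfer)


def gaps_ms(events):
    """Milliseconds between consecutive events, or None if any is missing."""
    if any(name not in events for name in EVENT_ORDER):
        return None
    return {
        f"{a}->{b}": (events[b] - events[a]) * 1000.0
        for a, b in zip(EVENT_ORDER, EVENT_ORDER[1:])
    }


# Synthetic timestamps, split across two "process" logs.
ts = dict(zip(EVENT_ORDER, [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]))
p_log = [json.dumps({"transfer_id": "t1", "event": e, "ts": t})
         for e, t in ts.items() if e.startswith("P_")]
d_log = [json.dumps({"transfer_id": "t1", "event": e, "ts": t})
         for e, t in ts.items() if e.startswith("D_")]
joined = join_events(p_log + d_log)
gaps = gaps_ms(joined["t1"])
```

Requests with a missing event are reported as None instead of being given partial durations, so incomplete transfers do not skew the medians.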
Driver fix (microbench/lifecycle/driver.py):
seed_prefix_cache now sends its seeding requests via the proxy URL, so
P and D both cache the seeded prefix with matching block hashes.
Previously, seeding D directly produced block hashes that differed from
those of the proxy-routed measurement requests, which made incremental
transfer impossible.
Real breakdown (fig_breakdown_real.png, server_breakdown.csv, n=93):
  prefill_compute   620 ms median  (95% of overhead)
  rdma_transfer      42 ms median  (~71 Gbps effective)
  other overhead     10 ms median  (dispatch + params + signal + promote)
Mooncake transfer is NOT the bottleneck. Even with bulk RDMA, the
median transfer cost (42 ms) is <10% of the median prefill cost
(620 ms) for Qwen3-30B-A3B on H20.
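
A quick arithmetic check of the figures above. The payload estimate is an inference, not a measurement: it assumes 1 Gbps = 1e9 bits/s and that the median duration and the effective rate describe the same transfer.

```python
# Medians reported in the breakdown above.
prefill_ms = 620.0
rdma_ms = 42.0
effective_gbps = 71.0  # assumed decimal: 1 Gbps = 1e9 bits/s

# Transfer cost relative to prefill cost.
ratio = rdma_ms / prefill_ms

# Payload implied by the effective rate over the median transfer time.
implied_bytes = effective_gbps * 1e9 / 8 * (rdma_ms / 1000.0)

print(f"transfer / prefill = {ratio:.1%}")                 # 6.8%
print(f"implied payload ~ {implied_bytes / 1e6:.0f} MB")   # 373 MB
```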
[{"C": 0, "N": 2048, "O": 128, "rep": 0, "ttft_ms": 2622.1298610325903, "decode_ms": 622.6007409859449, "e2e_ms": 3244.730602018535}, {"C": 0, "N": 2048, "O": 128, "rep": 1, "ttft_ms": 5035.966206050944, "decode_ms": 614.4816449959762, "e2e_ms": 5650.44785104692}, {"C": 0, "N": 2048, "O": 128, "rep": 2, "ttft_ms": 5154.69168999698, "decode_ms": 614.7105349809863, "e2e_ms": 5769.402224977966}, {"C": 0, "N": 2048, "O": 1, "rep": 0, "ttft_ms": 714.142931974493, "decode_ms": 0.0, "e2e_ms": 714.142931974493}, {"C": 0, "N": 2048, "O": 1, "rep": 1, "ttft_ms": 174.80234400136396, "decode_ms": 0.0, "e2e_ms": 174.80234400136396}, {"C": 0, "N": 2048, "O": 1, "rep": 2, "ttft_ms": 613.1551950238645, "decode_ms": 0.0, "e2e_ms": 613.1551950238645}, {"C": 0, "N": 2048, "O": 32, "rep": 0, "ttft_ms": 181.63492099847645, "decode_ms": 151.64763299981132, "e2e_ms": 333.28255399828777}, {"C": 0, "N": 2048, "O": 32, "rep": 1, "ttft_ms": 169.92324101738632, "decode_ms": 146.16969600319862, "e2e_ms": 316.09293702058494}, {"C": 0, "N": 2048, "O": 32, "rep": 2, "ttft_ms": 172.31316398829222, "decode_ms": 144.68226401368156, "e2e_ms": 316.9954280019738}, {"C": 0, "N": 32768, "O": 128, "rep": 0, "ttft_ms": 5622.758224024437, "decode_ms": 810.583260958083, "e2e_ms": 6433.34148498252}, {"C": 0, "N": 32768, "O": 128, "rep": 1, "ttft_ms": 7788.5780279757455, "decode_ms": 805.351905990392, "e2e_ms": 8593.929933966137}, {"C": 0, "N": 32768, "O": 128, "rep": 2, "ttft_ms": 7058.200741012115, "decode_ms": 809.2719879932702, "e2e_ms": 7867.472729005385}, {"C": 0, "N": 32768, "O": 1, "rep": 0, "ttft_ms": 4707.1176070021465, "decode_ms": 0.0, "e2e_ms": 4707.1176070021465}, {"C": 0, "N": 32768, "O": 1, "rep": 1, "ttft_ms": 9008.615329978056, "decode_ms": 0.0, "e2e_ms": 9008.615329978056}, {"C": 0, "N": 32768, "O": 1, "rep": 2, "ttft_ms": 8051.877580001019, "decode_ms": 0.0, "e2e_ms": 8051.877580001019}, {"C": 0, "N": 32768, "O": 32, "rep": 0, "ttft_ms": 5841.207611025311, "decode_ms": 188.54903295869008, 
"e2e_ms": 6029.756643984001}, {"C": 0, "N": 32768, "O": 32, "rep": 1, "ttft_ms": 8518.694756028708, "decode_ms": 188.50646598730236, "e2e_ms": 8707.20122201601}, {"C": 0, "N": 32768, "O": 32, "rep": 2, "ttft_ms": 7864.250976999756, "decode_ms": 193.97482799831778, "e2e_ms": 8058.225804998074}, {"C": 0, "N": 512, "O": 128, "rep": 0, "ttft_ms": 728.4340729820542, "decode_ms": 634.8835580283776, "e2e_ms": 1363.3176310104318}, {"C": 0, "N": 512, "O": 128, "rep": 1, "ttft_ms": 664.6796799614094, "decode_ms": 632.417738030199, "e2e_ms": 1297.0974179916084}, {"C": 0, "N": 512, "O": 128, "rep": 2, "ttft_ms": 410.0534439785406, "decode_ms": 631.1866460018791, "e2e_ms": 1041.2400899804197}, {"C": 0, "N": 512, "O": 1, "rep": 0, "ttft_ms": 95.32903297804296, "decode_ms": 0.0, "e2e_ms": 95.32903297804296}, {"C": 0, "N": 512, "O": 1, "rep": 1, "ttft_ms": 86.69768198160455, "decode_ms": 0.0, "e2e_ms": 86.69768198160455}, {"C": 0, "N": 512, "O": 1, "rep": 2, "ttft_ms": 86.6819639923051, "decode_ms": 0.0, "e2e_ms": 86.6819639923051}, {"C": 0, "N": 512, "O": 32, "rep": 0, "ttft_ms": 327.12496898602694, "decode_ms": 143.43970402842388, "e2e_ms": 470.5646730144508}, {"C": 0, "N": 512, "O": 32, "rep": 1, "ttft_ms": 706.4849309972487, "decode_ms": 143.21339299203828, "e2e_ms": 849.698323989287}, {"C": 0, "N": 512, "O": 32, "rep": 2, "ttft_ms": 970.4811199917458, "decode_ms": 138.52277700789273, "e2e_ms": 1109.0038969996385}, {"C": 0, "N": 8192, "O": 128, "rep": 0, "ttft_ms": 4131.300662003923, "decode_ms": 938.5443479986861, "e2e_ms": 5069.845010002609}, {"C": 0, "N": 8192, "O": 128, "rep": 1, "ttft_ms": 5790.499842027202, "decode_ms": 1001.3815849670209, "e2e_ms": 6791.881426994223}, {"C": 0, "N": 8192, "O": 128, "rep": 2, "ttft_ms": 667.2124259639531, "decode_ms": 645.1671160175465, "e2e_ms": 1312.3795419814996}, {"C": 0, "N": 8192, "O": 1, "rep": 0, "ttft_ms": 5141.556997958105, "decode_ms": 0.0, "e2e_ms": 5141.556997958105}, {"C": 0, "N": 8192, "O": 1, "rep": 1, "ttft_ms": 
5749.518854019698, "decode_ms": 0.0, "e2e_ms": 5749.518854019698}, {"C": 0, "N": 8192, "O": 1, "rep": 2, "ttft_ms": 622.4338539759628, "decode_ms": 0.0, "e2e_ms": 622.4338539759628}, {"C": 0, "N": 8192, "O": 32, "rep": 0, "ttft_ms": 2566.6930489824153, "decode_ms": 155.39966901997104, "e2e_ms": 2722.0927180023864}, {"C": 0, "N": 8192, "O": 32, "rep": 1, "ttft_ms": 610.6696259812452, "decode_ms": 155.02595499856398, "e2e_ms": 765.6955809798092}, {"C": 0, "N": 8192, "O": 32, "rep": 2, "ttft_ms": 5785.52105196286, "decode_ms": 245.27249700622633, "e2e_ms": 6030.793548969086}, {"C": 32768, "N": 2048, "O": 128, "rep": 0, "ttft_ms": 672.5708569865674, "decode_ms": 853.154796990566, "e2e_ms": 1525.7256539771333}, {"C": 32768, "N": 2048, "O": 128, "rep": 1, "ttft_ms": 5643.10712198494, "decode_ms": 852.2008170257322, "e2e_ms": 6495.307939010672}, {"C": 32768, "N": 2048, "O": 128, "rep": 2, "ttft_ms": 673.4716359642334, "decode_ms": 853.2642320496961, "e2e_ms": 1526.7358680139296}, {"C": 32768, "N": 2048, "O": 1, "rep": 0, "ttft_ms": 672.9947460116819, "decode_ms": 0.0, "e2e_ms": 672.9947460116819}, {"C": 32768, "N": 2048, "O": 1, "rep": 1, "ttft_ms": 5632.6510579674505, "decode_ms": 0.0, "e2e_ms": 5632.6510579674505}, {"C": 32768, "N": 2048, "O": 1, "rep": 2, "ttft_ms": 685.9644679934718, "decode_ms": 0.0, "e2e_ms": 685.9644679934718}, {"C": 32768, "N": 2048, "O": 32, "rep": 0, "ttft_ms": 5624.487289984245, "decode_ms": 199.2717279936187, "e2e_ms": 5823.759017977864}, {"C": 32768, "N": 2048, "O": 32, "rep": 1, "ttft_ms": 674.4227990275249, "decode_ms": 199.34023398673162, "e2e_ms": 873.7630330142565}, {"C": 32768, "N": 2048, "O": 32, "rep": 2, "ttft_ms": 5588.859603041783, "decode_ms": 198.71050998335704, "e2e_ms": 5787.57011302514}, {"C": 32768, "N": 32768, "O": 128, "rep": 0, "ttft_ms": 10629.962367995176, "decode_ms": 1042.7065159892663, "e2e_ms": 11672.668883984443}, {"C": 32768, "N": 32768, "O": 128, "rep": 1, "ttft_ms": 15692.335498984903, "decode_ms": 
1043.1572350207716, "e2e_ms": 16735.492734005675}, {"C": 32768, "N": 32768, "O": 128, "rep": 2, "ttft_ms": 10634.278179029934, "decode_ms": 1043.5610840213485, "e2e_ms": 11677.839263051283}, {"C": 32768, "N": 32768, "O": 1, "rep": 0, "ttft_ms": 10855.033353029285, "decode_ms": 0.0, "e2e_ms": 10855.033353029285}, {"C": 32768, "N": 32768, "O": 1, "rep": 1, "ttft_ms": 15949.58597101504, "decode_ms": 0.0, "e2e_ms": 15949.58597101504}, {"C": 32768, "N": 32768, "O": 1, "rep": 2, "ttft_ms": 10859.418225008994, "decode_ms": 0.0, "e2e_ms": 10859.418225008994}, {"C": 32768, "N": 32768, "O": 32, "rep": 0, "ttft_ms": 15834.861388022546, "decode_ms": 238.35640895413235, "e2e_ms": 16073.217796976678}, {"C": 32768, "N": 32768, "O": 32, "rep": 1, "ttft_ms": 10674.632200971246, "decode_ms": 238.6556770070456, "e2e_ms": 10913.287877978291}, {"C": 32768, "N": 32768, "O": 32, "rep": 2, "ttft_ms": 15735.035521967802, "decode_ms": 238.99846902349964, "e2e_ms": 15974.033990991302}, {"C": 32768, "N": 512, "O": 128, "rep": 0, "ttft_ms": 5252.030164992902, "decode_ms": 838.8174789724872, "e2e_ms": 6090.84764396539}, {"C": 32768, "N": 512, "O": 128, "rep": 1, "ttft_ms": 285.0615810020827, "decode_ms": 838.4164330200292, "e2e_ms": 1123.478014022112}, {"C": 32768, "N": 512, "O": 128, "rep": 2, "ttft_ms": 5247.092439036351, "decode_ms": 839.5037169684656, "e2e_ms": 6086.596156004816}, {"C": 32768, "N": 512, "O": 1, "rep": 0, "ttft_ms": 5251.212673960254, "decode_ms": 0.0, "e2e_ms": 5251.212673960254}, {"C": 32768, "N": 512, "O": 1, "rep": 1, "ttft_ms": 285.5950220255181, "decode_ms": 0.0, "e2e_ms": 285.5950220255181}, {"C": 32768, "N": 512, "O": 1, "rep": 2, "ttft_ms": 5249.9970899662, "decode_ms": 0.0, "e2e_ms": 5249.9970899662}, {"C": 32768, "N": 512, "O": 32, "rep": 0, "ttft_ms": 288.1336290156469, "decode_ms": 194.8392489575781, "e2e_ms": 482.97287797322497}, {"C": 32768, "N": 512, "O": 32, "rep": 1, "ttft_ms": 5258.001476002391, "decode_ms": 188.89945500995964, "e2e_ms": 
5446.900931012351}, {"C": 32768, "N": 512, "O": 32, "rep": 2, "ttft_ms": 285.77856201445684, "decode_ms": 195.73098694672808, "e2e_ms": 481.5095489611849}, {"C": 32768, "N": 8192, "O": 128, "rep": 0, "ttft_ms": 7284.454459033441, "decode_ms": 886.0723180114292, "e2e_ms": 8170.52677704487}, {"C": 32768, "N": 8192, "O": 128, "rep": 1, "ttft_ms": 2252.070823975373, "decode_ms": 885.6825350085273, "e2e_ms": 3137.7533589839004}, {"C": 32768, "N": 8192, "O": 128, "rep": 2, "ttft_ms": 7287.232631002553, "decode_ms": 885.5364889604971, "e2e_ms": 8172.76911996305}, {"C": 32768, "N": 8192, "O": 1, "rep": 0, "ttft_ms": 7340.854128007777, "decode_ms": 0.0, "e2e_ms": 7340.854128007777}, {"C": 32768, "N": 8192, "O": 1, "rep": 1, "ttft_ms": 2244.4935280364007, "decode_ms": 0.0, "e2e_ms": 2244.4935280364007}, {"C": 32768, "N": 8192, "O": 1, "rep": 2, "ttft_ms": 7313.673258002382, "decode_ms": 0.0, "e2e_ms": 7313.673258002382}, {"C": 32768, "N": 8192, "O": 32, "rep": 0, "ttft_ms": 2244.318378972821, "decode_ms": 205.39685402764007, "e2e_ms": 2449.715233000461}, {"C": 32768, "N": 8192, "O": 32, "rep": 1, "ttft_ms": 7289.380786009133, "decode_ms": 205.53624304011464, "e2e_ms": 7494.9170290492475}, {"C": 32768, "N": 8192, "O": 32, "rep": 2, "ttft_ms": 2244.3198550026864, "decode_ms": 205.5884290020913, "e2e_ms": 2449.9082840047777}, {"C": 4096, "N": 2048, "O": 128, "rep": 0, "ttft_ms": 3991.63863400463, "decode_ms": 633.001334965229, "e2e_ms": 4624.639968969859}, {"C": 4096, "N": 2048, "O": 128, "rep": 1, "ttft_ms": 6423.19184797816, "decode_ms": 635.6502959970385, "e2e_ms": 7058.842143975198}, {"C": 4096, "N": 2048, "O": 128, "rep": 2, "ttft_ms": 6404.474611976184, "decode_ms": 633.7045449763536, "e2e_ms": 7038.179156952538}, {"C": 4096, "N": 2048, "O": 1, "rep": 0, "ttft_ms": 5024.260417965706, "decode_ms": 0.0, "e2e_ms": 5024.260417965706}, {"C": 4096, "N": 2048, "O": 1, "rep": 1, "ttft_ms": 218.7655169982463, "decode_ms": 0.0, "e2e_ms": 218.7655169982463}, {"C": 4096, "N": 2048, 
"O": 1, "rep": 2, "ttft_ms": 4964.461475028656, "decode_ms": 0.0, "e2e_ms": 4964.461475028656}, {"C": 4096, "N": 2048, "O": 32, "rep": 0, "ttft_ms": 450.62972395680845, "decode_ms": 152.63368899468333, "e2e_ms": 603.2634129514918}, {"C": 4096, "N": 2048, "O": 32, "rep": 1, "ttft_ms": 220.793966029305, "decode_ms": 152.18362800078467, "e2e_ms": 372.9775940300897}, {"C": 4096, "N": 2048, "O": 32, "rep": 2, "ttft_ms": 457.47345697600394, "decode_ms": 148.38339103152975, "e2e_ms": 605.8568480075337}, {"C": 4096, "N": 32768, "O": 128, "rep": 0, "ttft_ms": 5227.104032994248, "decode_ms": 834.0455970028415, "e2e_ms": 6061.1496299970895}, {"C": 4096, "N": 32768, "O": 128, "rep": 1, "ttft_ms": 5735.778502014, "decode_ms": 839.3324179924093, "e2e_ms": 6575.110920006409}, {"C": 4096, "N": 32768, "O": 128, "rep": 2, "ttft_ms": 5430.421528988518, "decode_ms": 839.2051489790902, "e2e_ms": 6269.626677967608}, {"C": 4096, "N": 32768, "O": 1, "rep": 0, "ttft_ms": 19767.440686002374, "decode_ms": 0.0, "e2e_ms": 19767.440686002374}, {"C": 4096, "N": 32768, "O": 1, "rep": 1, "ttft_ms": 19000.393452995922, "decode_ms": 0.0, "e2e_ms": 19000.393452995922}, {"C": 4096, "N": 32768, "O": 1, "rep": 2, "ttft_ms": 18874.21447498491, "decode_ms": 0.0, "e2e_ms": 18874.21447498491}, {"C": 4096, "N": 32768, "O": 32, "rep": 0, "ttft_ms": 16880.265624029562, "decode_ms": 194.37245797598734, "e2e_ms": 17074.63808200555}, {"C": 4096, "N": 32768, "O": 32, "rep": 1, "ttft_ms": 18671.995759010315, "decode_ms": 187.84850998781621, "e2e_ms": 18859.84426899813}, {"C": 4096, "N": 32768, "O": 32, "rep": 2, "ttft_ms": 18827.935320034157, "decode_ms": 192.57517997175455, "e2e_ms": 19020.510500005912}, {"C": 4096, "N": 512, "O": 128, "rep": 0, "ttft_ms": 5161.038128018845, "decode_ms": 629.8650890239514, "e2e_ms": 5790.903217042796}, {"C": 4096, "N": 512, "O": 128, "rep": 1, "ttft_ms": 4914.044548990205, "decode_ms": 628.3650430268608, "e2e_ms": 5542.409592017066}, {"C": 4096, "N": 512, "O": 128, "rep": 2, 
"ttft_ms": 4910.116966988426, "decode_ms": 629.2002300033346, "e2e_ms": 5539.31719699176}, {"C": 4096, "N": 512, "O": 1, "rep": 0, "ttft_ms": 112.64649598160759, "decode_ms": 0.0, "e2e_ms": 112.64649598160759}, {"C": 4096, "N": 512, "O": 1, "rep": 1, "ttft_ms": 334.4383900403045, "decode_ms": 0.0, "e2e_ms": 334.4383900403045}, {"C": 4096, "N": 512, "O": 1, "rep": 2, "ttft_ms": 102.44947602041066, "decode_ms": 0.0, "e2e_ms": 102.44947602041066}, {"C": 4096, "N": 512, "O": 32, "rep": 0, "ttft_ms": 2651.748332020361, "decode_ms": 151.33654902456328, "e2e_ms": 2803.0848810449243}, {"C": 4096, "N": 512, "O": 32, "rep": 1, "ttft_ms": 5405.860565020703, "decode_ms": 154.49790301499888, "e2e_ms": 5560.358468035702}, {"C": 4096, "N": 512, "O": 32, "rep": 2, "ttft_ms": 808.8750389870256, "decode_ms": 151.434927014634, "e2e_ms": 960.3099660016596}, {"C": 4096, "N": 8192, "O": 128, "rep": 0, "ttft_ms": 15969.269191962667, "decode_ms": 678.461742005311, "e2e_ms": 16647.730933967978}, {"C": 4096, "N": 8192, "O": 128, "rep": 1, "ttft_ms": 15008.873330953065, "decode_ms": 695.1191920088604, "e2e_ms": 15703.992522961926}, {"C": 4096, "N": 8192, "O": 128, "rep": 2, "ttft_ms": 15012.583939009346, "decode_ms": 678.5469150054269, "e2e_ms": 15691.130854014773}, {"C": 4096, "N": 8192, "O": 1, "rep": 0, "ttft_ms": 7057.054870994762, "decode_ms": 0.0, "e2e_ms": 7057.054870994762}, {"C": 4096, "N": 8192, "O": 1, "rep": 1, "ttft_ms": 7253.246881009545, "decode_ms": 0.0, "e2e_ms": 7253.246881009545}, {"C": 4096, "N": 8192, "O": 1, "rep": 2, "ttft_ms": 7289.37086701626, "decode_ms": 0.0, "e2e_ms": 7289.37086701626}, {"C": 4096, "N": 8192, "O": 32, "rep": 0, "ttft_ms": 7315.68243104266, "decode_ms": 193.3563009952195, "e2e_ms": 7509.0387320378795}, {"C": 4096, "N": 8192, "O": 32, "rep": 1, "ttft_ms": 7780.287707981188, "decode_ms": 200.99342102184892, "e2e_ms": 7981.281129003037}, {"C": 4096, "N": 8192, "O": 32, "rep": 2, "ttft_ms": 7812.988903024234, "decode_ms": 200.84698597202078, "e2e_ms": 
8013.835888996255}]