diff --git a/v2/README.md b/v2/README.md
index 8062307..91aac4e 100644
--- a/v2/README.md
+++ b/v2/README.md
@@ -28,29 +28,38 @@ Run: `GPU=1 bash v2/exp_b_capacity_knee/run_sweep.sh` then
 
 ## Results (dash0, 2026-05-30)
 
-### Exp (a) — GPU hit ≫ CPU hit ≫ miss (`figs/exp_a_tier_latency.png`)
+### Exp (a) — GPU hit > CPU hit > remote-store(RDMA) hit ≫ miss (`figs/exp_a_tier_latency.png`)
 
-TTFT (s, p50 over reps) to serve a reused prefix of length L. CPU-tier hits were
-100% verified via `vllm:external_prefix_cache_hits`.
+TTFT (s, p50 over reps) to serve a reused prefix of length L from each KV tier.
+Local CPU-tier hits were 100% verified via `vllm:external_prefix_cache_hits`;
+the **remote KV-store** tier is a real cross-instance Mooncake hit — instance B
+serves the request by **pulling the cached prefix from instance A over RDMA**
+(`do_remote_prefill`) instead of recomputing (the Mooncake-Store-blog mechanism),
+measured with `microbench/fresh_setup/mb2_kv_transfer.py`.
 
-| prefix L | miss (recompute) | CPU-tier hit | GPU-tier hit | miss/CPU | **CPU/GPU** |
-|---:|---:|---:|---:|---:|---:|
-| 1k | 0.078 | 0.057 | 0.042 | 1.4× | 1.4× |
-| 4k | 0.261 | 0.064 | 0.046 | 4.1× | 1.4× |
-| 8k | 0.588 | 0.076 | 0.053 | 7.7× | 1.4× |
-| 16k | 1.547 | 0.105 | 0.063 | 14.8× | 1.7× |
-| 32k | 4.604 | 0.158 | 0.080 | 29.2× | 2.0× |
-| **64k** | **15.230** | **0.272** | **0.111** | **56.0×** | **2.4×** |
+| prefix L | miss (recompute) | **remote RDMA store** | CPU-tier (local) | GPU-tier (HBM) | miss/RDMA | RDMA/CPU | CPU/GPU |
+|---:|---:|---:|---:|---:|---:|---:|---:|
+| 1k | 0.078 | 0.061 | 0.057 | 0.042 | 1.3× | 1.1× | 1.4× |
+| 8k | 0.588 | 0.151 | 0.076 | 0.053 | 3.9× | 2.0× | 1.5× |
+| 16k | 1.547 | 0.262 | 0.105 | 0.063 | 5.9× | 2.5× | 1.7× |
+| 32k | 4.604 | 0.680 | 0.158 | 0.080 | 6.8× | 4.3× | 2.0× |
+| **64k** | **15.230** | **0.966** | **0.272** | **0.111** | **15.8×** | **3.6×** | **2.4×** |
 
 - **GPU hit is ~flat** (42→111 ms over 1k→64k): a hit returns the whole
   prefix from HBM, only the last token is recomputed.
 - **miss grows superlinearly** (→15.2 s at 64k): a miss pays the full prefill.
-- **CPU hit grows transfer-bound** (PCIe H2D measured **~54 GB/s**); CPU-hit TTFT ≈
-  GPU-hit + KV/PCIe + ~0.15 s connector overhead (the dashed PCIe floor sits just
-  under the orange curve, confirming the decomposition).
-- **Takeaway:** among hits, **GPU beats CPU by 1.4–2.5×** and the gap widens with
-  context. A CPU hit is a useful backstop (up to 56× better than recompute) but is
-  strictly worse than keeping the prefix resident in HBM.
+- **local CPU hit grows transfer-bound** (PCIe H2D measured **~54 GB/s**); CPU-hit
+  TTFT ≈ GPU-hit + KV/PCIe + ~0.15 s overhead (dashed PCIe floor sits just under it).
+- **remote RDMA-store hit** is the L3 tier the Mooncake-Store blog advocates: it is
+  a big win over recompute (**up to 16× lower TTFT**, consistent with the blog's
+  46× at higher hit rates) — but it pays the **NIC tax** (~5–7 GB/s effective here,
+  cf. ~9.7 GB/s raw Mooncake RDMA in MB2; multi-NIC pooling would raise it). So it
+  is **3.6× slower than a local CPU hit and ~9× slower than a GPU hit** at 64k, and
+  the gap **grows with context length**.
+- **Takeaway — the tier ordering is strict and widens with context:**
+  **GPU < CPU-local < remote-RDMA-store ≪ miss.** A global KV store helps (vs
+  recompute), which is why that approach exists; but every step *toward* the GPU is
+  another 1.4–4× of TTFT. The reuse that matters most is the GPU-resident kind.
 
 ### Exp (b) — APC and latency knee at small GPU capacity (`figs/exp_b_capacity_knee.png`)
 
@@ -77,9 +86,13 @@ intra-session APC ceiling 71%), sweeping GPU KV capacity.
 
 ## Conclusion (for §2.2)
 
-1. **Hits on GPU > hits on CPU** is now measured, not asserted: a GPU(HBM) hit is
-   1.4–2.5× faster than a CPU(DRAM-offload) hit and 14–137× faster than recompute,
-   with the GPU advantage growing in context length (Exp a).
+1. **The KV-tier hierarchy is now measured, not asserted** (Exp a):
+   `GPU(HBM) < CPU(local DRAM) < remote KV-store(RDMA) ≪ miss`. At 64k tokens a GPU
+   hit (0.11 s) is 2.4× faster than a local CPU hit, ~9× faster than a remote RDMA
+   store hit, and 137× faster than recompute; the gaps **grow with context length**.
+   A global RDMA store (Mooncake-Store blog) is a real win over recompute (up to 16×
+   here / 46× in the blog) — but it pays the NIC tax, so it sits a tier *below* local
+   CPU and two below GPU. Each step toward the GPU is another 1.4–4× of TTFT.
 2. **You only need to hold the *active working set* on GPU.** Realized APC and
    latency saturate once HBM covers the concurrent sessions' working set (3.6 GB
    here); past that, extra capacity — and the entire CPU/storage tier built to chase
@@ -94,6 +107,13 @@ intra-session APC ceiling 71%), sweeping GPU KV capacity.
   C1/f2c); it isolates the capacity→APC→latency mechanism. Knee *position* scales with
   concurrency × per-session working set.
 - Single H20; PCIe H2D ~54 GB/s is intra-node (cf. 9.7 GB/s Mooncake inter-node RDMA).
+- Remote-RDMA tier is a single-node 2-instance Mooncake measurement (RDMA loopback
+  through the NIC; MB2 showed intra ≈ inter, NIC-bound). `t_transfer` includes the
+  request + 1-token decode + dst scheduling, so effective BW (~5–7 GB/s) is below the
+  raw ~9.7 GB/s; this is the realistic end-to-end remote-hit latency, not just the
+  wire transfer. The connector's retention-verify (`cached_followup`) is 0 because
+  kv_both `do_remote_prefill` does not reinsert the pulled prefix into dst's
+  persistent prefix cache — it does not affect the measured pull latency.
 - The 80.3% point at the knee slightly exceeds the 71% intra-session ceiling (transient
   full residency / generated-token reuse); steady state is 72.9%.
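Note on the "~5–7 GB/s effective" figure in the README change above: it can be sanity-checked from the table and the `kv_bytes_per_token` field of `results/rdma.json`. The following is a minimal sketch and not part of the patch. It assumes a remote hit moves `prefix_tokens × kv_bytes_per_token` bytes, that the "remote RDMA store" column is the p50 of `t_transfer_s`, and that 1 GB means 1e9 bytes; `effective_gbps` and the dictionary names are illustrative only.

```python
# Sketch: end-to-end bandwidth implied by the remote-RDMA column of the table.
# Assumption: bytes pulled per hit = prefix_tokens * kv_bytes_per_token.
KV_BYTES_PER_TOKEN = 98304  # "kv_bytes_per_token" in results/rdma.json

# p50 remote-hit time in seconds per prefix length, copied from the README table.
RDMA_P50_S = {
    1024: 0.061,
    8192: 0.151,
    16384: 0.262,
    32768: 0.680,
    65536: 0.966,
}


def effective_gbps(prefix_tokens: int, seconds: float) -> float:
    """Return the end-to-end rate in GB/s (1 GB = 1e9 bytes) for one remote hit."""
    return prefix_tokens * KV_BYTES_PER_TOKEN / seconds / 1e9


# Effective bandwidth per prefix length.
EFFECTIVE = {L: effective_gbps(L, t) for L, t in RDMA_P50_S.items()}

if __name__ == "__main__":
    for L, bw in EFFECTIVE.items():
        print(f"{L:>6} tokens: {bw:.2f} GB/s")
```

Because `t_transfer` also covers the request, the 1-token decode and dst scheduling, this is an end-to-end rate rather than a wire rate, so short prefixes should come out well below the longer ones.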
diff --git a/v2/exp_a_tier_latency/plot.py b/v2/exp_a_tier_latency/plot.py
index 8d889dc..7d2fae8 100644
--- a/v2/exp_a_tier_latency/plot.py
+++ b/v2/exp_a_tier_latency/plot.py
@@ -18,6 +18,7 @@ def load(name):
 
 
 miss, gpu, cpu, pcie = load("miss.json"), load("gpu.json"), load("cpu.json"), load("pcie.json")
+rdma = load("rdma.json")
 
 
 def series(d):
@@ -27,14 +28,35 @@ def series(d):
     return [a for a, _ in items], [b for _, b in items]
 
 
+def rdma_series():
+    """Remote KV-store hit over RDMA: p50 of t_transfer_s per prefix length
+    (dst pulls the cached prefix from the remote pool instead of recomputing)."""
+    if not rdma:
+        return [], {}
+    import statistics
+    from collections import defaultdict
+    by = defaultdict(list)
+    for r in rdma["raw"]:
+        by[r["input_tokens"]].append(r["t_transfer_s"])
+    xs = sorted(by)
+    return xs, {L: statistics.median(by[L]) for L in xs}
+
+
+rdma_x, rdma_p50 = rdma_series()
+
+
 fig, ax = plt.subplots(figsize=(7.2, 5.0))
 for d, lab, mk, c in [(miss, "miss (recompute)", "o", "#d62728"),
-                      (cpu, "CPU-tier hit (DRAM offload)", "s", "#ff7f0e"),
+                      (cpu, "CPU-tier hit (local DRAM, PCIe)", "s", "#ff7f0e"),
                       (gpu, "GPU-tier hit (HBM APC)", "^", "#2ca02c")]:
     xs, ys = series(d)
     if xs:
         ax.plot(xs, ys, marker=mk, label=lab, color=c, linewidth=2, markersize=7)
 
+if rdma_x:
+    ax.plot(rdma_x, [rdma_p50[L] for L in rdma_x], marker="D", color="#9467bd",
+            linewidth=2, markersize=7, label="remote KV-store hit (Mooncake RDMA)")
+
 if pcie:
     items = sorted(((int(k), v["transfer_s"]) for k, v in pcie["by_length"].items()))
     xs = [a for a, _ in items]; ys = [b for _, b in items]
@@ -44,7 +66,8 @@ if pcie:
 ax.set_xscale("log", base=2); ax.set_yscale("log")
 ax.set_xlabel("Reused prefix length (tokens)")
 ax.set_ylabel("TTFT (s, log)")
-ax.set_title("Cost of serving a reused prefix from each KV tier\nQwen3-Coder-30B-A3B, 1xH20")
+ax.set_title("Cost of serving a reused prefix from each KV tier\n"
+             "Qwen3-Coder-30B-A3B, H20 (local tiers 1 GPU; RDMA pool 2 GPUs)")
 ax.grid(True, which="both", alpha=0.3)
 ax.legend()
 FIG.parent.mkdir(parents=True, exist_ok=True)
@@ -52,16 +75,18 @@ fig.tight_layout(); fig.savefig(FIG, dpi=140)
 print("wrote", FIG)
 
 # Table
-print(f"\n{'L':>7} {'miss(s)':>10} {'cpu(s)':>10} {'gpu(s)':>10} {'miss/cpu':>9} {'cpu/gpu':>9}")
+print(f"\n{'L':>7} {'miss':>9} {'rdma':>9} {'cpu':>9} {'gpu':>9} "
+      f"{'miss/rdma':>9} {'rdma/cpu':>9} {'cpu/gpu':>9}")
 allL = sorted({int(k) for d in (miss, gpu, cpu) if d for k in d["by_length"]})
 for L in allL:
     m = miss["by_length"].get(str(L), {}).get("ttft_p50") if miss else None
     c = cpu["by_length"].get(str(L), {}).get("ttft_p50") if cpu else None
     g = gpu["by_length"].get(str(L), {}).get("ttft_p50") if gpu else None
+    rd = rdma_p50.get(L)
     f = lambda x: f"{x:.4f}" if x is not None else " - "
-    r1 = f"{m/c:.1f}x" if (m and c) else " -"
-    r2 = f"{c/g:.1f}x" if (c and g) else " -"
-    print(f"{L:>7} {f(m):>10} {f(c):>10} {f(g):>10} {r1:>9} {r2:>9}")
+    rr = lambda a, b: f"{a/b:.1f}x" if (a and b) else " -"
+    print(f"{L:>7} {f(m):>9} {f(rd):>9} {f(c):>9} {f(g):>9} "
+          f"{rr(m,rd):>9} {rr(rd,c):>9} {rr(c,g):>9}")
 
 if cpu:
     vf = {k: v.get("verified_frac") for k, v in cpu["by_length"].items()}
diff --git a/v2/exp_a_tier_latency/results/rdma.json b/v2/exp_a_tier_latency/results/rdma.json
new file mode 100644
index 0000000..e8762f1
--- /dev/null
+++ b/v2/exp_a_tier_latency/results/rdma.json
@@ -0,0 +1,1013 @@
+{ + "model": "/home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-30B-A3B-Instruct", + "kv_bytes_per_token": 98304, + "src_host": "127.0.0.1", + "src_port": 8000, + "dst_host": "127.0.0.1", + "dst_port": 8001, + "config_label": "rdma-intra-node", + "raw": [ + { + "input_tokens": 1024, + "session": "42945df8aa2947dea7856aec043953c4", + "t_step1_client_unix": 1780113654.6796737, + "t_step2_client_unix": 1780113655.3103006, + "t_step2_end_unix": 1780113655.7742817, + "t_prefill_s": 0.630586825980572, + "t_transfer_s": 0.46393222000915557, + "t_followup_s": 0.09882230299990624, + "cached_followup": 0, +
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "574133556117480ea4433033ee16ef7d", + "t_step1_client_unix": 1780113655.8735948, + "t_step2_client_unix": 1780113655.9512424, + "t_step2_end_unix": 1780113656.0120106, + "t_prefill_s": 0.07761351700173691, + "t_transfer_s": 0.06073612501495518, + "t_followup_s": 0.047241352003766224, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "74de5f623dda4e3ab9bef978237c058a", + "t_step1_client_unix": 1780113656.0597408, + "t_step2_client_unix": 1780113656.136515, + "t_step2_end_unix": 1780113656.1980197, + "t_prefill_s": 0.07674612698610872, + "t_transfer_s": 0.06146429298678413, + "t_followup_s": 0.04567234200658277, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "ae5339f421184215a06ee0406d70c6db", + "t_step1_client_unix": 1780113656.2442205, + "t_step2_client_unix": 1780113656.3202548, + "t_step2_end_unix": 1780113656.3829181, + "t_prefill_s": 0.07600510100019164, + "t_transfer_s": 0.06263405701611191, + "t_followup_s": 0.045199740008683875, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "108018399ef64e66a9780f032bd5f1d2", + "t_step1_client_unix": 1780113656.4285796, + "t_step2_client_unix": 1780113656.5039103, + "t_step2_end_unix": 1780113656.5650918, + "t_prefill_s": 0.07530323401442729, + "t_transfer_s": 0.06115360799594782, + "t_followup_s": 0.047417668014531955, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "4f147b3685f348599ee6d31f592d3d68", + "t_step1_client_unix": 1780113656.613002, + "t_step2_client_unix": 1780113656.688678, + "t_step2_end_unix": 1780113656.7499766, + "t_prefill_s": 0.07563983497675508, + "t_transfer_s": 0.061254432977875695, + "t_followup_s": 0.05035233299713582, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "7185314a355d4fb1a7be63667bb5d532", + "t_step1_client_unix": 1780113656.8008058, + "t_step2_client_unix": 1780113656.876437, + "t_step2_end_unix": 1780113656.9365327, + "t_prefill_s": 0.0756044389854651, + "t_transfer_s": 0.0600691509898752, + "t_followup_s": 0.04694663899135776, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "a75befce806b49ca8b4bc839d398a836", + "t_step1_client_unix": 1780113656.9840212, + "t_step2_client_unix": 1780113657.0596485, + "t_step2_end_unix": 1780113657.121014, + "t_prefill_s": 0.07559421702171676, + "t_transfer_s": 0.061336473998380825, + "t_followup_s": 0.04504776999237947, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "5eed050534974087b585636f8626d36b", + "t_step1_client_unix": 1780113657.1665282, + "t_step2_client_unix": 1780113657.2422314, + "t_step2_end_unix": 1780113657.3042758, + "t_prefill_s": 0.07567335499334149, + "t_transfer_s": 0.062015656993025914, + "t_followup_s": 0.044170998997287825, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "38ad990f52254bf39487a0ba49780d8b", + "t_step1_client_unix": 1780113657.3489134, + "t_step2_client_unix": 1780113657.425841, + "t_step2_end_unix": 1780113657.485667, + "t_prefill_s": 0.07690018997527659, + "t_transfer_s": 0.05979751900304109, + "t_followup_s": 0.047392740001669154, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 1024, + "session": "d01e4123b1b54d5e8578ec06bcb198f0", + "t_step1_client_unix": 1780113657.533507, + "t_step2_client_unix": 1780113657.6109633, + "t_step2_end_unix": 1780113657.673547, + "t_prefill_s": 0.07741417599027045, + "t_transfer_s": 0.0625474060070701, + "t_followup_s": 0.04817350400844589, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "62ff1956604d459e84052715e7ed8ba7", + "t_step1_client_unix": 1780113657.7226295, + "t_step2_client_unix": 1780113657.9077578, + "t_step2_end_unix": 1780113657.985153, + "t_prefill_s": 0.18509869498666376, + "t_transfer_s": 0.07735455100191757, + "t_followup_s": 0.046408719004830346, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "ec7b97db9dae4bc888c91b65c5efabba", + "t_step1_client_unix": 1780113658.0324116, + "t_step2_client_unix": 1780113658.1642551, + "t_step2_end_unix": 1780113658.2397475, + "t_prefill_s": 0.1318151500017848, + "t_transfer_s": 0.07546432898379862, + "t_followup_s": 0.04965048600570299, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "082ed4f2baec4107acf0dbebabbcf32f", + "t_step1_client_unix": 1780113658.2902431, + "t_step2_client_unix": 1780113658.4218829, + "t_step2_end_unix": 1780113658.494315, + "t_prefill_s": 0.1316128930193372, + "t_transfer_s": 0.07240396799170412, + "t_followup_s": 0.04556456100544892, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "890e243bc7f742c19a899cd19160e6ce", + "t_step1_client_unix": 1780113658.540683, + "t_step2_client_unix": 1780113658.6717823, + "t_step2_end_unix": 1780113658.7469842, + "t_prefill_s": 0.13105839598574676, + "t_transfer_s": 0.07517452799947932, + "t_followup_s": 0.04641962700407021, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "e4ca5aef90be4b45bc2222ad3c637581", + "t_step1_client_unix": 1780113658.7943006, + "t_step2_client_unix": 1780113658.9255776, + "t_step2_end_unix": 1780113659.0016084, + "t_prefill_s": 0.13125143098295666, + "t_transfer_s": 0.07600446898140945, + "t_followup_s": 0.04827121601556428, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "64e21909088043e98e7532361d747128", + "t_step1_client_unix": 1780113659.0507092, + "t_step2_client_unix": 1780113659.1820831, + "t_step2_end_unix": 1780113659.2554085, + "t_prefill_s": 0.13133879398810677, + "t_transfer_s": 0.07329847698565573, + "t_followup_s": 0.050039641006151214, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "8d3cb21a4f1a4daa94d31a48d969188a", + "t_step1_client_unix": 1780113659.3063629, + "t_step2_client_unix": 1780113659.437705, + "t_step2_end_unix": 1780113659.5105412, + "t_prefill_s": 0.13131436699768528, + "t_transfer_s": 0.07280980699579231, + "t_followup_s": 0.045990444981725886, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "a949f45bf00e4b6f9e282a48d2f54ae9", + "t_step1_client_unix": 1780113659.5574262, + "t_step2_client_unix": 1780113659.6894715, + "t_step2_end_unix": 1780113659.763246, + "t_prefill_s": 0.13201635199948214, + "t_transfer_s": 0.07374655400053598, + "t_followup_s": 0.04740602700621821, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "a1350bc4af1d4a95a8d7870aac6c30d4", + "t_step1_client_unix": 1780113659.8115327, + "t_step2_client_unix": 1780113659.94307, + "t_step2_end_unix": 1780113660.0165753, + "t_prefill_s": 0.13151001499500126, + "t_transfer_s": 0.07347742197453044, + "t_followup_s": 0.054559190990403295, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "0cbada3f1cf34ad98150c7cd05df9e81", + "t_step1_client_unix": 1780113660.0719872, + "t_step2_client_unix": 1780113660.2065787, + "t_step2_end_unix": 1780113660.2800386, + "t_prefill_s": 0.13456111898995005, + "t_transfer_s": 0.07343242000206374, + "t_followup_s": 0.04842582900892012, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 2048, + "session": "086166dcac9f4cf587152003a665dd6d", + "t_step1_client_unix": 1780113660.3293216, + "t_step2_client_unix": 1780113660.4612148, + "t_step2_end_unix": 1780113660.5348513, + "t_prefill_s": 0.13186514400877059, + "t_transfer_s": 0.07360914399032481, + "t_followup_s": 0.04667951900046319, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "e16840958b4e42378aa2f474224f7c47", + "t_step1_client_unix": 1780113660.5832062, + "t_step2_client_unix": 1780113660.8487082, + "t_step2_end_unix": 1780113660.9490895, + "t_prefill_s": 0.26547358598327264, + "t_transfer_s": 0.10033372900215909, + "t_followup_s": 0.04891211100039072, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "8119bfc1b947496e8434bc2dd780ddf2", + "t_step1_client_unix": 1780113660.9995859, + "t_step2_client_unix": 1780113661.2600472, + "t_step2_end_unix": 1780113661.3582017, + "t_prefill_s": 0.2604305710119661, + "t_transfer_s": 0.09812465400318615, + "t_followup_s": 0.0544344789814204, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "cb517bfea1ac46058ffbf918582c74d4", + "t_step1_client_unix": 1780113661.414242, + "t_step2_client_unix": 1780113661.674581, + "t_step2_end_unix": 1780113661.7718108, + "t_prefill_s": 0.2603055170038715, + "t_transfer_s": 0.09719818198936991, + "t_followup_s": 0.0503446809889283, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "89eabc4ad0534b4a934b1e852050c6f5", + "t_step1_client_unix": 1780113661.823718, + "t_step2_client_unix": 1780113662.0844855, + "t_step2_end_unix": 1780113662.1830976, + "t_prefill_s": 0.2607354299980216, + "t_transfer_s": 0.09858069600886665, + "t_followup_s": 0.04950785997789353, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "b644572bba324d829a249aaeeb22ee54", + "t_step1_client_unix": 1780113662.2342994, + "t_step2_client_unix": 1780113662.4946663, + "t_step2_end_unix": 1780113662.5922973, + "t_prefill_s": 0.26033472700510174, + "t_transfer_s": 0.09760292299324647, + "t_followup_s": 0.05018276898772456, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "01faaeab31bd4feb9b27bd03e5ee75b1", + "t_step1_client_unix": 1780113662.6441038, + "t_step2_client_unix": 1780113662.9044588, + "t_step2_end_unix": 1780113663.0019135, + "t_prefill_s": 0.26032789101009257, + "t_transfer_s": 0.09742468598415144, + "t_followup_s": 0.0510591670172289, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "87a72e6be33b46c29035ea566f1c79cc", + "t_step1_client_unix": 1780113663.0545735, + "t_step2_client_unix": 1780113663.315426, + "t_step2_end_unix": 1780113663.415479, + "t_prefill_s": 0.2608224749856163, + "t_transfer_s": 0.10002275300212204, + "t_followup_s": 0.05161238700384274, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "28b5629f8b04451d9404f5dd15b860a0", + "t_step1_client_unix": 1780113663.4687288, + "t_step2_client_unix": 1780113663.7299144, + "t_step2_end_unix": 1780113663.8311837, + "t_prefill_s": 0.26115050600492395, + "t_transfer_s": 0.10123400299926288, + "t_followup_s": 0.04951982302009128, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "ed022f1b155c4fe191032d8c917124c5", + "t_step1_client_unix": 1780113663.882292, + "t_step2_client_unix": 1780113664.1436367, + "t_step2_end_unix": 1780113664.2428293, + "t_prefill_s": 0.2613153549900744, + "t_transfer_s": 0.0991641900036484, + "t_followup_s": 0.05027229798724875, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "cb09310c2178452cb2ecda3f4c4f4642", + "t_step1_client_unix": 1780113664.2947013, + "t_step2_client_unix": 1780113664.5561187, + "t_step2_end_unix": 1780113664.6587803, + "t_prefill_s": 0.2613755869970191, + "t_transfer_s": 0.10260535500128753, + "t_followup_s": 0.054842573998030275, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 4096, + "session": "26a14c40b7424663ad5bf012e04b0d48", + "t_step1_client_unix": 1780113664.7152772, + "t_step2_client_unix": 1780113664.9784563, + "t_step2_end_unix": 1780113665.0804708, + "t_prefill_s": 0.2631347790011205, + "t_transfer_s": 0.10197760400478728, + "t_followup_s": 0.05454107699915767, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "28aa179a55b04b6e826df76997e49965", + "t_step1_client_unix": 1780113665.1381943, + "t_step2_client_unix": 1780113665.725791, + "t_step2_end_unix": 1780113665.8781552, + "t_prefill_s": 0.587549192016013, + "t_transfer_s": 0.15232679099426605, + "t_followup_s": 0.05964333200245164, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "c947ef89013d420faf4784bd0e0d5532", + "t_step1_client_unix": 1780113665.9409757, + "t_step2_client_unix": 1780113666.5288377, + "t_step2_end_unix": 1780113666.677537, + "t_prefill_s": 0.5878257739823312, + "t_transfer_s": 0.14866453199647367, + "t_followup_s": 0.05501791799906641, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "c71d8f7e44454f208391211b91409f34", + "t_step1_client_unix": 1780113666.7356484, + "t_step2_client_unix": 1780113667.3223348, + "t_step2_end_unix": 1780113667.475447, + "t_prefill_s": 0.5866534260276239, + "t_transfer_s": 0.15308237000135705, + "t_followup_s": 0.06039133798913099, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "d2f1c3b53ea8409ca49d5b6bcc7bea2a", + "t_step1_client_unix": 1780113667.5389588, + "t_step2_client_unix": 1780113668.1259756, + "t_step2_end_unix": 1780113668.2758486, + "t_prefill_s": 0.5869663549819961, + "t_transfer_s": 0.14983698999276385, + "t_followup_s": 0.05716336300247349, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "2b5edf5a21cc44f5929a722322d89d52", + "t_step1_client_unix": 1780113668.3363352, + "t_step2_client_unix": 1780113668.9260604, + "t_step2_end_unix": 1780113669.079605, + "t_prefill_s": 0.5896447559935041, + "t_transfer_s": 0.15350710399798118, + "t_followup_s": 0.05581469400203787, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "3d29eed43ca644ab998a89ae996521a0", + "t_step1_client_unix": 1780113669.1385791, + "t_step2_client_unix": 1780113669.7260985, + "t_step2_end_unix": 1780113669.8774042, + "t_prefill_s": 0.5874853960121982, + "t_transfer_s": 0.15127158901304938, + "t_followup_s": 0.057220480986870825, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "657303232eec4008abdc80cb69e27508", + "t_step1_client_unix": 1780113669.937695, + "t_step2_client_unix": 1780113670.5248249, + "t_step2_end_unix": 1780113670.6737514, + "t_prefill_s": 0.5870937670115381, + "t_transfer_s": 0.14888012100709602, + "t_followup_s": 0.059168930019950494, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "3560bebf9f7d4138b33cd30dd16aacfd", + "t_step1_client_unix": 1780113670.7366655, + "t_step2_client_unix": 1780113671.3249788, + "t_step2_end_unix": 1780113671.477699, + "t_prefill_s": 0.5882545940112323, + "t_transfer_s": 0.1526875200215727, + "t_followup_s": 0.06097331500495784, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "a8bb4afeb02b4ff99eac92226f0d5f58", + "t_step1_client_unix": 1780113671.5417728, + "t_step2_client_unix": 1780113672.1291869, + "t_step2_end_unix": 1780113672.280071, + "t_prefill_s": 0.5873805590090342, + "t_transfer_s": 0.15085255599115044, + "t_followup_s": 0.057901772001059726, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "68c32fca0260409c8a632ff617513587", + "t_step1_client_unix": 1780113672.3411584, + "t_step2_client_unix": 1780113672.9282277, + "t_step2_end_unix": 1780113673.0856972, + "t_prefill_s": 0.5870382410066668, + "t_transfer_s": 0.15743795299204066, + "t_followup_s": 0.05747991299722344, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 8192, + "session": "b95e321ef53e4383b302c2db7ebcca73", + "t_step1_client_unix": 1780113673.1462142, + "t_step2_client_unix": 1780113673.7336242, + "t_step2_end_unix": 1780113673.8830152, + "t_prefill_s": 0.5873774820065591, + "t_transfer_s": 0.149357913993299, + "t_followup_s": 0.05543989400030114, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "0979f595e2b44f3285b4242fda1ee3df", + "t_step1_client_unix": 1780113673.9444556, + "t_step2_client_unix": 1780113675.5027504, + "t_step2_end_unix": 1780113675.7646046, + "t_prefill_s": 1.558238979021553, + "t_transfer_s": 0.2618165699823294, + "t_followup_s": 0.07395038701361045, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "14c8c786b4a846149945ea141f705863", + "t_step1_client_unix": 1780113675.8444831, + "t_step2_client_unix": 1780113677.3960428, + "t_step2_end_unix": 1780113677.6459343, + "t_prefill_s": 1.5515160829818342, + "t_transfer_s": 0.249856763985008, + "t_followup_s": 0.06939491498633288, + "cached_followup": 0, + 
"pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "31d3cb21ac8d4ced89ab33ce4fb7589b", + "t_step1_client_unix": 1780113677.721303, + "t_step2_client_unix": 1780113679.2696404, + "t_step2_end_unix": 1780113679.5167441, + "t_prefill_s": 1.548295495013008, + "t_transfer_s": 0.24705595700652339, + "t_followup_s": 0.06773578398860991, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "a9d5cbce333f4c21afaa8bdbeab31bf3", + "t_step1_client_unix": 1780113679.5905588, + "t_step2_client_unix": 1780113681.1353347, + "t_step2_end_unix": 1780113681.37527, + "t_prefill_s": 1.5447358399978839, + "t_transfer_s": 0.23989686000277288, + "t_followup_s": 0.06704856699798256, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "e58110ad40cc46e0b15b222ebf0cbe86", + "t_step1_client_unix": 1780113681.4499586, + "t_step2_client_unix": 1780113682.9963198, + "t_step2_end_unix": 1780113683.2421525, + "t_prefill_s": 1.5463133199955337, + "t_transfer_s": 0.2457908419892192, + "t_followup_s": 0.06450486500398256, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "fbec8cdaac0c4ef8a028c68e49311b9a", + "t_step1_client_unix": 1780113683.3127434, + "t_step2_client_unix": 1780113684.8600428, + "t_step2_end_unix": 1780113685.106106, + "t_prefill_s": 1.5472517309826799, + "t_transfer_s": 0.2460192689904943, + "t_followup_s": 0.06439224901259877, + "cached_followup": 0, + "pull_completion_tokens": 1, + "ok": false + }, + { + "input_tokens": 16384, + "session": "7598055ceec1423d9b1d88c6200a1c21", + "t_step1_client_unix": 1780113685.1769145, + "t_step2_client_unix": 1780113686.732929, + "t_step2_end_unix": 1780113687.0007024, + "t_prefill_s": 1.5559708710061386, + "t_transfer_s": 0.2677336060150992, + "t_followup_s": 0.06560050300322473, + "cached_followup": 0, + 
      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 16384,
+      "session": "f0512699fd10499ebb2421087d5138cc",
+      "t_step1_client_unix": 1780113687.0725183,
+      "t_step2_client_unix": 1780113688.6251225,
+      "t_step2_end_unix": 1780113688.9931352,
+      "t_prefill_s": 1.5525663039879873,
+      "t_transfer_s": 0.3679773240000941,
+      "t_followup_s": 0.07065863799653016,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 16384,
+      "session": "818876ee380f4a8f9258ddff7554e718",
+      "t_step1_client_unix": 1780113689.0700693,
+      "t_step2_client_unix": 1780113690.6202304,
+      "t_step2_end_unix": 1780113690.976204,
+      "t_prefill_s": 1.5501246719795745,
+      "t_transfer_s": 0.3559349720017053,
+      "t_followup_s": 0.06182111200178042,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 16384,
+      "session": "440b924a8dc44a57a60a8d8790b83425",
+      "t_step1_client_unix": 1780113691.0440626,
+      "t_step2_client_unix": 1780113692.600122,
+      "t_step2_end_unix": 1780113692.9298801,
+      "t_prefill_s": 1.5560206689988263,
+      "t_transfer_s": 0.3297227440052666,
+      "t_followup_s": 0.06168726898613386,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 16384,
+      "session": "bf7c1545c5a048b497c4934e849d9b77",
+      "t_step1_client_unix": 1780113692.9983332,
+      "t_step2_client_unix": 1780113694.549985,
+      "t_step2_end_unix": 1780113694.902612,
+      "t_prefill_s": 1.5515996039903257,
+      "t_transfer_s": 0.3525931639887858,
+      "t_followup_s": 0.06388000200968236,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "3ea99ff99c034fadaae11e0a18cbcd12",
+      "t_step1_client_unix": 1780113694.978447,
+      "t_step2_client_unix": 1780113699.5872533,
+      "t_step2_end_unix": 1780113700.3202353,
+      "t_prefill_s": 4.608766862016637,
+      "t_transfer_s": 0.7329461979970802,
+      "t_followup_s": 0.08203631298965774,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "160f663843cc4da4a4c1968620e74bbe",
+      "t_step1_client_unix": 1780113700.4144092,
+      "t_step2_client_unix": 1780113705.0188375,
+      "t_step2_end_unix": 1780113705.7512634,
+      "t_prefill_s": 4.604390962980688,
+      "t_transfer_s": 0.7323910310224164,
+      "t_followup_s": 0.08162634400650859,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "44be5da3d4d9421ea682b02af879f038",
+      "t_step1_client_unix": 1780113705.8446133,
+      "t_step2_client_unix": 1780113710.446565,
+      "t_step2_end_unix": 1780113711.1756835,
+      "t_prefill_s": 4.60191403501085,
+      "t_transfer_s": 0.7290820410125889,
+      "t_followup_s": 0.08314321600482799,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "76fb5e99bda74949ab48c2a806796e20",
+      "t_step1_client_unix": 1780113711.2712142,
+      "t_step2_client_unix": 1780113715.8831358,
+      "t_step2_end_unix": 1780113716.605932,
+      "t_prefill_s": 4.611884240992367,
+      "t_transfer_s": 0.7227587410015985,
+      "t_followup_s": 0.08411424100631848,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "2698392640df43d98c5b88ec14603ab4",
+      "t_step1_client_unix": 1780113716.7025611,
+      "t_step2_client_unix": 1780113721.3066654,
+      "t_step2_end_unix": 1780113721.9864836,
+      "t_prefill_s": 4.60406794998562,
+      "t_transfer_s": 0.6797809940180741,
+      "t_followup_s": 0.0797950770065654,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "3a317d68c23d48a6b464cbb7471f1b15",
+      "t_step1_client_unix": 1780113722.078643,
+      "t_step2_client_unix": 1780113726.683183,
+      "t_step2_end_unix": 1780113727.3790748,
+      "t_prefill_s": 4.604500039014965,
+      "t_transfer_s": 0.6958539690240286,
+      "t_followup_s": 0.0791638570080977,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "ef55cea31d1d4c9faa3095c6572b3158",
+      "t_step1_client_unix": 1780113727.4699852,
+      "t_step2_client_unix": 1780113732.0734339,
+      "t_step2_end_unix": 1780113732.6349242,
+      "t_prefill_s": 4.603411051008152,
+      "t_transfer_s": 0.5614562759874389,
+      "t_followup_s": 0.07942970001022331,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "51111cf4420040aabc709ec95b69299c",
+      "t_step1_client_unix": 1780113732.726002,
+      "t_step2_client_unix": 1780113737.3279588,
+      "t_step2_end_unix": 1780113737.768525,
+      "t_prefill_s": 4.6019097800017335,
+      "t_transfer_s": 0.4405296350014396,
+      "t_followup_s": 0.07934816100168973,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "a4218f26139940b7a63c574c58c0cc78",
+      "t_step1_client_unix": 1780113737.8595614,
+      "t_step2_client_unix": 1780113742.4654746,
+      "t_step2_end_unix": 1780113742.9027445,
+      "t_prefill_s": 4.605869954015361,
+      "t_transfer_s": 0.43722251401050016,
+      "t_followup_s": 0.08265988298808224,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "ef166eea7c0241a0b6683c9b9be98e26",
+      "t_step1_client_unix": 1780113742.9973059,
+      "t_step2_client_unix": 1780113747.60252,
+      "t_step2_end_unix": 1780113748.0344896,
+      "t_prefill_s": 4.605175297008827,
+      "t_transfer_s": 0.43193520401837304,
+      "t_followup_s": 0.08029142298619263,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 32768,
+      "session": "510b2beab0934d4191ce4c4b4fb8bb20",
+      "t_step1_client_unix": 1780113748.1287684,
+      "t_step2_client_unix": 1780113752.728051,
+      "t_step2_end_unix": 1780113753.1700265,
+      "t_prefill_s": 4.599246629979461,
+      "t_transfer_s": 0.4419387440138962,
+      "t_followup_s": 0.08067455000127666,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "d51ed2927db344808d34fac0c09d5d29",
+      "t_step1_client_unix": 1780113753.274224,
+      "t_step2_client_unix": 1780113768.560607,
+      "t_step2_end_unix": 1780113769.367652,
+      "t_prefill_s": 15.286344999010907,
+      "t_transfer_s": 0.8070085129875224,
+      "t_followup_s": 0.11767313402378932,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "a7af94a6de9547b4b943c986b7e6f001",
+      "t_step1_client_unix": 1780113769.5093763,
+      "t_step2_client_unix": 1780113784.8304524,
+      "t_step2_end_unix": 1780113785.7471335,
+      "t_prefill_s": 15.31890859498526,
+      "t_transfer_s": 0.9138955899979919,
+      "t_followup_s": 0.1327741070126649,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "48300f75fbb347e6859a6a0585dcc04e",
+      "t_step1_client_unix": 1780113785.9058163,
+      "t_step2_client_unix": 1780113801.193769,
+      "t_step2_end_unix": 1780113802.842876,
+      "t_prefill_s": 15.287883285986027,
+      "t_transfer_s": 1.6490699610149022,
+      "t_followup_s": 0.10919595797895454,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "c61d6f9f29024284b284f54d907d5fc9",
+      "t_step1_client_unix": 1780113802.975333,
+      "t_step2_client_unix": 1780113818.271725,
+      "t_step2_end_unix": 1780113820.8995724,
+      "t_prefill_s": 15.296337511012098,
+      "t_transfer_s": 2.6278096020105295,
+      "t_followup_s": 0.12072346499189734,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "52ef9ee529ef44d7b6140535833c05a3",
+      "t_step1_client_unix": 1780113821.0437183,
+      "t_step2_client_unix": 1780113836.3338494,
+      "t_step2_end_unix": 1780113838.7353442,
+      "t_prefill_s": 15.290092708019074,
+      "t_transfer_s": 2.4014571599836927,
+      "t_followup_s": 0.11354603400104679,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "99730c3c4684499286c1831c027e3b39",
+      "t_step1_client_unix": 1780113838.8726406,
+      "t_step2_client_unix": 1780113854.1566162,
+      "t_step2_end_unix": 1780113855.7166855,
+      "t_prefill_s": 15.283936669002287,
+      "t_transfer_s": 1.560034744994482,
+      "t_followup_s": 0.11488680701586418,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "6f072e8c09734c2cb3ff771c538b80c5",
+      "t_step1_client_unix": 1780113855.8550732,
+      "t_step2_client_unix": 1780113871.1431854,
+      "t_step2_end_unix": 1780113872.1094413,
+      "t_prefill_s": 15.288075597025454,
+      "t_transfer_s": 0.9662203160114586,
+      "t_followup_s": 0.11588297100388445,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "6bd7c6a2a8614f0a8ab32ad7b6fdc6da",
+      "t_step1_client_unix": 1780113872.2491899,
+      "t_step2_client_unix": 1780113887.532155,
+      "t_step2_end_unix": 1780113888.451564,
+      "t_prefill_s": 15.282926287996816,
+      "t_transfer_s": 0.9193728909885976,
+      "t_followup_s": 0.11174739900161512,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "03936c26f94947539251b7a8fe429eb5",
+      "t_step1_client_unix": 1780113888.5874689,
+      "t_step2_client_unix": 1780113903.8650937,
+      "t_step2_end_unix": 1780113904.6822457,
+      "t_prefill_s": 15.277585066010943,
+      "t_transfer_s": 0.8171155720192473,
+      "t_followup_s": 0.11233539402019233,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "fbef8e1b6fd94f28b8e0edbc86135cf5",
+      "t_step1_client_unix": 1780113904.8176143,
+      "t_step2_client_unix": 1780113920.1035478,
+      "t_step2_end_unix": 1780113921.0166585,
+      "t_prefill_s": 15.285897619993193,
+      "t_transfer_s": 0.9130747889867052,
+      "t_followup_s": 0.13272951598628424,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    },
+    {
+      "input_tokens": 65536,
+      "session": "526e3b3a3df64dea9e513e7bb9c3da53",
+      "t_step1_client_unix": 1780113921.172305,
+      "t_step2_client_unix": 1780113936.4544415,
+      "t_step2_end_unix": 1780113937.868287,
+      "t_prefill_s": 15.282098516006954,
+      "t_transfer_s": 1.4137536820198875,
+      "t_followup_s": 0.11649087502155453,
+      "cached_followup": 0,
+      "pull_completion_tokens": 1,
+      "ok": false
+    }
+  ],
+  "summary": []
+}
\ No newline at end of file
diff --git a/v2/exp_a_tier_latency/run_rdma.sh b/v2/exp_a_tier_latency/run_rdma.sh
new file mode 100644
index 0000000..4f9ca0a
--- /dev/null
+++ b/v2/exp_a_tier_latency/run_rdma.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+# Exp (a) 4th tier: remote global-KV-store hit over RDMA (Mooncake).
+# Two kv_both MooncakeConnector instances (GPU0=src, GPU1=dst). For each prefix
+# length: src prefills+caches the KV, dst serves the request by PULLING that KV
+# over RDMA (do_remote_prefill) instead of recomputing -> that pull time is the
+# remote-store hit latency. Mirrors the Mooncake-Store blog mechanism.
+set -uo pipefail
+cd /home/admin/cpfs/wjh/agentic-kv
+PY=.venv/bin/python
+MODEL=/home/admin/cpfs/wjh/models/Qwen/Qwen3-Coder-30B-A3B-Instruct
+OUT=v2/exp_a_tier_latency/results
+mkdir -p "$OUT"
+PIDS=()
+
+launch() {  # $1 gpu, $2 http port, $3 bootstrap port, $4 master port
+  VLLM_MOONCAKE_BOOTSTRAP_PORT=$3 MASTER_PORT=$4 CUDA_VISIBLE_DEVICES=$1 VLLM_LOGGING_LEVEL=WARNING \
+    $PY -m vllm.entrypoints.openai.api_server --model "$MODEL" \
+    --host 0.0.0.0 --port $2 --tensor-parallel-size 1 --trust-remote-code \
+    --enable-prefix-caching --enforce-eager --dtype auto --max-model-len 70000 \
+    --gpu-memory-utilization 0.9 \
+    --kv-transfer-config '{"kv_connector":"MooncakeConnector","kv_role":"kv_both"}' \
+    > "$OUT/vllm_rdma_$2.log" 2>&1 &
+  PIDS+=($!)
+}
+teardown() {
+  for p in "${PIDS[@]:-}"; do kill -TERM "$p" 2>/dev/null; done
+  sleep 6
+  for p in $(pgrep -f "VLLM::EngineCore"); do kill -9 "$p" 2>/dev/null; done
+  sleep 3
+}
+trap teardown EXIT
+
+echo ">>> launch 2 kv_both instances (GPU0:8000/bp8998, GPU1:8001/bp8999)"
+launch 0 8000 8998 29550
+launch 1 8001 8999 29551
+for port in 8000 8001; do
+  echo -n " wait health $port..."
+  timeout 900 bash -c "until curl -sf http://127.0.0.1:$port/health >/dev/null 2>&1; do sleep 5; done" \
+    && echo " ok" || { echo " FAIL"; tail -25 "$OUT/vllm_rdma_$port.log"; exit 1; }
+done
+for bp in 8998 8999; do
+  timeout 180 bash -c "until curl -s http://127.0.0.1:$bp/query >/dev/null 2>&1; do sleep 2; done"
+done
+echo " bootstrap ports ready."
+sleep 3
+
+$PY microbench/fresh_setup/mb2_kv_transfer.py \
+  --src-host 127.0.0.1 --dst-host 127.0.0.1 \
+  --src-port 8000 --dst-port 8001 --src-bp 8998 --dst-bp 8999 \
+  --sizes 1024,2048,4096,8192,16384,32768,65536 --repeats 11 \
+  --label rdma-intra-node --out "$OUT/rdma.json"
+
+echo "=== exp (a) RDMA tier DONE ==="
diff --git a/v2/figs/exp_a_tier_latency.png b/v2/figs/exp_a_tier_latency.png
index cbeae7d..a79ed73 100644
Binary files a/v2/figs/exp_a_tier_latency.png and b/v2/figs/exp_a_tier_latency.png differ
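
Review note: the committed `rdma.json` has an empty `"summary": []`, so the per-size figures in the README table have to be re-derived from the raw records. A minimal sketch of that reduction follows, assuming the README's "remote RDMA store" column is the p50 of `t_transfer_s` grouped by `input_tokens`. That reading is an inference from the numbers in this diff (the 64k sample below reproduces the 0.966 s entry), not something the script states. The helper name `p50_transfer_by_size` is hypothetical, and the sketch does not filter on `"ok"`, since every record shown here has `"ok": false`.

```python
from collections import defaultdict
from statistics import median


def p50_transfer_by_size(records):
    """Return {input_tokens: median t_transfer_s} for a list of record dicts.

    Hypothetical helper: assumes each record carries the "input_tokens" and
    "t_transfer_s" fields seen in the rdma.json records of this diff.
    """
    by_size = defaultdict(list)
    for rec in records:
        by_size[rec["input_tokens"]].append(rec["t_transfer_s"])
    # Sort by prefix length so the result reads in the same order as the table.
    return {size: median(vals) for size, vals in sorted(by_size.items())}


# The eleven 64k t_transfer_s values, copied from the records in this diff.
transfers_64k = (
    0.8070085129875224, 0.9138955899979919, 1.6490699610149022,
    2.6278096020105295, 2.4014571599836927, 1.560034744994482,
    0.9662203160114586, 0.9193728909885976, 0.8171155720192473,
    0.9130747889867052, 1.4137536820198875,
)
sample = [{"input_tokens": 65536, "t_transfer_s": t} for t in transfers_64k]

print(p50_transfer_by_size(sample))  # prints {65536: 0.9662203160114586}
```

To run it against the real file, load the JSON and pass in the list of records. The top-level key that holds that list is not visible in this part of the diff, so it is left unspecified here.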