Files
agentic-kvc/figs/mb2_transfer_bw_inter.png
Gahow Wang 90127c3389 MB2 inter-node: dash1↔dash2 transfer cost is identical to intra-node
Sweep on dash1 GPU 0 → dash2 GPU 0 over 200 Gbps RoCE.
remote_bootstrap_addr=http://172.27.123.142:8998. Same 9-size × 5-rep
config as the 2026-05-27 intra-node run.

Per-size pure_transfer (p50) lines up within 1–3% of the intra-node
numbers across all sizes:

  size      intra p50   inter p50
   512 tok    5.3 ms      5.2 ms
  2048 tok   20.6         20.0
  8192 tok   83.7         80.9
 32k  tok  320.9        309.6
 64k  tok 1895          1734       (bimodal in both)
128k  tok 2835          2818       (bimodal in both)

=> Mooncake's batch_transfer_sync_write **does not use NVLink** for
intra-node peers; both paths go through the 200 Gbps RDMA NIC, with
the 200 Gbps NIC (not the GPU interconnect) being the bottleneck. The
~9.7 GB/s steady-state ceiling and the 6+ GiB variance regime are
identical across topologies.

Operational implication for §3.2: PD-disaggregation does not get
cheaper by co-locating P and D on the same node — every routed request
pays the same ~10 GB/s ceiling for KV transfer, no matter where it
lands. Halving the transfer cost cannot be bought back by topology.

Caveat: B's receive_kv events did not log on dash2 — `MB2_LOG_DIR`
env var did not propagate through vLLM's EngineCore subprocess on
the consumer host (cat /proc/$ENGINE_PID/environ is empty on dash2
for that var, but the producer host on dash1 worked). For this run
pure_transfer numbers are from A's send_blocks alone; full rx_total
breakdown is not available, but pure_transfer is the dominant term.

Adds:
- analyze_mb2_send_only.py — analyzer that works from A's send_blocks
  alone when B's receive_kv events are absent
- plot_mb2_compare.py — overlay intra vs inter on the same axes
- plot_mb2.py — tolerate the `rows`-less send-only schema
- figs/mb2_transfer_{time,bw}_inter.png — inter-node single-curve
- figs/mb2_transfer_{time,bw}_compare.png — intra vs inter overlay
- analysis/mb2/A_inter_kvboth.jsonl, inter_kvboth_client.json,
  inter_kvboth_breakdown.json
- analysis/mb2/README.md — Summary block updated to reference both
  paths, dated 2026-05-27 run-log entry appended with the full table
  and the topology-independence framing

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:56:08 +08:00

71 KiB
1200x750px

/gahow/agentic-kvc/raw/commit/cf812b6264f7f64594e3b95c196d546d2742718b/figs/mb2_transfer_bw_inter.png