agentic-kvc

Files

Gahow Wang efdcf3c555 MB2: per-stage instrumentation patch + launcher integration

Per-stage breakdown of "step 2" (the B-side do_remote_prefill) requires
vLLM/mooncake-internal timing — we cannot infer it from black-box HTTP
E2E. This commit adds the four pieces to do that breakdown:

instrument_mooncake.py
  apply / revert / check patches on mooncake_connector_v1.py to emit
  structured JSONL transfer events at two key sites:

    send_blocks (P-side, on batch_transfer_sync_write):
      {event, remote_session, total_bytes, duration_s, t_start_unix,
       ret, tp_rank, t_log_unix}
    receive_kv (D-side, on the ZMQ-driven pull request):
      {event, path, local_req_ids, remote_req_ids, duration_s,
       t_start_unix, tp_rank, t_log_unix}

  All injected code is bracketed by `# MB2_INSTRUMENT_START/END` so the
  --revert pass is a single regex scan. Apply-revert round-trip
  validated on dash1 (PATCHED → py_compile ok → revert → CLEAN → ok).

start_vllm_pair.sh (updated)
  - Picks up instrument_mooncake.py via SCRIPT_DIR.
  - On `start`: applies patch before launching the two vLLM instances.
  - On `stop` (or trap exit): reverts patch.
  - Sets per-instance MB2_LOG_DIR = $FRESH_ROOT/mb2_transfer_logs/{A,B}/
    so send-side and receive-side events land in cleanly separated dirs.

deploy.sh
  tar-over-ssh sync of microbench/fresh_setup/ → cpfs
  /home/admin/cpfs/wjh/agentic-kv-fresh/scripts/ so dash1 / dash2 see
  the same scripts (dash{1,2} don't have rsync; tar pipe works).

The mb2_kv_transfer.py client still uses black-box E2E timing — the
next commit will teach it to ingest the per-instance JSONL logs to
produce the 4-way breakdown (queueing / setup / transfer / decode).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-27 18:12:44 +08:00

connector_tax

Connector tax: trace-replay confirms +45% kv_both penalty is gone; DR-fix adds 22% more

2026-05-27 09:13:50 +08:00

fresh_setup

MB2: per-stage instrumentation patch + launcher integration

2026-05-27 18:12:44 +08:00

interference

Microbench 1 plots: prefill-decode interference heatmap + lines

2026-05-26 14:21:30 +08:00

lifecycle

PD-sep server-side profiling: vLLM patches + per-request breakdown