Gahow Wang e9abd70c8d MB5 driver: launcher, orchestrator, KV-pool timeline plotter
Three new files to drive the PD ratio sweep + per-request KV occupancy
capture, plus a deploy.sh update so the patched replayer rides along
to the fresh-venv host.

mb5_launch.sh
  One script handles all four configs we plan to sweep:
    CONFIG=8C / 6P+2D / 4P+4D / 2P+6D
  - For 8C: 8 vLLM instances with kv_role=kv_both on GPUs 0-7. The replayer
    talks to them via the existing comma-separated round-robin in
    replayer/replay.py, with no proxy in between.
  - For PD configs: kv_role=kv_producer for the P pool (with
    VLLM_MOONCAKE_BOOTSTRAP_PORT) + kv_role=kv_consumer for the D pool,
    routed by the official vLLM example
    third_party/vllm/examples/online_serving/disaggregated_serving/
    mooncake_connector/mooncake_connector_proxy.py — no policy choice
    made by us, per user instruction to use the standard recipe.
  - Applies instrument_kv_snapshot.py before launching so every
    EngineCore writes its per-step KV snapshot to
    $RUN_ROOT/kv_snapshots/mb5_kv_snapshot_pid<pid>.jsonl
  - Reverts the patch on stop.
  - Emits ENDPOINTS= line on stdout for the orchestrator to read.
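
As an illustration of how the four CONFIG values above could map to
instance counts, here is a minimal sketch. The function name parse_config
and its output format are hypothetical and are not taken from
mb5_launch.sh; the sketch only assumes that 8C means eight kv_both
instances and that xP+yD means x producers and y consumers.

```shell
# Hypothetical helper (not from mb5_launch.sh): split a CONFIG string into
# colocated / prefill / decode instance counts. Handles single-digit counts
# only, which is enough for an 8-GPU host.
parse_config() {
  case "$1" in
    [0-9]C)
      # e.g. 8C -> eight colocated (kv_both) instances
      echo "colocated=${1%C} prefill=0 decode=0" ;;
    [0-9]P+[0-9]D)
      # e.g. 6P+2D -> six producers, two consumers
      p="${1%%P*}"
      d="${1#*+}"
      d="${d%D}"
      echo "colocated=0 prefill=${p} decode=${d}" ;;
    *)
      echo "unknown CONFIG: $1" >&2
      return 1 ;;
  esac
}

parse_config "8C"
parse_config "6P+2D"
```

Rejecting anything outside the two known shapes keeps a typo in CONFIG from
silently launching the wrong pool sizes.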

mb5_run.sh
  For each CONFIG × rep: launch, replay w600 trace via the existing
  replayer, capture wall-clock, tear down, cool down 10 s. Defaults:
    CONFIGS="8C 6P+2D 4P+4D 2P+6D"
    REPS=3
    TRACE=traces/w600_r0.0015_st30.jsonl
  All artefacts go under $FRESH_ROOT/mb5_runs/${RUN_TAG}_${config}_rep${rep}/
  (vllm_logs/, kv_snapshots/, replay_metrics.jsonl, wall_clock_s.txt).
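
The CONFIG × rep loop and the run-directory layout described above can be
sketched as a dry run. This is not the contents of mb5_run.sh: the helper
names run_dir and read_endpoints are hypothetical, the rep index starting
at 1 is an assumption, and the exact value format of the ENDPOINTS= line
(shown in the usage note as comma-separated URLs) is assumed as well.

```shell
# Defaults mirror the ones listed above; RUN_TAG and FRESH_ROOT fall back
# to placeholder values so the sketch runs on its own.
CONFIGS="${CONFIGS:-8C 6P+2D 4P+4D 2P+6D}"
REPS="${REPS:-3}"
RUN_TAG="${RUN_TAG:-demo}"
FRESH_ROOT="${FRESH_ROOT:-/tmp/fresh}"

# Hypothetical helper: artefact directory for one (config, rep) pair.
run_dir() {
  echo "${FRESH_ROOT}/mb5_runs/${RUN_TAG}_$1_rep$2"
}

# Hypothetical helper: pull the value of the first ENDPOINTS= line out of
# the launcher's stdout (read from stdin here).
read_endpoints() {
  sed -n 's/^ENDPOINTS=//p' | head -n 1
}

# Dry run: print where each run would write, without launching anything.
for config in $CONFIGS; do
  rep=1
  while [ "$rep" -le "$REPS" ]; do
    echo "plan: $(run_dir "$config" "$rep")"
    rep=$((rep + 1))
  done
done
```

In a real orchestrator the launcher's stdout would be piped through
something like read_endpoints, for example
`printf 'ENDPOINTS=http://h:8000,http://h:8001\n' | read_endpoints`.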

plot_kv_pool_timeline.py
  Reads one or more mb5_kv_snapshot_pid*.jsonl files and renders a
  stacked-area chart per file:
    x = wall-clock since first snapshot
    y = KV block count, stacked by per-request contribution
    overlay: pool-total ceiling, 90% line, waiting-queue depth subplot
  Bands are colored by a deterministic hash of request_id so individual
  requests are visually trackable across the run.
  This is the figure the user asked for: it turns the headline "PD-disagg is
  10× worse" into a system-level picture of *where* the KV pool is
  blocked, when, and by which requests.
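
The deterministic coloring idea can be illustrated in shell, although the
plotter itself is Python and its actual hash function is not shown here.
The sketch below assumes a CRC from POSIX cksum mapped to a hue in
0..359; the function name request_hue is hypothetical.

```shell
# Hypothetical helper: map a request_id to a stable hue (0..359).
# The same id always yields the same hue, across runs and across hosts,
# because cksum's CRC depends only on the input bytes.
request_hue() {
  crc="$(printf '%s' "$1" | cksum | cut -d' ' -f1)"
  echo $((crc % 360))
}

request_hue "req-0001"
request_hue "req-0002"
```

A content-based hash, rather than a color assigned in arrival order, keeps a
request's color the same in every per-file chart it appears in.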

deploy.sh
  Also tar-syncs the local replayer/ dir to
  /home/admin/cpfs/wjh/agentic-kv-fresh/replayer/ so mb5_run.sh can
  `python -m replayer` against the patched (trace_span_s/amplification)
  version, not the older copy under /home/admin/cpfs/wjh/agentic-kv/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 23:02:57 +08:00


#!/usr/bin/env bash
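# Sync microbench/fresh_setup/ to /home/admin/cpfs/wjh/agentic-kv-fresh/scripts/
# so dash1 / dash2 see the same scripts. cpfs is mounted at the same path on
# both, so one sync to either host is enough.
# Also syncs the repo's replayer/ dir to
# /home/admin/cpfs/wjh/agentic-kv-fresh/replayer/ when it exists.
#
# Usage: deploy.sh [dest_host]   (default: dash1)
# Paths are resolved relative to this script (expected at
# microbench/fresh_setup/ in the agentic-kv repo), so the working directory
# does not matter.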
set -eo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SRC="${SCRIPT_DIR}/"
DEST_HOST="${1:-dash1}"
DEST="/home/admin/cpfs/wjh/agentic-kv-fresh/scripts/"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
REPLAYER_SRC="${REPO_ROOT}/replayer"
REPLAYER_DEST="/home/admin/cpfs/wjh/agentic-kv-fresh/replayer/"
echo "[deploy] syncing ${SRC} -> ${DEST_HOST}:${DEST}"
ssh "${DEST_HOST}" "mkdir -p ${DEST} ${REPLAYER_DEST}"
# dash1/2 don't have rsync; use tar over ssh.
tar -C "${SCRIPT_DIR}" --exclude='__pycache__' --exclude='*.pyc' \
    --exclude='deploy.sh' -czf - . \
  | ssh "${DEST_HOST}" "cd ${DEST} && tar -xzf -"
if [ -d "${REPLAYER_SRC}" ]; then
  echo "[deploy] syncing ${REPLAYER_SRC}/ -> ${DEST_HOST}:${REPLAYER_DEST}"
  tar -C "${REPLAYER_SRC}" --exclude='__pycache__' --exclude='*.pyc' -czf - . \
    | ssh "${DEST_HOST}" "cd ${REPLAYER_DEST} && tar -xzf -"
fi
echo "[deploy] done"
ssh "${DEST_HOST}" "ls -la ${DEST}; echo '---replayer---'; ls -la ${REPLAYER_DEST}"