Files
agentic-kvc/microbench/connector_tax/launch/launch_mooncake_both.sh
Gahow Wang 297fed6e73 Microbench 3 (connector_tax): infrastructure for KV connector substrate tax
Validates the elastic_migration_v2 finding that kv_role=kv_both adds
TTFT p90 +45% even when PD-sep never fires. Replicates under
single-instance, synthetic, open-loop workload to disambiguate
mechanism cost from 8-instance feedback amplification.

Configurations (8):
  plain, noop_connector, mooncake_{producer,consumer,both},
  nixl_both, lmcache_only, multi_mooncake_lmcache.

Pre-flight verification gates risky configs (kv_consumer needs dummy
bootstrap, multi-connector composition, NoOp custom class loading).

Workload: two-phase sweep
  Phase A: rate {0.5..32} req/s × shape (4096, 256), saturation criteria
  Phase B: ref_safe rate × cartesian (input ∈ {512,4k,32k}, output ∈ {64,256,1024})

Step-timing patch enriches vLLM's existing AGENTIC_STEP_LOG_PATH emit
with step_duration_us and build_meta_us — directly measures per-step
substrate cost, not just user-visible TTFT/TPOT.

run_all.sh runs as 5-stage barrier:
  0 pre-flight + apply patch
  1 Phase A all configs
  2 pick ref_safe / ref_load
  3 Phase B all configs
  4 revert patch + analyze + plot

Outputs aggregate.{json,csv}, MANIFEST.tsv, and 5 figures.
Estimated runtime: 4-5.5 hours on idle dash0 H20.
2026-05-26 17:27:41 +08:00

19 lines
631 B
Bash
Executable File

#!/bin/bash
# vLLM + Mooncake kv_both (alone, never transfers — README claim under test).
set -euo pipefail
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
BOOTSTRAP_PORT="${BOOTSTRAP_PORT:-8998}"
KV_CFG='{"kv_connector":"MooncakeConnector","kv_role":"kv_both"}'
VLLM_MOONCAKE_BOOTSTRAP_PORT="$BOOTSTRAP_PORT" \
CUDA_VISIBLE_DEVICES="$GPU_ID" \
"$PYTHON" -m vllm.entrypoints.openai.api_server \
"${COMMON_VLLM_ARGS[@]}" \
--kv-transfer-config "$KV_CFG" \
> "$LOG_DIR/vllm_stdout.log" 2> "$LOG_DIR/vllm_stderr.log" &
VLLM_PID=$!
echo "$VLLM_PID" > "$LOG_DIR/.vllm.pid"
echo "vLLM PID=$VLLM_PID"
wait_for_ready 240