Validates the elastic_migration_v2 finding that kv_role=kv_both adds
+45% to TTFT p90 even when PD-sep never fires. Replicates it under a
single-instance, synthetic, open-loop workload to separate mechanism
cost from 8-instance feedback amplification.

Configurations (8):
  plain, noop_connector, mooncake_{producer,consumer,both},
  nixl_both, lmcache_only, multi_mooncake_lmcache.
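
The brace shorthand in the configuration list is valid bash brace expansion, so the eight names can be generated mechanically. A minimal sketch, assuming the names are used verbatim as identifiers (the `configs` array is illustrative, not something run_all.sh is known to define):

```bash
#!/bin/bash
# mooncake_{producer,consumer,both} expands to three separate names,
# which brings the total to eight configurations.
configs=(plain noop_connector mooncake_{producer,consumer,both}
         nixl_both lmcache_only multi_mooncake_lmcache)

echo "${#configs[@]} configs"
printf '  %s\n' "${configs[@]}"
```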

Pre-flight verification gates the risky configs (kv_consumer needs a
dummy bootstrap, multi-connector composition, NoOp custom class loading).

Workload: two-phase sweep
  Phase A: rate {0.5..32} req/s × shape (4096, 256), saturation criteria
  Phase B: ref_safe rate × cartesian (input ∈ {512,4k,32k}, output ∈ {64,256,1024})
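
The Phase B cartesian product works out to 3 × 3 = 9 shapes per configuration. A minimal sketch of the enumeration, assuming 4k and 32k mean 4096 and 32768 tokens (4096 matches the Phase A shape; 32768 is a guess) and using hypothetical variable names:

```bash
#!/bin/bash
# Enumerate the Phase B grid: every input length paired with every output length.
# Assumption: 4k = 4096 and 32k = 32768 tokens.
inputs=(512 4096 32768)
outputs=(64 256 1024)

shapes=()
for in_len in "${inputs[@]}"; do
  for out_len in "${outputs[@]}"; do
    shapes+=("${in_len}x${out_len}")
  done
done

echo "${#shapes[@]} shapes per config"
printf '  %s\n' "${shapes[@]}"
```

Each shape would then be run at the ref_safe rate chosen in stage 2.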

Step-timing patch extends vLLM's existing AGENTIC_STEP_LOG_PATH log
output with step_duration_us and build_meta_us, so per-step substrate
cost is measured directly rather than inferred from user-visible TTFT/TPOT.

run_all.sh runs as a 5-stage barrier:
  0 pre-flight + apply patch
  1 Phase A all configs
  2 pick ref_safe / ref_load
  3 Phase B all configs
  4 revert patch + analyze + plot
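
The barrier structure can be sketched as a strictly sequential driver in which each stage must finish before the next starts. This is only an illustration: the `run_stage` helper and the `stage_*` stub functions are hypothetical and do not reflect the actual contents of run_all.sh.

```bash
#!/bin/bash
set -euo pipefail

stages_done=()

# Run one stage to completion; with `set -e`, a failing stage stops the run
# before any later stage starts.
run_stage() {
  local name="$1"
  shift
  echo "== stage: $name =="
  "$@"
  stages_done+=("$name")
}

# Hypothetical stubs standing in for the real work of each stage.
stage_preflight_and_patch() { :; }
stage_phase_a()             { :; }
stage_pick_refs()           { :; }
stage_phase_b()             { :; }
stage_revert_and_analyze()  { :; }

run_stage "0 pre-flight + apply patch"      stage_preflight_and_patch
run_stage "1 Phase A all configs"           stage_phase_a
run_stage "2 pick ref_safe / ref_load"      stage_pick_refs
run_stage "3 Phase B all configs"           stage_phase_b
run_stage "4 revert patch + analyze + plot" stage_revert_and_analyze

echo "completed ${#stages_done[@]} stages"
```

Stage 3 depends on the rates chosen in stage 2, which is why Phase B cannot overlap with Phase A.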
Outputs aggregate.{json,csv}, MANIFEST.tsv, and 5 figures.
Estimated runtime: 4-5.5 hours on idle dash0 H20.
27 lines
981 B
Bash
Executable File
#!/bin/bash
# Pre-flight: verify NoOpConnector loads and serves requests.
# Exits 0 = OK, 42 = SKIP (dependency missing), nonzero = fail.

set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
RUN_DIR="${RUN_DIR:-$HERE/../results/preflight/noop}"
mkdir -p "$RUN_DIR"

bash "$HERE/../launch/launch_noop_connector.sh"
PORT="${PORT:-8000}"

# Issue 5 short requests
for i in 1 2 3 4 5; do
  code=$(curl -s -o "$RUN_DIR/req_$i.json" -w "%{http_code}" \
    -X POST "http://127.0.0.1:$PORT/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$(ls $HOME/models/Qwen | grep Qwen3-Coder-30B | head -1 | xargs -I{} echo $HOME/models/Qwen/{})\",\"messages\":[{\"role\":\"user\",\"content\":\"hello $i\"}],\"max_tokens\":4,\"temperature\":0}" \
    --max-time 60 || echo "000")
  if [[ "$code" != "200" ]]; then
    echo "FAIL: req $i status=$code" >&2
    exit 1
  fi
done

echo "OK: all 5 requests succeeded"
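
A caller can act on the exit-code convention from the script header (0 = OK, 42 = SKIP, anything else = fail). The sketch below is one possible way to do that; the `classify_preflight` helper is hypothetical, and a subshell that exits 42 stands in for a real pre-flight script so the example runs on its own.

```bash
#!/bin/bash
# Map a pre-flight exit status onto the convention from the script header.
classify_preflight() {
  case "$1" in
    0)  echo "OK" ;;
    42) echo "SKIP" ;;
    *)  echo "FAIL" ;;
  esac
}

# Stand-in for a real pre-flight script: a subshell that exits 42.
rc=0
( exit 42 ) || rc=$?

echo "preflight -> $(classify_preflight "$rc")"
```

Capturing the status with `|| rc=$?` keeps the pattern usable in a caller that itself runs under `set -e`, where a SKIP would otherwise abort the whole run.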