e0d3b5150a1580a9bb9d07af2e33877205b22496
Two bugs caught by the 8C smoke run:
mb5_launch.sh
Expanding ${env_bp_arg} as a command-line prefix doesn't set the
variable when env_bp_arg holds the VAR=val string: bash only treats
VAR=val as an env assignment when it sees the literal in the parsed
command, not when it comes out of parameter expansion, so the expanded
word is run as the command name instead. Fix: always export
VLLM_MOONCAKE_BOOTSTRAP_PORT as a literal, defaulting to 9999 when the
caller passed no port (consumer mode ignores the var, so the
placeholder is harmless).
mb5_run.sh
The replayer's actual CLI flags are --trace / --output / --endpoint /
--model, not the --*-path / --*-name variants I had. Also, dash1
has no `bc`, so wall_clock_s is now computed via python instead.
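A sketch of both mb5_run.sh changes, assuming GNU date and python3 on PATH. Only wall_clock_s and the four flag names come from the commit; the helper shape, the placeholder values, and the literal `replayer` command name are assumptions, and the invocation is printed rather than run.

```shell
#!/usr/bin/env bash
# wall_clock_s without bc: hand the subtraction to python3.
# Timestamps are passed as argv rather than spliced into the -c code string.
wall_clock_s() {
  python3 -c 'import sys; print(f"{float(sys.argv[2]) - float(sys.argv[1]):.3f}")' "$1" "$2"
}

start=$(date +%s.%N)   # GNU date: seconds.nanoseconds
sleep 0.2
end=$(date +%s.%N)
echo "wall_clock_s=$(wall_clock_s "$start" "$end")"

# Replayer arguments with the corrected flag names. Values are placeholders.
replayer_args=(
  --trace    "TRACE_PATH"
  --output   "OUTPUT_PATH"
  --endpoint "ENDPOINT_URL"
  --model    "MODEL_NAME"
)
printf '%s ' replayer "${replayer_args[@]}"; echo
```

Keeping the python snippet fixed and feeding it argv avoids quoting surprises if a timestamp ever contains something unexpected.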
Both fixed; 8C smoke (CONFIG=8C REPS=1 REQUEST_LIMIT=20) now runs
end-to-end in ~30 s:
- 8 vLLM kv_both instances on GPUs 0-7 come up
- replayer round-robins 20 reqs across them
- MB5 instrumentation captures 8 snapshot files (one per EngineCore
  PID) with 7-139 snapshots each, so the ~10 Hz throttle works
- plot_kv_pool_timeline.py renders the stacked-area + queue-depth
chart cleanly (figs/mb5_smoke/*.png)
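As a rough consistency check on those counts (a loose upper bound, not proof that the throttle fires): a cap of about 10 snapshots per second over a run of about 30 s allows at most 10 x 30 = 300 snapshots per EngineCore, and the largest observed file (139) is well under that. The helper below is hypothetical and just encodes that bound.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a snapshot count fits under rate x duration.
# Arguments are integers; bash arithmetic is integer-only.
within_throttle_bound() {
  local count=$1 rate_hz=$2 duration_s=$3
  (( count <= rate_hz * duration_s ))
}

# Numbers from the 8C smoke run: 7-139 snapshots per file, ~30 s, ~10 Hz.
for count in 7 139; do
  if within_throttle_bound "$count" 10 30; then
    echo "$count snapshots: within the 10 Hz x 30 s = 300 bound"
  else
    echo "$count snapshots: exceeds the bound"
  fi
done
```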
Pipeline validated. Ready for the real PD-ratio sweep.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>