Critical:
- cache_aware_proxy: _handle_pd_sep leaked p_inst.num_requests (it was
never decremented) and did not track d_inst.num_requests at all; change
media_type from application/json to text/event-stream for the SSE
response
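A minimal sketch of the counter bookkeeping the first fix calls for, assuming each phase should count against its instance only while that phase is in flight. `Instance`, `handle_pd_sep`, `prefill` and `decode` are hypothetical stand-ins, not the proxy's actual code; the point is that every increment is paired with a decrement in a `finally` block.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable


@dataclass
class Instance:
    """Hypothetical stand-in for one prefill or decode backend."""
    name: str
    num_requests: int = 0


async def handle_pd_sep(
    p_inst: Instance,
    d_inst: Instance,
    prefill: Callable[[], Awaitable[object]],
    decode: Callable[[], AsyncIterator[bytes]],
) -> AsyncIterator[bytes]:
    """Run prefill, then stream decode output, keeping both counters balanced."""
    # Prefill phase: the request counts against p_inst only while it runs.
    p_inst.num_requests += 1
    try:
        await prefill()
    finally:
        p_inst.num_requests -= 1

    # Decode phase: the request counts against d_inst for the whole stream.
    d_inst.num_requests += 1
    try:
        async for chunk in decode():
            yield chunk
    finally:
        # Runs when the stream ends, raises, or the generator is closed early.
        d_inst.num_requests -= 1
```

In a FastAPI proxy, a generator like this would typically be returned as `StreamingResponse(gen, media_type="text/event-stream")` from `fastapi.responses`, which is where the SSE media type from this fix is set.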
High:
- b3_sweep/b3_isolated_policy/b3_analyze: replace the hardcoded ROOT
(/home/admin/cpfs/wjh/) with the script-relative $(dirname "$0")/..
- b3_analyze: replace the hardcoded 8-port WORKER_MAP with one generated
from BASE_PORT and N_INSTANCES
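The shape of the generated map can be sketched as follows, assuming instance i listens on BASE_PORT + i and the map goes from port to worker index. b3_analyze itself is a shell script and its real map format may differ; `build_worker_map` is a hypothetical name, written in Python only to show the arithmetic.

```python
from __future__ import annotations


def build_worker_map(base_port: int, n_instances: int) -> dict[int, int]:
    """Map port -> worker index for n_instances consecutive ports (assumed layout)."""
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    return {base_port + i: i for i in range(n_instances)}
```

For example, `build_worker_map(8000, 4)` gives `{8000: 0, 8001: 1, 8002: 2, 8003: 3}`, so changing N_INSTANCES no longer requires editing a fixed eight-entry table.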
Medium:
- analyze_breakdown: warn on stderr when records are skipped (was silent)
- deploy_vllm_patches: fail fast on SSH/SCP errors instead of
continuing with an empty VENV_SITE
- pyproject.toml: declare fastapi and uvicorn as runtime dependencies
- launch_elastic_p2p: kill the EngineCore and proxy processes in the
trap handler to prevent GPU memory leaks on exit
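A sketch of the skip-and-warn pattern for analyze_breakdown, assuming JSON Lines input where only JSON objects count as records; `load_records` and the warning text are hypothetical. Bad records are still skipped, but their count goes to stderr so any results printed on stdout stay clean.

```python
from __future__ import annotations

import json
import sys
from typing import Iterable


def load_records(lines: Iterable[str]) -> list[dict]:
    """Parse JSON-lines records, skipping malformed ones and reporting the count."""
    records: list[dict] = []
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are not counted as skipped records
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if not isinstance(rec, dict):
            skipped += 1  # valid JSON, but not a record object
            continue
        records.append(rec)
    if skipped:
        print(f"warning: skipped {skipped} malformed record(s)", file=sys.stderr)
    return records
```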
scripts/deploy_vllm_patches.sh copies mooncake_connector.py,
mooncake_utils.py and scheduler.py from third_party/vllm into the
site-packages of the pip-installed vllm. Compiled C extensions still come
from the pip package; only the Python files are overridden.
Usage: bash scripts/deploy_vllm_patches.sh [HOST]
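The overlay step can be sketched locally in Python, assuming the patched files sit at the same relative paths in the third_party/vllm package directory as in the installed package. `installed_package_dir` and `overlay_python_files` are hypothetical helpers; the real script works over SSH/SCP against [HOST], which this sketch leaves out. `importlib.util.find_spec` locates a top-level package without importing it.

```python
from __future__ import annotations

import importlib.util
import shutil
from pathlib import Path
from typing import Iterable


def installed_package_dir(package: str = "vllm") -> Path:
    """Return the directory of an installed package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise RuntimeError(f"{package} is not installed as a package here")
    return Path(next(iter(spec.submodule_search_locations)))


def overlay_python_files(src_pkg: Path, dst_pkg: Path,
                         rel_paths: Iterable[str]) -> list[Path]:
    """Copy selected .py files over the installed package, keeping relative paths.

    Only .py paths are accepted, so compiled extensions in dst_pkg are untouched.
    """
    copied: list[Path] = []
    for rel in rel_paths:
        if not rel.endswith(".py"):
            raise ValueError(f"refusing to overlay non-Python file: {rel}")
        src, dst = src_pkg / rel, dst_pkg / rel
        if not src.is_file():
            raise FileNotFoundError(src)
        if not dst.parent.is_dir():
            # Fail fast instead of creating a stray directory in site-packages.
            raise FileNotFoundError(dst.parent)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Callers would pass the relative paths of mooncake_connector.py, mooncake_utils.py and scheduler.py within the package; those paths are not listed here.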
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>