# vLLM Patches

Patches against vLLM v0.18.1. Apply to either the source tree (`third_party/vllm/`) or the installed package.

## Applying

```bash
# To source tree (for rebuilding)
cd third_party/vllm && git apply ../../patches/*.patch

# To installed package (quick, no rebuild)
SITE=$(python -c "import vllm; print(vllm.__path__[0])")
for p in patches/*.patch; do
  # Patch paths start at vllm/, so apply from the package's parent directory
  patch -p1 -d "$(dirname "$SITE")" < "$p"
done
```

## Patches

### 0001-fix-kv-transfer-abort-race.patch

**File**: `vllm/v1/core/sched/scheduler.py`

**Problem**: When a client disconnects (timeout/abort) during PD-disaggregated serving, the Mooncake KV transfer callback can arrive after the request has been removed from the scheduler. The `assert req_id in self.requests` then fails and kills the engine process.

**Fix**: Replace the fatal assert with a graceful skip and a warning log.

**Impact**: Without this patch, decode instances crash after ~200 requests under sustained load with concurrent KV transfers.

**Upstream**: Not yet submitted. Could be upstreamed to vllm-project/vllm.
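
The pattern behind the fix in 0001 can be illustrated with a minimal, self-contained sketch. This is not the patched vLLM code: `SchedulerSketch`, `abort`, and `on_kv_transfer_finished` are hypothetical names chosen for illustration, and only the `req_id in self.requests` check mirrors what is described above. It shows a late transfer callback being skipped with a warning instead of tripping an assert.

```python
import logging

logger = logging.getLogger("kv_transfer_sketch")


class SchedulerSketch:
    """Toy stand-in for a scheduler; NOT vLLM's actual Scheduler class."""

    def __init__(self) -> None:
        # Maps request ID -> per-request state.
        self.requests: dict[str, dict] = {}

    def abort(self, req_id: str) -> None:
        # Client disconnected: drop the request if it is still tracked.
        self.requests.pop(req_id, None)

    def on_kv_transfer_finished(self, req_id: str) -> bool:
        """Hypothetical callback; returns True if the request was updated."""
        # Before the fix this would have been: assert req_id in self.requests
        if req_id not in self.requests:
            logger.warning(
                "KV transfer finished for unknown request %s; skipping", req_id
            )
            return False
        self.requests[req_id]["kv_ready"] = True
        return True


sched = SchedulerSketch()
sched.requests["req-1"] = {"kv_ready": False}
sched.requests["req-2"] = {"kv_ready": False}

sched.abort("req-1")                              # client times out
late = sched.on_kv_transfer_finished("req-1")     # callback arrives too late
ok = sched.on_kv_transfer_finished("req-2")       # normal path
```

In this sketch the late callback for `req-1` is logged and ignored, while `req-2` is marked ready as usual, so one aborted request no longer takes down the whole process.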