Gahow Wang b6591950bc Add vLLM patches directory for version-controlled patch management
patches/0001-fix-kv-transfer-abort-race.patch:
  Fix scheduler assert crash when KV transfer callback arrives
  after request abort in PD-disaggregated serving.

patches/README.md:
  How to apply patches to source tree or installed package.
  Per-patch description with problem/fix/impact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 00:26:14 +08:00

vLLM Patches

Patches against vLLM v0.18.1. Apply to either the source tree (third_party/vllm/) or the installed package.

Applying

# To source tree (for rebuilding)
cd third_party/vllm && git apply ../../patches/*.patch

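# To installed package (quick, no rebuild); run from the repository root
SITE=$(python -c "import vllm; print(vllm.__path__[0])")
for p in patches/*.patch; do
    # -d takes the parent of the vllm package dir, so -p1 paths (vllm/...) resolve
    patch -p1 -d "$(dirname "$SITE")" < "$p"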
done
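
Before modifying an installed package, it can help to check that each patch applies cleanly. The sketch below is one possible way to do that, assuming GNU patch (for its --dry-run option) is on PATH. The helper names patch_root, dry_run_command, and applies_cleanly are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def patch_root(package_dir):
    """Directory to pass to `patch -d`: the parent of the vllm package directory."""
    return Path(package_dir).parent


def dry_run_command(patch_file, root):
    """Build a `patch` invocation that reports what would change without writing."""
    # --dry-run is a GNU patch option; -i names the patch file instead of using stdin.
    return ["patch", "--dry-run", "-p1", "-d", str(root), "-i", str(patch_file)]


def applies_cleanly(patch_file, root):
    """Return True if the dry run exits with status 0 (all hunks would apply)."""
    result = subprocess.run(
        dry_run_command(patch_file, root), capture_output=True, text=True
    )
    return result.returncode == 0
```

A caller would pass vllm.__path__[0] to patch_root and loop over patches/*.patch, running the real patch command only for files where applies_cleanly returns True.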

Patches

0001-fix-kv-transfer-abort-race.patch

File: vllm/v1/core/sched/scheduler.py

Problem: When a client disconnects (timeout/abort) during PD-disaggregated (prefill/decode) serving, the Mooncake KV transfer callback can arrive after the request has already been removed from the scheduler. The assertion that req_id is in self.requests then fails, and the resulting AssertionError kills the engine process.

Fix: Replace the fatal assert with a graceful skip: if the request is no longer tracked, log a warning and ignore the callback.
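
The pattern can be illustrated with a minimal, self-contained sketch. This is not the vLLM scheduler code and not the contents of the patch: the class SchedulerSketch, its methods, and the finished_transfers list are hypothetical stand-ins. Only the req_id in self.requests membership check comes from the description above.

```python
import logging

logger = logging.getLogger("scheduler_sketch")


class SchedulerSketch:
    """Hypothetical stand-in for a scheduler; not vLLM's Scheduler class."""

    def __init__(self):
        self.requests = {}            # req_id -> request state
        self.finished_transfers = []  # req_ids whose KV transfer was accepted

    def abort(self, req_id):
        # Client disconnected or timed out: stop tracking the request.
        self.requests.pop(req_id, None)

    def on_kv_transfer_done(self, req_id):
        # Previously: `assert req_id in self.requests`, which raises if the
        # callback races with an abort. Skip and warn instead.
        if req_id not in self.requests:
            logger.warning(
                "KV transfer finished for unknown request %s; skipping", req_id
            )
            return False
        self.finished_transfers.append(req_id)
        return True
```

In this sketch, a callback for an aborted request returns False and leaves scheduler state untouched, while callbacks for live requests are processed as before.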

Impact: Without this patch, decode instances crash after ~200 requests under sustained load with concurrent KV transfers.

Upstream: Not yet submitted. Could be upstreamed to vllm-project/vllm.