# vLLM Patches

Patches against vLLM v0.18.1. Apply to either the source tree (`third_party/vllm/`) or the installed package.

## Applying

```bash
# To source tree (for rebuilding)
cd third_party/vllm && git apply ../../patches/*.patch

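# To installed package (quick, no rebuild); run from the repository root
SITE=$(python -c "import vllm; print(vllm.__path__[0])")
for p in patches/*.patch; do
    # -p1 strips the leading a/ or b/ so paths resolve relative to site-packages
    patch -p1 -d "$(dirname "$SITE")" < "$p"
done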
```

## Patches

### 0001-fix-kv-transfer-abort-race.patch

**File**: `vllm/v1/core/sched/scheduler.py`

**Problem**: When a client disconnects (timeout/abort) during PD-disaggregated serving, the Mooncake KV transfer callback can arrive after the request has already been removed from the scheduler. The `assert req_id in self.requests` then fails and kills the engine process.

**Fix**: Replace the fatal assert with a graceful skip and a warning log.

**Impact**: Without this patch, decode instances crash after ~200 requests under sustained load with concurrent KV transfers.

**Upstream**: Not yet submitted. Could be upstreamed to vllm-project/vllm.
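
The shape of the fix can be illustrated with a minimal, self-contained sketch. This is not the patched vLLM code: `ToyScheduler`, `on_kv_transfer_finished`, and `finished_recving` are hypothetical names chosen for illustration. Only the `req_id in self.requests` check comes from the description above.

```python
import logging

logger = logging.getLogger("kv_transfer_sketch")


class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not the vLLM class."""

    def __init__(self):
        self.requests = {}             # req_id -> request state
        self.finished_recving = set()  # requests whose KV transfer completed

    def on_kv_transfer_finished(self, finished_req_ids):
        """Process transfer-complete callbacks, tolerating aborted requests."""
        skipped = []
        for req_id in finished_req_ids:
            # Pre-patch behaviour was equivalent to:
            #     assert req_id in self.requests
            # which raises if the client aborted before the callback arrived.
            if req_id not in self.requests:
                logger.warning(
                    "KV transfer finished for unknown request %s; skipping",
                    req_id,
                )
                skipped.append(req_id)
                continue
            self.finished_recving.add(req_id)
        return skipped


sched = ToyScheduler()
sched.requests["req-1"] = object()
# "req-2" simulates a request removed by a client abort before its callback.
skipped = sched.on_kv_transfer_finished(["req-1", "req-2"])
```

In this sketch the late callback for `req-2` is logged and skipped while `req-1` is processed normally, so one aborted request no longer takes down the whole loop.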