Fix scheduler assertion crash on partial remote prefill finished_recving

The assertion `assert RequestStatus.is_finished(req.status)` at scheduler.py:2109 fires when a partial-remote-prefill request receives `finished_recving` while in the RUNNING state, i.e. local prefill had already started before the RDMA read completed. This was the root cause of the 67% error rate: EngineCore crashed on the assertion with a "fatal error", killing the vLLM instance.

Fix: replace the assertion with a debug log for requests that are neither in WAITING_FOR_REMOTE_KVS nor finished.

The kv_both no-offload baseline confirmed 0 errors, which shows that the crash came from our scheduler patch, not from kv_both instability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 23:33:26 +08:00
parent 4f93bb5b8a
commit 29b901b145
1 changed file with 5 additions and 2 deletions
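
The three-way handling described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the vLLM code: `Status`, `on_finished_recving`, `ready_ids`, `freed_ids`, and the returned action strings are hypothetical stand-ins introduced here, and the real scheduler calls `self._free_blocks(...)` and updates `self.finished_recving_kv_req_ids` instead.

```python
import enum
import logging

logger = logging.getLogger("sched_sketch")


class Status(enum.Enum):
    # Hypothetical stand-in for RequestStatus; only the states needed here.
    WAITING_FOR_REMOTE_KVS = enum.auto()
    RUNNING = enum.auto()
    FINISHED_ABORTED = enum.auto()

    @staticmethod
    def is_finished(status: "Status") -> bool:
        return status is Status.FINISHED_ABORTED


def on_finished_recving(req_id, status, ready_ids, freed_ids):
    """Toy version of the patched branch; returns the action taken."""
    if status is Status.WAITING_FOR_REMOTE_KVS:
        # Normal case: the request was parked until its KV blocks arrived.
        ready_ids.add(req_id)
        return "mark_ready"
    if Status.is_finished(status):
        # Request ended before the transfer completed: release its blocks.
        freed_ids.add(req_id)
        return "free_blocks"
    # Previously an assertion failure; now only logged (e.g. RUNNING).
    logger.debug(
        "finished_recving for %s in status %s (partial remote prefill?)",
        req_id, status)
    return "ignored"
```

Under these assumptions, a notification for a request in `Status.RUNNING` returns `"ignored"` and leaves both sets unchanged, where the pre-patch logic would have raised `AssertionError`.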
--- a/third_party/vllm/vllm/v1/core/sched/scheduler.py
+++ b/third_party/vllm/vllm/v1/core/sched/scheduler.py
@@ -2105,9 +2105,12 @@ class Scheduler(SchedulerInterface):
            req = self.requests[req_id]
            if req.status == RequestStatus.WAITING_FOR_REMOTE_KVS:
                self.finished_recving_kv_req_ids.add(req_id)
-            else:
-                assert RequestStatus.is_finished(req.status)
+            elif RequestStatus.is_finished(req.status):
                self._free_blocks(self.requests[req_id])
+            else:
+                logger.debug(
+                    "finished_recving for %s in status %s (partial remote prefill?)",
+                    req_id, req.status)
        for req_id in kv_connector_output.finished_sending or ():
            logger.debug("Finished sending KV transfer for request %s", req_id)
            if req_id not in self.requests: