Fix scheduler assertion crash on partial remote prefill finished_recving

The assertion `assert RequestStatus.is_finished(req.status)` at
scheduler.py:2109 fires when a partial-remote-prefill request
receives `finished_recving` while in the RUNNING state (local prefill
had already started before the RDMA read completed).
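The failing ordering can be reproduced with a toy model. This is a sketch, not vLLM code. `RequestStatus` is a simplified stand-in that keeps only the ordering `is_finished` relies on, under the assumption that finished statuses sort after PREEMPTED. `Request`, `handle_finished_recving_old`, and the `ready_ids`/`freed_ids` sets are hypothetical substitutes for the scheduler's request table, `finished_recving_kv_req_ids`, and `_free_blocks`.

```python
import enum
from dataclasses import dataclass


class RequestStatus(enum.IntEnum):
    # Simplified stand-in for vLLM's RequestStatus; only the ordering matters.
    WAITING = enum.auto()
    WAITING_FOR_REMOTE_KVS = enum.auto()
    RUNNING = enum.auto()
    PREEMPTED = enum.auto()
    FINISHED_STOPPED = enum.auto()
    FINISHED_ABORTED = enum.auto()

    @staticmethod
    def is_finished(status: "RequestStatus") -> bool:
        # Assumption: every status ordered after PREEMPTED is a finished one.
        return status > RequestStatus.PREEMPTED


@dataclass
class Request:  # hypothetical minimal request record
    req_id: str
    status: RequestStatus


def handle_finished_recving_old(requests, finished_recving, ready_ids, freed_ids):
    """Pre-patch logic: only WAITING_FOR_REMOTE_KVS or finished is allowed."""
    for req_id in finished_recving:
        req = requests[req_id]
        if req.status == RequestStatus.WAITING_FOR_REMOTE_KVS:
            ready_ids.add(req_id)  # stands in for finished_recving_kv_req_ids
        else:
            # Same check as the original assert, raised explicitly so the
            # sketch behaves the same under `python -O`.
            if not RequestStatus.is_finished(req.status):
                raise AssertionError(
                    f"{req_id} in unexpected status {req.status.name}")
            freed_ids.add(req_id)  # stands in for self._free_blocks(...)


def reproduce_race() -> bool:
    """Return True if the partial-remote-prefill ordering trips the check."""
    requests = {"r1": Request("r1", RequestStatus.WAITING_FOR_REMOTE_KVS)}
    # Local prefill starts before the RDMA read completes...
    requests["r1"].status = RequestStatus.RUNNING
    # ...and only then does the connector report finished_recving.
    try:
        handle_finished_recving_old(requests, ["r1"], set(), set())
    except AssertionError:
        return True
    return False


print(reproduce_race())  # True
```

In the toy model a request that is still parked in WAITING_FOR_REMOTE_KVS passes. The check only trips once the status has moved on to RUNNING.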

This was the root cause of the 67% error rate: the failed assertion
crashed EngineCore with a "fatal error", killing the vLLM instance.

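Fix: Replace the assertion with a three-way branch. Requests in
WAITING_FOR_REMOTE_KVS are still added to finished_recving_kv_req_ids,
finished requests have their blocks freed, and any other request
(neither waiting nor finished) now gets only a debug log. The kv_both
no-offload baseline showed 0 errors, confirming that the crash came
from our scheduler patch, not from kv_both instability.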
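The patched dispatch can be modeled in a few lines. This is a hypothetical, self-contained sketch, not the vLLM scheduler. `RequestStatus` is a simplified stand-in, assuming finished statuses sort after PREEMPTED. `Request`, `handle_finished_recving`, `demo`, and the `ready_ids`/`freed_ids` sets are illustrative names that stand in for the request table, `finished_recving_kv_req_ids`, and `_free_blocks`.

```python
import enum
import logging
from dataclasses import dataclass

logger = logging.getLogger("scheduler_sketch")


class RequestStatus(enum.IntEnum):
    # Simplified stand-in for vLLM's RequestStatus; only the ordering matters.
    WAITING = enum.auto()
    WAITING_FOR_REMOTE_KVS = enum.auto()
    RUNNING = enum.auto()
    PREEMPTED = enum.auto()
    FINISHED_STOPPED = enum.auto()
    FINISHED_ABORTED = enum.auto()

    @staticmethod
    def is_finished(status: "RequestStatus") -> bool:
        # Assumption: every status ordered after PREEMPTED is a finished one.
        return status > RequestStatus.PREEMPTED


@dataclass
class Request:  # hypothetical minimal request record
    req_id: str
    status: RequestStatus


def handle_finished_recving(requests, finished_recving, ready_ids, freed_ids):
    """Three-way dispatch mirroring the patched branch in the diff."""
    for req_id in finished_recving:
        req = requests[req_id]
        if req.status == RequestStatus.WAITING_FOR_REMOTE_KVS:
            # Normal path: KV arrived while the request was parked.
            ready_ids.add(req_id)
        elif RequestStatus.is_finished(req.status):
            # Request finished (e.g. aborted) while the transfer was in flight.
            freed_ids.add(req_id)
        else:
            # Partial remote prefill: local prefill is already running,
            # so the late notification is only logged.
            logger.debug(
                "finished_recving for %s in status %s (partial remote prefill?)",
                req_id, req.status.name)


def demo():
    requests = {
        "parked": Request("parked", RequestStatus.WAITING_FOR_REMOTE_KVS),
        "aborted": Request("aborted", RequestStatus.FINISHED_ABORTED),
        "running": Request("running", RequestStatus.RUNNING),
    }
    ready, freed = set(), set()
    handle_finished_recving(requests, list(requests), ready, freed)
    return sorted(ready), sorted(freed)


logging.basicConfig(level=logging.DEBUG)  # debug line goes to stderr
print(demo())  # (['parked'], ['aborted'])
```

The RUNNING request lands in neither set. Its blocks are presumably still in use by the local prefill, so the sketch, like the patch, only logs the late notification instead of freeing anything.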

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 23:33:26 +08:00
parent 4f93bb5b8a
commit 29b901b145


@@ -2105,9 +2105,12 @@ class Scheduler(SchedulerInterface):
             req = self.requests[req_id]
             if req.status == RequestStatus.WAITING_FOR_REMOTE_KVS:
                 self.finished_recving_kv_req_ids.add(req_id)
-            else:
-                assert RequestStatus.is_finished(req.status)
+            elif RequestStatus.is_finished(req.status):
                 self._free_blocks(self.requests[req_id])
+            else:
+                logger.debug(
+                    "finished_recving for %s in status %s (partial remote prefill?)",
+                    req_id, req.status)
         for req_id in kv_connector_output.finished_sending or ():
             logger.debug("Finished sending KV transfer for request %s", req_id)
             if req_id not in self.requests: