Fix scheduler assertion crash on partial remote prefill finished_recving
The assertion `assert RequestStatus.is_finished(req.status)` at scheduler.py:2109 fires when a partial-remote-prefill request receives `finished_recving` while in the RUNNING state (local prefill had already started before the RDMA read completed). This was the root cause of the 67% error rate: EngineCore crashed with a "fatal error" assertion, killing the vLLM instance.

Fix: replace the assertion with a debug log for requests that are neither in WAITING_FOR_REMOTE_KVS nor finished. Finished requests still have their blocks freed, as before.

The kv_both no-offload baseline confirmed 0 errors, which shows that the crash came from our scheduler patch, not from kv_both instability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -2105,9 +2105,12 @@ class Scheduler(SchedulerInterface):
             req = self.requests[req_id]
             if req.status == RequestStatus.WAITING_FOR_REMOTE_KVS:
                 self.finished_recving_kv_req_ids.add(req_id)
-            else:
-                assert RequestStatus.is_finished(req.status)
+            elif RequestStatus.is_finished(req.status):
                 self._free_blocks(self.requests[req_id])
+            else:
+                logger.debug(
+                    "finished_recving for %s in status %s (partial remote prefill?)",
+                    req_id, req.status)
         for req_id in kv_connector_output.finished_sending or ():
             logger.debug("Finished sending KV transfer for request %s", req_id)
             if req_id not in self.requests:
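For reviewers who want to see the behavioral change in isolation, here is a minimal standalone sketch of the old and new dispatch. It is not the vLLM code: the reduced `RequestStatus` enum, the two handler functions, and the `ready_ids` / `freed_ids` sets are hypothetical stand-ins for the scheduler's real state (`finished_recving_kv_req_ids` and `_free_blocks`), chosen only to show why a RUNNING request tripped the assertion and what the patched branch does instead.

```python
import enum
import logging

logger = logging.getLogger("scheduler_sketch")


class RequestStatus(enum.Enum):
    # Hypothetical, reduced set of statuses for illustration only.
    WAITING_FOR_REMOTE_KVS = enum.auto()
    RUNNING = enum.auto()
    FINISHED_ABORTED = enum.auto()

    @staticmethod
    def is_finished(status: "RequestStatus") -> bool:
        return status is RequestStatus.FINISHED_ABORTED


def handle_finished_recving_old(status, req_id, ready_ids, freed_ids):
    """Pre-patch logic: anything not waiting is assumed to be finished."""
    if status is RequestStatus.WAITING_FOR_REMOTE_KVS:
        ready_ids.add(req_id)
    else:
        # Fires for RUNNING requests (the partial remote prefill case).
        assert RequestStatus.is_finished(status)
        freed_ids.add(req_id)  # stand-in for self._free_blocks(...)


def handle_finished_recving_new(status, req_id, ready_ids, freed_ids):
    """Post-patch logic: three-way dispatch, no assertion."""
    if status is RequestStatus.WAITING_FOR_REMOTE_KVS:
        ready_ids.add(req_id)
    elif RequestStatus.is_finished(status):
        freed_ids.add(req_id)  # stand-in for self._free_blocks(...)
    else:
        # Any other status is only logged; nothing is queued or freed.
        logger.debug(
            "finished_recving for %s in status %s (partial remote prefill?)",
            req_id, status)


if __name__ == "__main__":
    ready, freed = set(), set()
    handle_finished_recving_new(RequestStatus.RUNNING, "req-0", ready, freed)
    print(ready, freed)
```

In this sketch the RUNNING case leaves both sets untouched, which mirrors the patch: the notification is logged and otherwise ignored. Whether any further handling is needed for such requests in the real scheduler is outside what this commit states.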