Gahow Wang 29b901b145 Fix scheduler assertion crash on partial remote prefill finished_recving
The assertion `assert RequestStatus.is_finished(req.status)` at
scheduler.py:2109 fires when a partial-remote-prefill request
receives `finished_recving` while in RUNNING state (local prefill
already started before RDMA read completed).

This was the root cause of 67% error rate: EngineCore crashed with
"fatal error" assertion, killing the vLLM instance.

Fix: Replace assertion with debug log for non-WAITING, non-finished
requests. kv_both no-offload baseline confirmed 0 errors, proving
the crash was from our scheduler patch, not kv_both instability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 23:33:26 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%