agentic-kvc

Go to file

Gahow Wang 29b901b145 Fix scheduler assertion crash on partial remote prefill finished_recving

The assertion `assert RequestStatus.is_finished(req.status)` at
scheduler.py:2109 fires when a partial-remote-prefill request
receives `finished_recving` while in RUNNING state (local prefill
already started before RDMA read completed).

This was the root cause of 67% error rate: EngineCore crashed with
"fatal error" assertion, killing the vLLM instance.

Fix: Replace assertion with debug log for non-WAITING, non-finished
requests. kv_both no-offload baseline confirmed 0 errors, proving
the crash was from our scheduler patch, not kv_both instability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-23 23:33:26 +08:00

analysis

Add comprehensive research findings document

2026-05-23 07:16:31 +08:00

experiments

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

patches

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer

replayer: wire --max-inflight-sessions cap into replay loop (B2)