52a54e44afe81332131d0da1bae3af2265817a20
- Replace the global session_affinity dict with two namespace-isolated ones (combined / prefill) so a session_id never indexes the wrong instance list across mode switches. Keep `session_affinity` as a read-only alias to the combined dict for any existing tooling. - Add a startup _verify_vllm_patch() that scans vllm.v1.core.sched.scheduler.Scheduler for the original `assert req_id in self.requests` line. If the patch was not re-applied after a vLLM upgrade we now print a loud warning at lifespan startup instead of dying mid-experiment on a KV-transfer abort race.
Description
No description provided
Languages
Python
82.9%
Shell
17.1%