Gahow Wang 52a54e44af proxy: split session_affinity per mode + vLLM patch self-check (M4, S2)
- Replace the global `session_affinity` dict with two namespace-isolated
  ones (combined / prefill) so a session_id never indexes the wrong
  instance list across mode switches. Keep `session_affinity` as a
  read-only alias to the combined dict for any existing tooling.
- Add a startup `_verify_vllm_patch()` that scans
  `vllm.v1.core.sched.scheduler.Scheduler` for the original
  `assert req_id in self.requests` line. If the patch was not
  re-applied after a vLLM upgrade, we now print a loud warning at
  lifespan startup instead of dying mid-experiment on a KV-transfer
  abort race.
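A minimal sketch of the per-mode affinity split. It assumes each table maps a session_id to an index into that mode's instance list (suggested by "indexes the wrong instance list"), and that a new session is placed by hashing its id. The names `session_affinity_combined`, `session_affinity_prefill`, `_AFFINITY_BY_MODE` and `pick_instance` are hypothetical; only `session_affinity` comes from the commit.

```python
import zlib
from types import MappingProxyType
from typing import Dict, List

# Hypothetical names: the commit only says there are two namespaces
# (combined / prefill) plus a read-only `session_affinity` alias.
session_affinity_combined: Dict[str, int] = {}
session_affinity_prefill: Dict[str, int] = {}

# Read-only alias for existing tooling. MappingProxyType is a live view,
# so it reflects later writes to the combined dict but rejects writes itself.
session_affinity = MappingProxyType(session_affinity_combined)

_AFFINITY_BY_MODE: Dict[str, Dict[str, int]] = {
    "combined": session_affinity_combined,
    "prefill": session_affinity_prefill,
}


def pick_instance(session_id: str, mode: str, instances: List[str]) -> str:
    """Return a sticky instance for session_id, looked up only in mode's table."""
    if not instances:
        raise ValueError("no instances available for mode %r" % mode)
    table = _AFFINITY_BY_MODE[mode]  # KeyError for an unknown mode
    idx = table.get(session_id)
    # A stored index is only meaningful for this mode's instance list;
    # an out-of-range index (e.g. after the list shrank) is recomputed.
    if idx is None or idx >= len(instances):
        # Assumption: stable crc32 hash so placement survives restarts.
        idx = zlib.crc32(session_id.encode("utf-8")) % len(instances)
        table[session_id] = idx
    return instances[idx]
```

Because each mode has its own table, switching a session from combined to prefill mode never reuses an index that was computed against the other mode's instance list, while `session_affinity` keeps exposing the combined table to existing readers.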
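A minimal sketch of a `_verify_vllm_patch()`-style self-check, assuming it reads the Scheduler class source with `inspect.getsource` and treats any uncommented occurrence of the original assert as "patch missing". The helper `source_has_unpatched_line`, the warning text, and the choice to return `False` when the source cannot be inspected are all assumptions, not taken from the commit.

```python
import importlib
import inspect
import sys

# The unpatched line named in the commit; finding it means the patch is missing.
ORIGINAL_LINE = "assert req_id in self.requests"


def source_has_unpatched_line(source: str, needle: str = ORIGINAL_LINE) -> bool:
    """Return True if any non-comment line of `source` contains `needle`."""
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # assumption: a commented-out assert counts as patched
        if needle in stripped:
            return True
    return False


def _verify_vllm_patch() -> bool:
    """Warn loudly if vLLM's Scheduler still has the original assert.

    Returns True when the patch looks applied and False otherwise,
    including when the Scheduler source cannot be inspected.
    """
    try:
        module = importlib.import_module("vllm.v1.core.sched.scheduler")
        source = inspect.getsource(module.Scheduler)
    except (ImportError, AttributeError, OSError, TypeError) as exc:
        print(f"WARNING: cannot inspect vLLM Scheduler source: {exc}", file=sys.stderr)
        return False
    if source_has_unpatched_line(source):
        print(
            "WARNING: vLLM Scheduler still contains "
            f"{ORIGINAL_LINE!r}; re-apply the patch after upgrading vLLM "
            "or expect KV-transfer abort races",
            file=sys.stderr,
        )
        return False
    return True
```

The check would be called once at lifespan startup (for example from the startup half of an ASGI lifespan handler, assuming the proxy runs under one), so a missing patch surfaces before any traffic rather than as an assertion failure mid-run.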
2026-05-23 21:12:56 +08:00