Gahow Wang 445e491123 Add vLLM v0.18.1 source tree with KV transfer abort fix

third_party/vllm/ now tracked in git for direct patch management.
Based on vLLM v0.18.1 release with one patch applied:

  vllm/v1/core/sched/scheduler.py:
    Replace fatal assert with graceful skip when KV transfer callback
    arrives for an already-aborted request during prefill/decode (PD) disaggregated serving.

Future vLLM modifications should be made directly in third_party/vllm/
and committed normally. The patches/ directory is kept as documentation
of what changed from upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-22 00:30:38 +08:00
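The scheduler patch in the commit message above follows a common pattern: a late, asynchronous callback must tolerate finding that its target was already removed, because an abort can land between "transfer started" and "transfer finished". The sketch below illustrates only that pattern. It is not vLLM's scheduler code. `ToyScheduler`, `on_kv_transfer_finished`, and the state strings are hypothetical names chosen for this example.

```python
import logging

logger = logging.getLogger(__name__)


class ToyScheduler:
    """Hypothetical, heavily simplified stand-in for a scheduler that tracks
    requests whose KV cache is being transferred from another worker."""

    def __init__(self) -> None:
        # request_id -> state; a real scheduler tracks far more per request.
        self.requests: dict[str, str] = {}

    def add_request(self, req_id: str) -> None:
        # Hypothetical state: waiting for remote KV blocks to arrive.
        self.requests[req_id] = "awaiting_kv"

    def abort_request(self, req_id: str) -> None:
        # An abort drops the request immediately, even if a KV transfer
        # for it is still in flight elsewhere.
        self.requests.pop(req_id, None)

    def on_kv_transfer_finished(self, req_id: str) -> bool:
        """Handle a possibly late KV-transfer completion callback.

        Returns True if the request was updated, False if it was skipped.
        """
        # Fragile version (sketch): `assert req_id in self.requests`
        # raises AssertionError if the request was aborted first.
        if req_id not in self.requests:
            logger.warning(
                "KV transfer finished for aborted request %s; skipping", req_id
            )
            return False
        # Hypothetical state: KV blocks are local, request can be scheduled.
        self.requests[req_id] = "ready"
        return True


if __name__ == "__main__":
    sched = ToyScheduler()
    sched.add_request("req-1")
    sched.abort_request("req-1")  # client disconnects mid-transfer
    print(sched.on_kv_transfer_finished("req-1"))  # False, no crash
```

Skipping with a warning, rather than asserting, trades a hard failure for a log line. That is reasonable here because the only information lost is a completion notice for a request nobody is waiting on.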

README.md

Examples

vLLM's examples are split into three categories:

If you are using vLLM from within Python code, see the Offline Inference section.
If you are using vLLM from an HTTP application or client, see the Online Serving section.
For examples of using some of vLLM's advanced features (e.g. LMCache or Tensorizer) which are not specific to either of the above use cases, see the Others section.