Files

Gahow Wang 445e491123 Add vLLM v0.18.1 source tree with KV transfer abort fix

third_party/vllm/ now tracked in git for direct patch management.
Based on vLLM v0.18.1 release with one patch applied:

  vllm/v1/core/sched/scheduler.py:
    Replace fatal assert with graceful skip when KV transfer callback
    arrives for an already-aborted request during PD disaggregated serving.

Future vLLM modifications should be made directly in third_party/vllm/
and committed normally. The patches/ directory is kept as documentation
of what changed from upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-22 00:30:38 +08:00
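
The patch described in the commit message above swaps a fatal assert for a skip when a KV transfer callback arrives for a request that was already aborted. The sketch below only illustrates that general pattern and is not the code in vllm/v1/core/sched/scheduler.py. Every name in it (MiniScheduler, on_kv_transfer_finished, the state strings) is a hypothetical stand-in.

```python
import logging

logger = logging.getLogger("sched_sketch")


class MiniScheduler:
    """Hypothetical stand-in for a scheduler; not vLLM's actual class."""

    def __init__(self):
        self.requests = {}  # request_id -> state string
        self.skipped = []   # callbacks ignored because the request was gone

    def add_request(self, request_id):
        self.requests[request_id] = "waiting_for_kv"

    def abort_request(self, request_id):
        # Aborting drops the request; a transfer callback may still be in flight.
        self.requests.pop(request_id, None)

    def on_kv_transfer_finished(self, request_id):
        # A fatal version would be `assert request_id in self.requests`,
        # which takes the whole process down on a late callback.
        if request_id not in self.requests:
            logger.debug("KV transfer finished for unknown request %s; skipping",
                         request_id)
            self.skipped.append(request_id)
            return False
        self.requests[request_id] = "ready"
        return True


sched = MiniScheduler()
sched.add_request("req-a")
sched.add_request("req-b")
sched.abort_request("req-b")             # aborted before its transfer completes
sched.on_kv_transfer_finished("req-a")   # normal path: marked ready
sched.on_kv_transfer_finished("req-b")   # late callback: skipped, no crash
```

Recording the skipped IDs is a choice made for this sketch so the behaviour stays observable; a real scheduler might only log the event.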

All entries below share the same last commit: "Add vLLM v0.18.1 source tree with KV transfer abort fix", 2026-05-22 00:30:38 +08:00.

attention_benchmarks
auto_tune
cutlass_benchmarks
disagg_benchmarks
fused_kernels
kernels
multi_turn
overheads
backend_request_func.py
benchmark_batch_invariance.py
benchmark_block_pool.py
benchmark_hash.py
benchmark_latency.py
benchmark_long_document_qa_throughput.py
benchmark_ngram_proposer.py
benchmark_prefix_block_hash.py
benchmark_prefix_caching.py
benchmark_prioritization.py
benchmark_serving_structured_output.py
benchmark_serving.py
benchmark_throughput.py
benchmark_topk_topp.py
benchmark_utils.py
README.md
run_structured_output_benchmark.sh
sonnet.txt

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
Throughput benchmarks: Scripts for testing offline batch inference performance
Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.

For full CLI reference see:
