Gahow Wang 445e491123 Add vLLM v0.18.1 source tree with KV transfer abort fix

third_party/vllm/ now tracked in git for direct patch management.
Based on vLLM v0.18.1 release with one patch applied:

  vllm/v1/core/sched/scheduler.py:
    Replace fatal assert with graceful skip when KV transfer callback
    arrives for an already-aborted request during PD disaggregated serving.

Future vLLM modifications should be made directly in third_party/vllm/
and committed normally. The patches/ directory is kept as documentation
of what changed from upstream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-22 00:30:38 +08:00

9.5 KiB

Committers

This document lists the current committers of the vLLM project and the core areas they maintain. Committers have write access to the vLLM repository and are responsible for reviewing and merging PRs. You can also refer to the CODEOWNERS file for concrete file-level ownership and reviewers. Both this document and the CODEOWNERS file are living documents, and they complement each other.

Active Committers

We try to summarize each committer's role in vLLM in a few words. In general, vLLM committers cover a wide range of areas and help each other in the maintenance process. Please refer to the later section about Area Owners for exact component ownership details. Sorted alphabetically by GitHub handle:

@22quinn: RL API
@aarnphm: Structured output
@alexm-redhat: Performance
@ApostaC: Connectors, offloading
@benchislett: Engine core and spec decode
@bigPYJ1151: Intel CPU/XPU integration
@chaunceyjiang: Tool use and reasoning parser
@DarkLight1337: Multimodality, API server
@esmeetu: Developer marketing, community
@gshtras: AMD integration
@heheda12345: Hybrid memory allocator
@hmellor: Hugging Face integration, documentation
@houseroad: Engine core and Llama models
@Isotr0py: Multimodality, new model support
@jeejeelee: LoRA, new model support
@jikunshang: Intel CPU/XPU integration
@khluu: CI infrastructure
@KuntaiDu: KV connector
@LucasWilkinson: Kernels and performance
@luccafong: Llama models, speculative decoding, distributed
@markmc: Observability
@mgoin: Quantization and performance
@NickLucche: KV connector
@njhill: Distributed, API server, engine core
@noooop: Pooling models
@patrickvonplaten: Mistral models, new model support
@pavanimajety: NVIDIA GPU integration
@ProExpertProg: Compilation, startup UX
@robertgshaw2-redhat: Core, distributed, disagg
@ruisearch42: Pipeline parallelism, Ray support
@russellb: Structured output, engine core, security
@sighingnow: Qwen models, new model support
@simon-mo: Project lead, API entrypoints, community
@tdoublep: State space models
@tjtanaa: AMD GPU integration
@tlrmchlsmth: Kernels and performance, distributed, disagg
@WoosukKwon: Project lead, engine core
@yaochengji: TPU integration
@yeqcharlotte: Benchmark, Llama models
@yewentao256: Kernels and performance
@Yikun: Pluggable hardware interface
@youkaichao: Project lead, distributed, compile, community
@ywang96: Multimodality, benchmarks
@zhuohan123: Project lead, RL integration, numerics
@zou3519: Compilation
@BoyuanFeng: Compilation, CUDAGraph

Emeritus Committers

Committers who have contributed significantly to vLLM in the past (thank you!) but are no longer active:

@andoorve: Pipeline parallelism
@cadedaniel: Speculative decoding
@comaniac: KV cache management, pipeline parallelism
@LiuXiaoxuanPKU: Speculative decoding
@pcmoritz: MoE
@rkooo567: Chunked prefill
@sroy745: Speculative decoding
@Yard1: Kernels and performance
@zhisbug: Arctic models, distributed

Area Owners

This section breaks down the active committers by vLLM component and lists the area owners. If your PR touches one of these areas, feel free to ping the area owner for review.

Engine Core

Scheduler: the core vLLM engine loop that schedules requests into the next batch
- @WoosukKwon, @robertgshaw2-redhat, @njhill, @heheda12345
KV Cache Manager: the memory management layer within the scheduler that maintains KV cache logical block data
- @heheda12345, @WoosukKwon
AsyncLLM: the ZMQ-based protocol that hosts the engine core and makes it accessible to entrypoints
- @robertgshaw2-redhat, @njhill, @russellb
ModelRunner, Executor, Worker: the engine abstractions that wrap the model implementation
- @WoosukKwon, @tlrmchlsmth, @heheda12345, @LucasWilkinson, @ProExpertProg
KV Connector: the connector interface and implementations for KV cache offload and transfer
- @robertgshaw2-redhat, @njhill, @KuntaiDu, @NickLucche, @ApostaC
Distributed, Parallelism, Process Management: the process launchers that manage each worker and assign it to the right DP/TP/PP/EP rank
- @youkaichao, @njhill, @WoosukKwon, @ruisearch42
Collectives: the use of NCCL and other communication libraries/kernels
- @tlrmchlsmth, @youkaichao
Multimodality engine and memory management: core scheduling and memory management concerning vision, audio, and video inputs.
- @ywang96, @DarkLight1337

Model Implementations

Model Interface: The nn.Module interface and implementation for various models
- @zhuohan123, @mgoin, @simon-mo, @houseroad, @ywang96 (multimodality), @jeejeelee (LoRA)
Logits Processors / Sampler: The provided sampler class and pluggable logits processors
- @njhill, @houseroad, @22quinn
Custom Layers: Utility layers in vLLM such as rotary embedding and RMS norm
- @ProExpertProg
Attention: Attention interface for paged attention
- @WoosukKwon, @LucasWilkinson, @heheda12345
FusedMoE: FusedMoE kernel, Modular kernel framework, EPLB
- @tlrmchlsmth
Quantization: Various quantization configs, weight loading, and kernels
- @mgoin, @Isotr0py, @yewentao256
Custom quantized GEMM kernels (cutlass_scaled_mm, marlin, machete)
- @tlrmchlsmth, @LucasWilkinson
Multi-modal Input Processing: Components that load and process image/video/audio data into feature tensors
- @DarkLight1337, @ywang96, @Isotr0py
torch compile: The torch.compile integration in vLLM, custom passes & transformations
- @ProExpertProg, @zou3519, @youkaichao, @BoyuanFeng
State space models: The state space models implementation in vLLM
- @tdoublep, @tlrmchlsmth
Reasoning and tool calling parsers
- @chaunceyjiang, @aarnphm

Entrypoints

LLM Class: The LLM class for offline inference
- @DarkLight1337
API Server: The OpenAI-compatible API server
- @DarkLight1337, @njhill, @aarnphm, @simon-mo, @heheda12345 (Responses API)
Batch Runner: The OpenAI-compatible batch runner
- @simon-mo

Features

Spec Decode: Covers model definition, attention, sampler, and scheduler related to n-grams, EAGLE, and MTP.
- @WoosukKwon, @benchislett, @luccafong
Structured Output: The structured output implementation
- @russellb, @aarnphm
RL: RL-related features such as collective RPC and sleep mode
- @youkaichao, @zhuohan123, @22quinn
LoRA: @jeejeelee
Observability: Metrics and logging
- @markmc, @robertgshaw2-redhat, @simon-mo

Code Base

Config: Configuration registration and parsing
- @hmellor
Documentation: @hmellor, @DarkLight1337, @simon-mo
Benchmarks: @ywang96, @simon-mo
CI, Build, Release Process: @khluu, @njhill, @simon-mo
Security: @russellb

External Kernels Integration

FlashAttention: @LucasWilkinson
FlashInfer: @LucasWilkinson, @mgoin, @WoosukKwon
Blackwell Kernels: @mgoin, @yewentao256
DeepEP/DeepGEMM: @mgoin, @yewentao256

Integrations

Hugging Face: @hmellor, @Isotr0py
Ray: @ruisearch42
NIXL: @robertgshaw2-redhat, @NickLucche

Collaboration with Model Vendors

gpt-oss: @heheda12345, @simon-mo, @zhuohan123
Llama: @luccafong
Qwen: @sighingnow
Mistral: @patrickvonplaten

Hardware

Plugin Interface: @youkaichao, @Yikun
NVIDIA GPU: @pavanimajety
AMD GPU: @gshtras, @tjtanaa
Intel CPU/GPU: @jikunshang, @bigPYJ1151
Google TPU: @yaochengji

Ecosystem Projects

Ascend NPU: @wangxiyuan and see more details
Intel Gaudi HPU: @xuechendi and @kzawora-intel
Semantic Router: @xunzhuo, @rootfs and see more details