Committers

This document lists the current committers of the vLLM project and the core areas they maintain. Committers have write access to the vLLM repository and are responsible for reviewing and merging PRs. You can also refer to the CODEOWNERS file for concrete file-level ownership and reviewers. Both this document and the CODEOWNERS file are living documents, and they complement each other.

Active Committers

We try to summarize each committer's role in vLLM in a few words. In general, vLLM committers cover a wide range of areas and help each other in the maintenance process. Please refer to the later section about Area Owners for exact component ownership details. Sorted alphabetically by GitHub handle:

Emeritus Committers

Committers who have contributed to vLLM significantly in the past (thank you!) but are no longer active:

Area Owners

This section breaks down the active committers by vLLM component and lists the area owners. If your PR touches one of these areas, please feel free to ping the area owner for review.

Engine Core

  • Scheduler: the core vLLM engine loop that schedules requests into the next batch
    • @WoosukKwon, @robertgshaw2-redhat, @njhill, @heheda12345
  • KV Cache Manager: the memory management layer within the scheduler that maintains KV cache logical block data
    • @heheda12345, @WoosukKwon
  • AsyncLLM: the zmq-based protocol that hosts the engine core and makes it accessible to entrypoints
    • @robertgshaw2-redhat, @njhill, @russellb
  • ModelRunner, Executor, Worker: the engine abstractions that wrap the model implementation
    • @WoosukKwon, @tlrmchlsmth, @heheda12345, @LucasWilkinson, @ProExpertProg
  • KV Connector: Connector interface and implementation for KV cache offload and transfer
    • @robertgshaw2-redhat, @njhill, @KuntaiDu, @NickLucche, @ApostaC
  • Distributed, Parallelism, Process Management: process launchers that manage each worker and assign it to the right DP/TP/PP/EP rank
    • @youkaichao, @njhill, @WoosukKwon, @ruisearch42
  • Collectives: the use of NCCL and other communication libraries/kernels
    • @tlrmchlsmth, @youkaichao
  • Multimodality engine and memory management: core scheduling and memory management concerning vision, audio, and video inputs.
    • @ywang96, @DarkLight1337

Model Implementations

  • Model Interface: The nn.Module interface and implementation for various models
    • @zhuohan123, @mgoin, @simon-mo, @houseroad, @ywang96 (multimodality), @jeejeelee (lora)
  • Logits Processors / Sampler: The provided sampler class and pluggable logits processors
    • @njhill, @houseroad, @22quinn
  • Custom Layers: Utility layers in vLLM such as rotary embedding and rms norms
    • @ProExpertProg
  • Attention: Attention interface for paged attention
    • @WoosukKwon, @LucasWilkinson, @heheda12345
  • FusedMoE: FusedMoE kernel, Modular kernel framework, EPLB
    • @tlrmchlsmth
  • Quantization: Various quantization configs, weight loading, and kernels
    • @mgoin, @Isotr0py, @yewentao256
  • Custom quantized GEMM kernels (cutlass_scaled_mm, marlin, machete)
    • @tlrmchlsmth, @LucasWilkinson
  • Multi-modal Input Processing: Components that load and process image/video/audio data into feature tensors
    • @DarkLight1337, @ywang96, @Isotr0py
  • torch compile: The torch.compile integration in vLLM, custom passes & transformations
    • @ProExpertProg, @zou3519, @youkaichao, @BoyuanFeng
  • State space models: The state space models implementation in vLLM
    • @tdoublep, @tlrmchlsmth
  • Reasoning and tool calling parsers
    • @chaunceyjiang, @aarnphm

Entrypoints

  • LLM Class: The LLM class for offline inference
    • @DarkLight1337
  • API Server: The OpenAI-compatible API server
    • @DarkLight1337, @njhill, @aarnphm, @simon-mo, @heheda12345 (Responses API)
  • Batch Runner: The OpenAI-compatible batch runner
    • @simon-mo

Features

  • Spec Decode: Covers model definition, attention, sampler, and scheduler related to n-grams, EAGLE, and MTP.
    • @WoosukKwon, @benchislett, @luccafong
  • Structured Output: The structured output implementation
    • @russellb, @aarnphm
  • RL: The RL related features such as collective rpc, sleep mode, etc.
    • @youkaichao, @zhuohan123, @22quinn
  • LoRA: @jeejeelee
  • Observability: Metrics and Logging
    • @markmc, @robertgshaw2-redhat, @simon-mo

Code Base

  • Config: Configuration registration and parsing
    • @hmellor
  • Documentation: @hmellor, @DarkLight1337, @simon-mo
  • Benchmarks: @ywang96, @simon-mo
  • CI, Build, Release Process: @khluu, @njhill, @simon-mo
  • Security: @russellb

External Kernels Integration

  • FlashAttention: @LucasWilkinson
  • FlashInfer: @LucasWilkinson, @mgoin, @WoosukKwon
  • Blackwell Kernels: @mgoin, @yewentao256
  • DeepEP/DeepGEMM: @mgoin, @yewentao256

Integrations

  • Hugging Face: @hmellor, @Isotr0py
  • Ray: @ruisearch42
  • NIXL: @robertgshaw2-redhat, @NickLucche

Collaboration with Model Vendors

  • gpt-oss: @heheda12345, @simon-mo, @zhuohan123
  • Llama: @luccafong
  • Qwen: @sighingnow
  • Mistral: @patrickvonplaten

Hardware

  • Plugin Interface: @youkaichao, @Yikun
  • NVIDIA GPU: @pavanimajety
  • AMD GPU: @gshtras, @tjtanaa
  • Intel CPU/GPU: @jikunshang, @bigPYJ1151
  • Google TPU: @yaochengji

Ecosystem Projects