chore: vendor sglang v0.5.10 snapshot
5
.gitignore
vendored
@@ -12,6 +12,7 @@ src/*.egg-info
 .deps/
 outputs/

-# Local heavyweight checkouts and generated experiment artifacts
-third_party/
+# Vendored dependencies. Track only the maintained SGLang fork/snapshot.
+third_party/*
+!third_party/sglang/
 *.log
607
third_party/sglang/.claude/skills/add-jit-kernel/SKILL.md
vendored
Normal file
@@ -0,0 +1,607 @@
---
name: add-jit-kernel
description: Step-by-step tutorial for adding a new lightweight JIT CUDA kernel to sglang's jit_kernel module
---

# Tutorial: Adding a New JIT Kernel to SGLang

This tutorial walks through adding a simple element-wise scale operation as a JIT kernel. We'll implement `scale(x, factor) = x * factor` to demonstrate the complete workflow.

## Goal

Add a new operation that scales each element of a tensor by a scalar factor:

- Input: tensor `x` (CUDA) and scalar `factor` (float, passed at runtime)
- Output: `x * factor` (element-wise), allocated internally unless a pre-allocated `out` tensor is passed
- Supported dtypes: **FP16 (`torch.float16`), BF16 (`torch.bfloat16`), FP32 (`torch.float32`)**

## When to use JIT vs AOT (`sgl-kernel`)

- **JIT (`jit_kernel`)**: prefer this first for kernels that do **not** depend on CUTLASS or another large C++ project. It is the default choice for lightweight kernels that benefit from rapid iteration and first-use compilation.
- **AOT (`sgl-kernel`)**: prefer this when the kernel **does** depend on CUTLASS or another large C++ project, or when it should live in `sgl-kernel/` and participate in the wheel build / torch op registration flow.
- **Exception**: kernels that depend on `flashinfer`, or on CUTLASS that is already provided through `flashinfer`, can still be implemented as `jit_kernel`.

---

## Common Abstractions in `python/sglang/jit_kernel/include/sgl_kernel/`

**Always prefer these abstractions over raw CUDA primitives.** They provide safety, readability, and consistency with the rest of the codebase.

**Important include rule:** for every `#include <sgl_kernel/...>` line, add a short trailing comment explaining why that header is included (for example `// For TensorMatcher, SymbolicSize, SymbolicDevice`). This matches the current JIT kernel style and keeps include usage self-documenting.

### `utils.h` — Host-side utilities

```cpp
#include <sgl_kernel/utils.h>
```

- **`host::RuntimeCheck(cond, args...)`** — Assert a condition at runtime; throws `PanicError` with file/line info on failure. Prefer this over bare `assert`.
- **`host::Panic(args...)`** — Unconditionally throw a `PanicError` with a descriptive message.
- **`host::div_ceil(a, b)`** — Integer ceiling division `(a + b - 1) / b`.
- **`host::irange(n)`** / **`host::irange(start, end)`** — Range views for cleaner loops.
- **`host::pointer::offset(ptr, offsets...)`** — Byte-safe pointer arithmetic on `void*`. Use this instead of raw casts.

### `utils.cuh` — Device-side utilities + `LaunchKernel`

```cpp
#include <sgl_kernel/utils.cuh>
```

- **Type aliases**: `fp16_t`, `bf16_t`, `fp32_t`, `fp8_e4m3_t`, `fp8_e5m2_t` and their packed variants `fp16x2_t`, `bf16x2_t`, `fp32x2_t`, etc.
- **`SGL_DEVICE`** — Expands to `__forceinline__ __device__`. Use on all device functions.
- **`device::kWarpThreads`** — Constant `32`.
- **`device::load_as<T>(ptr, offset)`** / **`device::store_as<T>(ptr, val, offset)`** — Type-safe loads/stores from `void*`.
- **`device::pointer::offset(ptr, offsets...)`** — Pointer arithmetic on device.
- **`host::LaunchKernel(grid, block, device_or_stream [, smem])`** — RAII kernel launcher that:
  - Resolves the CUDA stream from a `DLDevice` via TVM-FFI automatically.
  - Checks the CUDA error with file/line info after launch via `operator()(kernel, args...)`.
  - Supports `.enable_pdl(bool)` for PDL (Programmatic Dependent Launch, SM90+).
- **`host::RuntimeDeviceCheck(cudaError_t)`** — Check a CUDA error; throw on failure.

### `tensor.h` — Tensor validation (`TensorMatcher`, Symbolic types)

```cpp
#include <sgl_kernel/tensor.h>
```

This is the **primary validation API** for all kernel launchers. Use it to validate every `tvm::ffi::TensorView` argument.

- **`host::SymbolicSize{"name"}`** — A named symbolic dimension. Call `.set_value(n)` to pin it, `.unwrap()` to extract after verification.
- **`host::SymbolicDType`** — Symbolic dtype. Use `.set_options<Ts...>()` to restrict allowed types.
- **`host::SymbolicDevice`** — Symbolic device. Use `.set_options<kDLCUDA>()` to restrict to CUDA.
- **`host::TensorMatcher({dims...})`** — Fluent builder for tensor validation:
  - `.with_dtype<T>()` — require a specific C++ type (e.g. `fp16_t`)
  - `.with_dtype<T1, T2, ...>()` — allow a set of types
  - `.with_device<kDLCUDA>(device_sym)` — require CUDA and bind the checked device to a `SymbolicDevice`
  - `.with_strides({strides...})` — validate strides (omit to require contiguous)
  - `.verify(tensor_view)` — execute the check; throws `PanicError` with full context on failure; **chainable** (`verify(a).verify(b)` to check multiple tensors with the same shape)

**Typical pattern:**
```cpp
auto N = SymbolicSize{"num_elements"};
auto device = SymbolicDevice{};
device.set_options<kDLCUDA>();
TensorMatcher({N})  //
    .with_dtype<fp16_t>()
    .with_device<kDLCUDA>(device)
    .verify(dst)
    .verify(src);  // same shape, dtype, device as dst
const size_t n = N.unwrap();
const DLDevice dev = device.unwrap();
```
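
The chained `verify` calls can be read as "bind on first use, compare afterwards". The Python sketch below models only that idea. `SymSize` and `match_shape` are hypothetical stand-ins invented for illustration and are not the real `tensor.h` implementation.

```python
class SymSize:
    """Hypothetical model of a named symbolic dimension."""

    def __init__(self, name):
        self.name = name
        self.value = None  # unbound until the first tensor is checked

    def bind_or_check(self, actual):
        if self.value is None:
            self.value = actual  # the first check pins the size
        elif self.value != actual:
            raise ValueError(f"{self.name}: expected {self.value}, got {actual}")

    def unwrap(self):
        if self.value is None:
            raise ValueError(f"{self.name} was never bound")
        return self.value


def match_shape(dims, shape):
    """Check one tensor shape against a list of symbolic dims."""
    if len(dims) != len(shape):
        raise ValueError(f"rank mismatch: {len(dims)} vs {len(shape)}")
    for dim, actual in zip(dims, shape):
        dim.bind_or_check(actual)


N = SymSize("num_elements")
match_shape([N], (1024,))  # plays the role of .verify(dst)
match_shape([N], (1024,))  # plays the role of .verify(src)
n = N.unwrap()
```

In this model a second tensor with a different length fails at the second check, which is the behaviour the chained `verify(dst).verify(src)` call relies on.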

### `type.cuh` — `dtype_trait<T>` and `packed_t<T>`

```cpp
#include <sgl_kernel/type.cuh>
```

- **`dtype_trait<T>`** — Static trait struct for each scalar type. Provides:
  - `dtype_trait<T>::from(value)` — convert from another type (e.g. `fp32_t` → `fp16_t`)
  - `dtype_trait<T>::abs/sqrt/rsqrt/exp/sin/cos(x)` — type-dispatched unary math (primarily for `fp32_t`)
  - `dtype_trait<T>::max/min(x, y)` — type-dispatched binary math (primarily for `fp32_t`)
- **`packed_t<T>`** — Two-element packed alias: `packed_t<fp16_t>` = `fp16x2_t`, `packed_t<bf16_t>` = `bf16x2_t`, `packed_t<fp32_t>` = `fp32x2_t`. Use for vectorized loads/stores.
- **`device::cast<To, From>(value)`** — Type-safe cast using `dtype_trait`, e.g. `cast<fp32x2_t, fp16x2_t>(v)`.

### `vec.cuh` — Vectorized memory access (`AlignedVector`)

```cpp
#include <sgl_kernel/vec.cuh>
```

- **`device::AlignedVector<T, N>`** — Aligned storage for N elements of type T. N must be a power of two, `sizeof(T)*N <= 32`. Enables vectorized loads/stores for bandwidth efficiency. In terms of API/codegen constraints, the upper bound is 256-bit; in practice, 128-bit is the portable default, while 256-bit vectorization is typically only viable on `SM100+` and should be gated by an architecture check when needed.
  - `.load(ptr, offset)` — vectorized load from `ptr[offset]`
  - `.store(ptr, offset)` — vectorized store to `ptr[offset]`
  - `.fill(value)` — fill all lanes
  - `operator[](i)` — element access
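
The size constraints on `AlignedVector<T, N>` reduce to a little arithmetic. The sketch below works it through in Python for the three dtypes used in this tutorial. `vec_width` is a hypothetical helper written for this example, and the element sizes are the usual 2 bytes for FP16/BF16 and 4 bytes for FP32.

```python
def vec_width(elem_bytes, vector_bytes=16):
    """Elements per vector for a given element size and vector width in bytes."""
    if vector_bytes > 32:
        raise ValueError("sizeof(T) * N must not exceed 32 bytes")
    n = vector_bytes // elem_bytes
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("N must be a power of two")
    return n


ELEM_BYTES = {"fp16_t": 2, "bf16_t": 2, "fp32_t": 4}

# 128-bit vectors (the portable default) and 256-bit vectors (the upper bound)
widths_128 = {name: vec_width(size) for name, size in ELEM_BYTES.items()}
widths_256 = {name: vec_width(size, 32) for name, size in ELEM_BYTES.items()}
```

With 128-bit vectors this gives 8 elements for the 16-bit types and 4 for FP32, which matches the `16 / sizeof(T)` choice made in the launcher later in this tutorial.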

### `tile.cuh` — `tile::Memory` (strided memory access pattern)

```cpp
#include <sgl_kernel/tile.cuh>
```

- `tile::Memory<T>` is fundamentally a **1D cooperative accessor** over a contiguous region.
- **`device::tile::Memory<T>::cta(blockDim.x)`** — Creates a tile accessor where each thread handles `tid = threadIdx.x` with stride `tsize` (for `cta(blockDim.x)`, this is `blockDim.x`). Common for loops over a 1D array.
  - **`.load(ptr, offset)`** — loads `ptr[tid + offset * tsize]`
  - **`.store(ptr, val, offset)`** — stores to `ptr[tid + offset * tsize]`
  - **`.in_bound(n, offset)`** — boundary check

For a **2D tile**, either flatten `(row, col)` into a linear tile index first, or compute the address manually with `ptr[row * stride + col]` using your thread/block coordinates.
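
To see why the `tid + offset * tsize` pattern covers a 1D array exactly once, the following Python sketch enumerates the indices each thread would touch. It assumes that `.in_bound(n, offset)` means `tid + offset * tsize < n`; that reading is an assumption, and `tile_indices` is a hypothetical helper.

```python
def tile_indices(tid, tsize, n):
    """Indices visited by one thread: tid + offset * tsize while in bounds."""
    visited = []
    offset = 0
    while tid + offset * tsize < n:  # assumed meaning of .in_bound(n, offset)
        visited.append(tid + offset * tsize)
        offset += 1
    return visited


tsize, n = 4, 10
per_thread = [tile_indices(tid, tsize, n) for tid in range(tsize)]
visited = sorted(i for indices in per_thread for i in indices)
```

Adjacent threads touch adjacent addresses at every step, which is the access pattern that coalesces well on GPUs.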

### `math.cuh` — Device math (`device::math::`)

```cpp
#include <sgl_kernel/math.cuh>
```

- `device::math::max/min<T>(a, b)` — type-dispatched binary math via `dtype_trait`
- `device::math::abs/sqrt/rsqrt/exp/sin/cos<T>(x)` — type-dispatched unary math via `dtype_trait`

### `warp.cuh` — Warp-level primitives

```cpp
#include <sgl_kernel/warp.cuh>
```

- `device::warp::reduce_sum<T>(value)` — warp-level sum reduction via `__shfl_xor_sync`
- `device::warp::reduce_max<T>(value)` — warp-level max reduction
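
A reduction built on `__shfl_xor_sync` is usually a butterfly: at each step every lane combines its value with the lane whose index differs in one bit. The Python simulation below illustrates that general technique under the assumption of a 32-lane warp and halving masks. It is not taken from `warp.cuh`.

```python
WARP = 32


def warp_reduce_sum(lanes):
    """Simulate an XOR-shuffle butterfly sum across 32 lanes."""
    vals = list(lanes)
    if len(vals) != WARP:
        raise ValueError("expected one value per lane")
    mask = WARP // 2
    while mask >= 1:
        # Each lane adds the value held by lane (lane ^ mask),
        # which is what a __shfl_xor_sync exchange provides.
        vals = [vals[lane] + vals[lane ^ mask] for lane in range(WARP)]
        mask //= 2
    return vals


result = warp_reduce_sum(range(WARP))
```

After five steps every lane holds the full sum, so no separate broadcast is needed.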

### `cta.cuh` — CTA-level primitives

```cpp
#include <sgl_kernel/cta.cuh>
```

- `device::cta::reduce_max<T>(value, smem, min_value)` — CTA-wide max using shared memory + warp reduction. The caller is responsible for calling `__syncthreads()` afterwards if the result in `smem[0]` is needed.

### `atomic.cuh` — Atomic operations

```cpp
#include <sgl_kernel/atomic.cuh>
```

- `device::atomic::max(float* addr, float value)` — float atomic max (handles negative values correctly via bit tricks).
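
The source does not spell out which bit trick is used. One widely used variant relies on IEEE 754 ordering: for non-negative floats the signed-integer view orders the same way as the float, while for negative floats the unsigned view orders in reverse. The Python sketch below demonstrates that property. It is an illustration of the common technique, not a description of `atomic.cuh`, and it ignores NaN.

```python
import struct


def f2i(x):
    """float32 bit pattern viewed as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def f2u(x):
    """float32 bit pattern viewed as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def u2f(u):
    """Unsigned 32-bit integer reinterpreted as float32."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def float_max_via_ints(current, value):
    """max(current, value) using only integer comparisons on the bits."""
    if value >= 0.0:
        # Corresponds to an integer atomicMax on the signed view.
        new_bits = max(f2i(current), f2i(value))
        return u2f(new_bits & 0xFFFFFFFF)
    # Corresponds to an integer atomicMin on the unsigned view.
    return u2f(min(f2u(current), f2u(value)))
```

The branch on the sign of `value` is what makes negative inputs work; a plain integer max on the raw bits would order negative floats backwards.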

### `runtime.cuh` — Occupancy and device info

```cpp
#include <sgl_kernel/runtime.cuh>
```

- `host::runtime::get_blocks_per_sm(kernel, block_dim)` — max active blocks per SM (occupancy)
- `host::runtime::get_sm_count(device_id)` — number of SMs on the device
- `host::runtime::get_cc_major(device_id)` — compute capability major version

**Persistent kernel pattern** (cap blocks to SM count × occupancy):
```cpp
static const uint32_t max_occ = runtime::get_blocks_per_sm(kernel, kBlockSize);
static const uint32_t num_sm = runtime::get_sm_count(device.unwrap().device_id);
const auto num_blocks = std::min(num_sm * max_occ, div_ceil(n, kBlockSize));
LaunchKernel(num_blocks, kBlockSize, device.unwrap())(kernel, params);
```
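
The grid-size cap in the persistent pattern is plain arithmetic, sketched here in Python. The SM count of 132 and occupancy of 8 are illustrative numbers only; real values come from the `runtime::` helpers at run time.

```python
def div_ceil(a, b):
    """Integer ceiling division, as in host::div_ceil."""
    return (a + b - 1) // b


def persistent_grid(n, block_size, num_sm, max_occ):
    """Blocks to launch: enough to cover n, never more than can be resident."""
    return min(num_sm * max_occ, div_ceil(n, block_size))


# Small input: the problem size is the limit.
small = persistent_grid(n=1000, block_size=256, num_sm=132, max_occ=8)
# Large input: the device capacity is the limit, so each block loops.
large = persistent_grid(n=10_000_000, block_size=256, num_sm=132, max_occ=8)
```

When the cap applies, the kernel has to use a grid-stride loop so that the resident blocks still cover all `n` elements.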

---

## Step 0 (optional): Generate a `.clangd` config for better IDE support

```bash
python -m sglang.jit_kernel
```

---

## Step 1: Implement the CUDA kernel in `jit_kernel/csrc/`

Create `python/sglang/jit_kernel/csrc/elementwise/scale.cuh`.

The implementation fully uses the project abstractions described above:

```cpp
#include <sgl_kernel/tensor.h>   // For TensorMatcher, SymbolicSize, SymbolicDevice
#include <sgl_kernel/type.cuh>   // For dtype_trait, fp16_t, bf16_t, fp32_t
#include <sgl_kernel/utils.h>    // For RuntimeCheck, div_ceil
#include <sgl_kernel/utils.cuh>  // For LaunchKernel, SGL_DEVICE
#include <sgl_kernel/vec.cuh>    // For AlignedVector

#include <dlpack/dlpack.h>
#include <tvm/ffi/container/tensor.h>

namespace {

// ----------------------------------------------------------------
// Kernel: element-wise scale using vectorized 128-bit loads/stores
//   T      = fp16_t | bf16_t | fp32_t
//   kVecN  = number of elements per vector load (e.g. 8 for fp16)
//   factor = runtime scale factor
// ----------------------------------------------------------------
template <typename T, int kVecN>
__global__ void scale_kernel(T* __restrict__ dst,
                             const T* __restrict__ src,
                             float factor,
                             uint32_t n_total) {
  using vec_t = device::AlignedVector<T, kVecN>;
  const uint32_t n_vecs = n_total / kVecN;

  // --- vectorised body ---
  const uint32_t vec_stride = blockDim.x * gridDim.x;
  for (uint32_t vi = blockIdx.x * blockDim.x + threadIdx.x;
       vi < n_vecs;
       vi += vec_stride) {
    vec_t v;
    v.load(src, vi);
    #pragma unroll
    for (int i = 0; i < kVecN; ++i) {
      v[i] = static_cast<T>(static_cast<float>(v[i]) * factor);
    }
    v.store(dst, vi);
  }

  // --- scalar tail ---
  const uint32_t base = n_vecs * kVecN;
  const uint32_t scalar_stride = blockDim.x * gridDim.x;
  for (uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
       base + i < n_total;
       i += scalar_stride) {
    dst[base + i] = static_cast<T>(static_cast<float>(src[base + i]) * factor);
  }
}

// ----------------------------------------------------------------
// Launcher: validates tensors, selects vector width, launches kernel
// ----------------------------------------------------------------
template <typename T>
void scale(tvm::ffi::TensorView dst, tvm::ffi::TensorView src, float factor) {
  using namespace host;

  // 1. Validate input tensors with TensorMatcher
  SymbolicSize N = {"num_elements"};
  SymbolicDevice device_;
  device_.set_options<kDLCUDA>();

  TensorMatcher({N})  //
      .with_dtype<T>()
      .with_device<kDLCUDA>(device_)
      .verify(dst)
      .verify(src);  // same shape / dtype / device as dst

  const uint32_t n = static_cast<uint32_t>(N.unwrap());
  const DLDevice device = device_.unwrap();

  RuntimeCheck(n > 0, "scale: num_elements must be > 0, got ", n);

  // 2. Choose vector width for 128-bit loads (16 bytes)
  //    fp16/bf16: 8 elements × 2 bytes = 16 bytes
  //    fp32:      4 elements × 4 bytes = 16 bytes
  constexpr int kVecN = 16 / sizeof(T);
  const uint32_t n_work_items = div_ceil(n, static_cast<uint32_t>(kVecN));

  // 3. Launch
  constexpr uint32_t kBlockSize = 256;
  const uint32_t grid = div_ceil(n_work_items, kBlockSize);

  LaunchKernel(grid, kBlockSize, device)(
      scale_kernel<T, kVecN>,
      static_cast<T*>(dst.data_ptr()),
      static_cast<const T*>(src.data_ptr()),
      factor,
      n);
}

}  // namespace
```

**Key points:**

- Include headers from `sgl_kernel/` — **not** raw CUDA headers for anything already covered
- Add a short trailing `// For ...` explanation to every `#include <sgl_kernel/...>` line
- Use `TensorMatcher` for all tensor validation; never manually check shape/dtype/device
- Use `AlignedVector` for vectorised 128-bit loads/stores — significant bandwidth win
- Use `LaunchKernel` — it resolves the stream and checks errors automatically
- Use `RuntimeCheck` for runtime assertions with useful error messages
- Prefer passing runtime scalars like `factor` directly unless compile-time specialisation is genuinely required
- `fp16_t` / `bf16_t` / `fp32_t` are the project's type aliases (from `utils.cuh`)
- `device::cast<To, From>` or `dtype_trait<T>::from(val)` for cross-type conversions
- `device::math::` functions for device math instead of bare `__` intrinsics
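
The kernel splits its work into a vectorised body and a scalar tail. A quick way to check that the two loops together write every output element exactly once is to replay the index arithmetic on the host. The Python sketch below mirrors the grid sizing from the launcher and the two loops from the kernel. `simulate_scale_writes` is a hypothetical helper written for this check.

```python
def div_ceil(a, b):
    return (a + b - 1) // b


def simulate_scale_writes(n_total, vec_n, block_size=256):
    """Count how many times each output index is written by one launch."""
    grid = div_ceil(div_ceil(n_total, vec_n), block_size)
    stride = block_size * grid  # blockDim.x * gridDim.x
    n_vecs = n_total // vec_n
    base = n_vecs * vec_n
    writes = [0] * n_total
    for gtid in range(stride):  # every thread in the grid
        # vectorised body: one vector of vec_n elements per iteration
        for vi in range(gtid, n_vecs, stride):
            for lane in range(vec_n):
                writes[vi * vec_n + lane] += 1
        # scalar tail: leftover elements past the last full vector
        i = gtid
        while base + i < n_total:
            writes[base + i] += 1
            i += stride
    return writes


counts = simulate_scale_writes(n_total=127, vec_n=8)
```

The sizes 1, 127, and 4097 used in the tests later in this tutorial exercise the tail, while 128 and 1024 exercise the pure vector path.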

---

## Step 2: Add the Python wrapper in `jit_kernel/`

Create `python/sglang/jit_kernel/scale.py`:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

import torch

from sglang.jit_kernel.utils import cache_once, load_jit, make_cpp_args

if TYPE_CHECKING:
    from tvm_ffi.module import Module


@cache_once
def _jit_scale_module(dtype: torch.dtype) -> Module:
    """Compile and cache the JIT scale module for a given dtype."""
    args = make_cpp_args(dtype)
    return load_jit(
        "scale",
        *args,
        cuda_files=["elementwise/scale.cuh"],
        cuda_wrappers=[("scale", f"scale<{args}>")],
    )


def scale(src: torch.Tensor, factor: float, out: torch.Tensor | None = None) -> torch.Tensor:
    """
    Element-wise scale: dst = src * factor.

    Supported dtypes: torch.float16, torch.bfloat16, torch.float32.

    Parameters
    ----------
    src    : CUDA tensor (FP16 / BF16 / FP32)
    factor : scale factor
    out    : optional pre-allocated output tensor (same shape/dtype as src)

    Returns
    -------
    Scaled tensor (dst = src * factor).
    """
    if not src.is_cuda:
        raise RuntimeError("src must be a CUDA tensor")
    if src.dtype not in (torch.float16, torch.bfloat16, torch.float32):
        raise RuntimeError(
            f"Unsupported dtype {src.dtype}. Supported: float16, bfloat16, float32"
        )
    if out is None:
        out = torch.empty_like(src)
    else:
        if out.shape != src.shape:
            raise RuntimeError("out shape must match src")
        if out.dtype != src.dtype:
            raise RuntimeError("out dtype must match src")
        if out.device != src.device:
            raise RuntimeError("out device must match src")

    # Keep the Python wrapper thin, but still enforce the basic preconditions
    # that the current JIT/FFI path does not reject safely on its own.
    module = _jit_scale_module(src.dtype)
    module.scale(out, src, factor)
    return out
```

**Key points:**

- Use `cache_once` — **not** `functools.lru_cache` (incompatible with `torch.compile`)
- `load_jit` first arg(s) form the unique build marker; same marker = same cached binary
- Only include compile-time specialisation knobs in the build marker; runtime values like `factor` should stay runtime unless the kernel truly needs templating
- `cuda_wrappers`: `(export_name, kernel_symbol)` — `export_name` is called from Python
- Keep Python launchers thin, but still validate the basic invariants (`is_cuda`, supported dtype, `out` metadata). In the current JIT/FFI path, invalid tensors are not always rejected safely before launch
- `make_cpp_args(dtype, ...)` converts `torch.dtype` to C++ type alias:

| `torch.dtype`      | C++ type   |
|--------------------|------------|
| `torch.float16`    | `fp16_t`   |
| `torch.bfloat16`   | `bf16_t`   |
| `torch.float32`    | `fp32_t`   |
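
The "same marker = same cached binary" rule is ordinary memoization keyed on the compile-time arguments. The sketch below shows the idea with a dictionary-backed decorator. `cache_by_args` and `build_module` are hypothetical names for this illustration; the real `cache_once` and `load_jit` may work differently.

```python
compile_calls = []  # records every simulated compilation


def cache_by_args(fn):
    """Hypothetical memoizer: one cached result per distinct argument tuple."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    return wrapper


@cache_by_args
def build_module(dtype_name):
    compile_calls.append(dtype_name)  # stands in for an expensive JIT compile
    return f"scale<{dtype_name}>"


first = build_module("fp16_t")
second = build_module("fp16_t")  # cache hit, no second compile
third = build_module("bf16_t")   # new marker, compiled once
```

Because `factor` is not part of the key, changing it at run time never triggers a rebuild, which is the reason for keeping runtime values out of the build marker.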

---

## Step 3 (optional): Tune JIT build flags

```python
return load_jit(
    "scale",
    *args,
    cuda_files=["elementwise/scale.cuh"],
    cuda_wrappers=[("scale", f"scale<{args}>")],
    extra_cuda_cflags=["-O3", "--use_fast_math"],
)
```

If your kernel requires SM90+, raise a clear Python error before calling `load_jit`:

```python
if torch.cuda.get_device_capability()[0] < 9:
    raise RuntimeError("This kernel requires SM90 (Hopper) or later")
```

---

## Step 4: Write tests (required)

JIT kernel tests live under `python/sglang/jit_kernel/tests/`. **CI does not run `pytest` in that directory directly.** The unified runner `test/run_suite.py` discovers every `test_*.py` there (and every `bench_*.py` under `benchmark/`), collects `register_*_ci(...)` calls by **statically parsing each file’s AST**, and executes the selected suite. Every test file must register at least one CUDA entry or the collector fails its sanity check.

- **PR / per-commit CUDA suites** (see `test/run_suite.py` → `PER_COMMIT_SUITES`): JIT unit tests use `stage-b-kernel-unit-1-gpu-large` (see `.github/workflows/pr-test-jit-kernel.yml`: `python3 run_suite.py --hw cuda --suite stage-b-kernel-unit-1-gpu-large`).
- **Nightly kernel suite**: `nightly-kernel-1-gpu` with `--nightly` — typically used with `SGLANG_JIT_KERNEL_RUN_FULL_TESTS=1` in CI for expanded parameter grids (see `python/sglang/jit_kernel/utils.py` → `should_run_full_tests` / `get_ci_test_range`). Wired in `.github/workflows/nightly-test-nvidia.yml` (e.g. `python3 run_suite.py --hw cuda --suite nightly-kernel-1-gpu --nightly --continue-on-error`).

Registration pattern (module level, **literal** `est_time` and `suite` strings — required for AST parsing):

```python
from sglang.test.ci.ci_register import register_cuda_ci

register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large")
# Optional second registration: same file also listed under the nightly kernel suite
# register_cuda_ci(est_time=120, suite="nightly-kernel-1-gpu", nightly=True)
```

Keep `est_time` and `suite` as literal values. `run_suite.py` collects them from the file AST, so computed values and helper wrappers can break CI discovery.

Use `register_cuda_ci(..., disabled="reason")` if the file must stay in-tree but should be skipped in CI (e.g. multi-GPU only).
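
The literal-only requirement follows from how static collection works: the file is parsed, never executed, so only constants written directly in the call are visible. The sketch below shows a minimal collector built on Python's `ast` module. `collect_registrations` is a hypothetical simplification and not the actual logic in `run_suite.py` or `ci_register.py`.

```python
import ast


def collect_registrations(source):
    """Return (est_time, suite) for each module-level register_cuda_ci(...) call."""
    found = []
    for node in ast.parse(source).body:  # module-level statements only
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        if not (isinstance(call.func, ast.Name) and call.func.id == "register_cuda_ci"):
            continue
        kwargs = {}
        for kw in call.keywords:
            # A name, expression, or **kwargs is not a constant in the AST.
            if kw.arg is None or not isinstance(kw.value, ast.Constant):
                raise ValueError("arguments must be literal keyword values")
            kwargs[kw.arg] = kw.value.value
        found.append((kwargs["est_time"], kwargs["suite"]))
    return found


GOOD = 'register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large")\n'
BAD = 'T = 30\nregister_cuda_ci(est_time=T, suite="stage-b-kernel-unit-1-gpu-large")\n'

registrations = collect_registrations(GOOD)
```

In this model `est_time=T` fails because the parser sees a name node rather than the value 30, which is the same reason computed values break discovery in CI.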

**Run like CI** (from repo root):

```bash
cd test && python3 run_suite.py --hw cuda --suite stage-b-kernel-unit-1-gpu-large
```

For fast iteration you can still run `pytest` on a single file locally; CI coverage is via `run_suite.py`.

Create `python/sglang/jit_kernel/tests/test_scale.py`:

```python
import pytest
import torch
from sglang.jit_kernel.scale import scale
from sglang.test.ci.ci_register import register_cuda_ci

register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large")


@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16, torch.float32])
@pytest.mark.parametrize("size", [1, 127, 128, 1024, 4097])  # cover tail remainder
@pytest.mark.parametrize("factor", [0.5, 1.0, 2.0, 3.0])
def test_scale_correctness(dtype, size, factor):
    src = torch.randn(size, dtype=dtype, device="cuda")
    out = scale(src, factor)
    expected = src * factor

    rtol, atol = (1e-5, 1e-6) if dtype == torch.float32 else (1e-2, 1e-2)
    torch.testing.assert_close(out, expected, rtol=rtol, atol=atol)


@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16, torch.float32])
def test_scale_out_param(dtype):
    src = torch.randn(1024, dtype=dtype, device="cuda")
    out = torch.empty_like(src)
    result = scale(src, 2.0, out=out)
    assert result is out
    torch.testing.assert_close(out, src * 2.0, rtol=1e-2, atol=1e-2)


def test_scale_cpu_error():
    src = torch.randn(128, dtype=torch.float16)  # CPU tensor
    with pytest.raises(RuntimeError, match="CUDA"):
        scale(src, 2.0)


def test_scale_unsupported_dtype():
    src = torch.randint(0, 10, (128,), dtype=torch.int32, device="cuda")
    with pytest.raises(RuntimeError, match="dtype"):
        scale(src, 2.0)


if __name__ == "__main__":
    import sys

    sys.exit(pytest.main([__file__, "-v", "-s"]))
```
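
The looser `1e-2` tolerance for the 16-bit dtypes can be sanity-checked without a GPU. The sketch below uses the standard-library `struct` module's half-precision format to measure the rounding error of one FP16 multiply-and-store. It covers FP16 only (BF16 has fewer mantissa bits and a larger error), and `to_fp16` and `max_rel_error` are hypothetical helpers.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_rel_error(values, factor):
    """Worst relative error from rounding src * factor back to FP16."""
    worst = 0.0
    for v in values:
        src = to_fp16(v)      # the input is already stored as FP16
        exact = src * factor  # reference product in double precision
        got = to_fp16(exact)  # result as it would be stored in FP16
        if exact != 0.0:
            worst = max(worst, abs(got - exact) / abs(exact))
    return worst


samples = [0.1 * k for k in range(1, 200)]
err = max_rel_error(samples, 3.0)
```

For values in FP16's normal range the error stays below `2**-11` (about `4.9e-4`), so `1e-2` leaves a wide margin for this operation.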

---

## Step 5: Add a benchmark (required)

Benchmarks are `bench_*.py` files under `python/sglang/jit_kernel/benchmark/`. They are picked up by the same `run_suite.py` machinery as unit tests. Register them for **`stage-b-kernel-benchmark-1-gpu-large`** (PR JIT benchmark job: `python3 run_suite.py --hw cuda --suite stage-b-kernel-benchmark-1-gpu-large`).

Create `python/sglang/jit_kernel/benchmark/bench_scale.py`:

```python
import itertools

import torch
import triton
import triton.testing

from sglang.jit_kernel.benchmark.utils import (
    DEFAULT_DEVICE,
    DEFAULT_DTYPE,
    get_benchmark_range,
    run_benchmark,
)
from sglang.jit_kernel.scale import scale as jit_scale
from sglang.test.ci.ci_register import register_cuda_ci

register_cuda_ci(est_time=6, suite="stage-b-kernel-benchmark-1-gpu-large")

SIZE_LIST = get_benchmark_range(
    full_range=[2**n for n in range(10, 20)],  # 1K … 512K elements
    ci_range=[4096, 65536],
)

configs = list(itertools.product(SIZE_LIST))


@triton.testing.perf_report(
    triton.testing.Benchmark(
        x_names=["size"],
        x_vals=configs,
        line_arg="provider",
        line_vals=["jit", "torch"],
        line_names=["SGL JIT Kernel", "PyTorch"],
        styles=[("blue", "-"), ("red", "--")],
        ylabel="us",
        plot_name="scale-performance",
        args={},
    )
)
def benchmark(size: int, provider: str):
    src = torch.randn(size, dtype=DEFAULT_DTYPE, device=DEFAULT_DEVICE)
    factor = 2.0

    if provider == "jit":
        fn = lambda: jit_scale(src, factor)
    else:
        fn = lambda: src * factor

    return run_benchmark(fn)


if __name__ == "__main__":
    benchmark.run(print_data=True)
```

Run locally:

```bash
python python/sglang/jit_kernel/benchmark/bench_scale.py
```

Run the benchmark suite the way CI does:

```bash
cd test && python3 run_suite.py --hw cuda --suite stage-b-kernel-benchmark-1-gpu-large
```
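
The `full_range` / `ci_range` pair suggests a selector that returns the small grid by default and the full grid only on request. The sketch below is a guess at that shape. `pick_range` is hypothetical, and reusing `SGLANG_JIT_KERNEL_RUN_FULL_TESTS` as the switch is an assumption borrowed from the test section; the real `get_benchmark_range` may decide differently.

```python
import os


def pick_range(full_range, ci_range, env=None):
    """Hypothetical selector: full grid only when the opt-in variable is '1'."""
    env = os.environ if env is None else env
    # Assumed switch; the variable name comes from the test section above.
    if env.get("SGLANG_JIT_KERNEL_RUN_FULL_TESTS") == "1":
        return list(full_range)
    return list(ci_range)


full = [2**n for n in range(10, 20)]  # 1K to 512K elements
ci = [4096, 65536]

default_sizes = pick_range(full, ci, env={})
nightly_sizes = pick_range(full, ci, env={"SGLANG_JIT_KERNEL_RUN_FULL_TESTS": "1"})
```

Keeping the CI grid to a couple of sizes is what makes a small `est_time` such as 6 seconds realistic for the PR benchmark job.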

---

## Troubleshooting

- **`No CI registry found in ...` from `run_suite.py`**: add a module-level `register_cuda_ci(...)` with literal `est_time` and `suite` (and optional `nightly=True`); starred args and non-literal values break AST collection
- **JIT compilation fails**: ensure the `.cuh` file is under `python/sglang/jit_kernel/csrc/`; reduce template argument combinations
- **CUDA crash / illegal memory access**: `CUDA_LAUNCH_BLOCKING=1`; `compute-sanitizer --tool memcheck python ...`
- **Unstable benchmark results**: `run_benchmark` uses CUDA-graph-based timing by default

---

## References

- `docs/developer_guide/development_jit_kernel_guide.md`
- `test/run_suite.py` — suite names, discovery of `jit_kernel/tests/` and `jit_kernel/benchmark/`, execution entrypoint for CI
- `python/sglang/test/ci/ci_register.py` — `register_cuda_ci` and AST registration rules
- `python/sglang/jit_kernel/utils.py` — `cache_once`, `load_jit`, `make_cpp_args`, `should_run_full_tests`, `get_ci_test_range`
- `python/sglang/jit_kernel/include/sgl_kernel/tensor.h` — `TensorMatcher`, `SymbolicSize/DType/Device`
- `python/sglang/jit_kernel/include/sgl_kernel/utils.cuh` — type aliases, `LaunchKernel`, `SGL_DEVICE`
- `python/sglang/jit_kernel/include/sgl_kernel/vec.cuh` — `AlignedVector`
- `python/sglang/jit_kernel/include/sgl_kernel/tile.cuh` — `tile::Memory`
- `python/sglang/jit_kernel/include/sgl_kernel/type.cuh` — `dtype_trait`, `packed_t`, `device::cast`
- `python/sglang/jit_kernel/include/sgl_kernel/math.cuh` — `device::math::`
- `python/sglang/jit_kernel/include/sgl_kernel/warp.cuh` — `warp::reduce_sum/max`
- `python/sglang/jit_kernel/include/sgl_kernel/cta.cuh` — `cta::reduce_max`
- `python/sglang/jit_kernel/include/sgl_kernel/atomic.cuh` — `atomic::max`
- `python/sglang/jit_kernel/include/sgl_kernel/runtime.cuh` — occupancy / SM count helpers
- `python/sglang/jit_kernel/csrc/add_constant.cuh` — minimal runnable reference
- `python/sglang/jit_kernel/csrc/elementwise/rmsnorm.cuh` — real example using `TensorMatcher` + `LaunchKernel` + `tile::Memory`
- `python/sglang/jit_kernel/csrc/elementwise/qknorm.cuh` — real example using `runtime::get_blocks_per_sm` + persistent kernel pattern
- `python/sglang/jit_kernel/benchmark/utils.py` — benchmark helpers

## Summary of Files Created

```
python/sglang/jit_kernel/csrc/elementwise/scale.cuh   # NEW: CUDA kernel
python/sglang/jit_kernel/scale.py                     # NEW: Python wrapper
python/sglang/jit_kernel/tests/test_scale.py          # NEW: Tests
python/sglang/jit_kernel/benchmark/bench_scale.py     # NEW: Benchmark
```
363
third_party/sglang/.claude/skills/add-sgl-kernel/SKILL.md
vendored
Normal file
@@ -0,0 +1,363 @@

---
name: add-sgl-kernel
description: Step-by-step tutorial for adding a heavyweight AOT CUDA/C++ kernel to sgl-kernel (including tests & benchmarks)
---

# Tutorial: Adding a New Kernel to `sgl-kernel` (AOT / Heavyweight)

This tutorial walks through adding a simple element-wise scale operation as an AOT kernel. We'll implement `scale(x, factor) = x * factor` to demonstrate the complete workflow.

## Goal

Add a new operation that scales each element of a tensor by a scalar factor:

- Input: tensor `x` (CUDA) and scalar `factor` (float)
- Output: `x * factor` (element-wise, in-place or into pre-allocated `out`)
- Supported dtypes: **FP16 (`torch.float16`), BF16 (`torch.bfloat16`), FP32 (`torch.float32`)**
  - Dispatched via `DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FLOAT_FP16` macro (defined in `sgl-kernel/include/utils.h`)

## Rules of thumb (must follow)

1. **Prefer `python/sglang/jit_kernel` first** when the kernel does **not** depend on CUTLASS or another large C++ project. This is the default path for lightweight kernels that benefit from rapid iteration.
2. **Prefer `sgl-kernel`** when the kernel **does** depend on CUTLASS or another large C++ project, or when it should be part of the AOT wheel / torch op registration flow.
3. **Exception**: if the dependency is `flashinfer`, or CUTLASS that is already provided through `flashinfer`, the kernel can still be implemented as `jit_kernel`.

In addition, every new kernel must ship with:

- **Tests** (pytest)
- **A benchmark script** (triton.testing)

---

## Repository integration map

You will typically touch these files/areas:

- Implementation: `sgl-kernel/csrc/elementwise/scale.cu` (pick the right subdirectory)
- Public declarations: `sgl-kernel/include/sgl_kernel_ops.h`
- Torch extension registration: `sgl-kernel/csrc/common_extension.cc`
- Build: `sgl-kernel/CMakeLists.txt` (`set(SOURCES ...)`)
- Python API: `sgl-kernel/python/sgl_kernel/` and `sgl-kernel/python/sgl_kernel/__init__.py`
- Tests: `sgl-kernel/tests/test_scale.py`
- Benchmarks: `sgl-kernel/benchmark/bench_scale.py`

---

## Step 1: Implement the kernel in `csrc/`

Pick the right subdirectory:

- `csrc/elementwise/`: for element-wise ops (our example)
- `csrc/gemm/`, `csrc/attention/`, `csrc/moe/`: for other categories

Create `sgl-kernel/csrc/elementwise/scale.cu`:

```cpp
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>
#include <torch/all.h>

#include "utils.h"  // DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FLOAT_FP16

// scale_kernel: out[i] = input[i] * factor
// Supports float, half (__half), __nv_bfloat16 via template T
template <typename T>
__global__ void scale_kernel(T* __restrict__ out,
                             const T* __restrict__ input,
                             float factor,
                             int64_t n) {
  int64_t idx = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (idx < n) {
    out[idx] = static_cast<T>(static_cast<float>(input[idx]) * factor);
  }
}

void scale(at::Tensor& out, const at::Tensor& input, double factor) {
  TORCH_CHECK(input.is_cuda(), "input must be a CUDA tensor");
  TORCH_CHECK(input.is_contiguous(), "input must be contiguous");
  TORCH_CHECK(out.is_cuda(), "out must be a CUDA tensor");
  TORCH_CHECK(out.is_contiguous(), "out must be contiguous");
  TORCH_CHECK(out.sizes() == input.sizes(), "out and input must have the same shape");
  TORCH_CHECK(out.scalar_type() == input.scalar_type(),
              "out and input must have the same dtype");

  const int64_t n = input.numel();
  if (n == 0) {
    return;  // empty tensor: a zero-block launch would be rejected as an invalid configuration
  }
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);

  // Switch to the input's device first, then fetch that device's current stream
  const at::cuda::OptionalCUDAGuard device_guard(device_of(input));
  const cudaStream_t stream = at::cuda::getCurrentCUDAStream();

  // Dispatches over float, float16, bfloat16
  DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FLOAT_FP16(input.scalar_type(), c_type, [&] {
    scale_kernel<c_type><<<blocks, threads, 0, stream>>>(
        static_cast<c_type*>(out.data_ptr()),
        static_cast<const c_type*>(input.data_ptr()),
        static_cast<float>(factor),
        n);
    cudaError_t status = cudaGetLastError();
    TORCH_CHECK(status == cudaSuccess,
                "scale_kernel launch failed: ", cudaGetErrorString(status));
    return true;
  });
}
```

**Key points:**

- Use `at::Tensor` (PyTorch tensors), `TORCH_CHECK` for validation, `at::cuda::getCurrentCUDAStream()` for stream
- Keep Python wrappers thin; do shape/dtype/device validation in C++ right around the launch path
- `DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FLOAT_FP16` covers `float`, `half` (FP16), `__nv_bfloat16` (BF16)
- Add device error checking after every kernel launch
- If a kernel only works on certain architectures, enforce that with `TORCH_CHECK` and skip logic in tests

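
The launch geometry used in `scale.cu` (ceiling division for the grid size, plus the `idx < n` bounds guard) can be sanity-checked on the host without a GPU. The following is an illustrative pure-Python model, not part of sgl-kernel; `launch_geometry` and `covered_indices` are hypothetical names introduced only for this sketch. It shows that every element index is visited exactly once, even when `n` is not a multiple of the block size.

```python
def launch_geometry(n: int, threads: int = 256) -> int:
    """Number of blocks so that blocks * threads >= n (ceiling division)."""
    return (n + threads - 1) // threads


def covered_indices(n: int, threads: int = 256) -> list[int]:
    """Simulate every (block, thread) pair and keep indices passing `idx < n`."""
    touched = []
    for block in range(launch_geometry(n, threads)):
        for thread in range(threads):
            idx = block * threads + thread
            if idx < n:  # same bounds guard as in scale_kernel
                touched.append(idx)
    return touched


if __name__ == "__main__":
    # Sizes around a block boundary: each index must appear exactly once.
    for n in (1, 255, 256, 257, 1000):
        assert covered_indices(n) == list(range(n))
    print("blocks for n=1000:", launch_geometry(1000))
```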

---

## Step 2: Add a C++ declaration in `include/sgl_kernel_ops.h`

Edit `sgl-kernel/include/sgl_kernel_ops.h`, add to the elementwise section:

```cpp
void scale(at::Tensor& out, const at::Tensor& input, double factor);
```

---

## Step 3: Register the op in `csrc/common_extension.cc`

Edit `sgl-kernel/csrc/common_extension.cc`, inside `TORCH_LIBRARY_FRAGMENT(sgl_kernel, m)`:

```cpp
// From csrc/elementwise
m.def("scale(Tensor! out, Tensor input, float factor) -> ()");
m.impl("scale", torch::kCUDA, &scale);
```

**Key points:**

- `Tensor!` means in-place / mutable output argument
- The schema is important for `torch.compile` and for consistent call signatures
- Keep the torch schema in PyTorch scalar types (`float` here), but note that the C++ launcher signature still needs `double` for scalar arguments accepted by `torch::Library`

---

## Step 4: Add the new source file to `CMakeLists.txt`

Edit `sgl-kernel/CMakeLists.txt`, add to `set(SOURCES ...)`:

```cmake
csrc/elementwise/scale.cu
```

**Key points:**

- Keep the list **alphabetically sorted** (the file explicitly requires this)
- If the kernel has arch constraints, reflect that in tests/benchmarks via skip logic


---

## Step 5: Expose a Python API under `sgl-kernel/python/sgl_kernel/`

Prefer following the existing module organization first. For elementwise kernels, the usual pattern is:

- implement the Python wrapper in `sgl-kernel/python/sgl_kernel/elementwise.py`
- then re-export it from `sgl-kernel/python/sgl_kernel/__init__.py`

For example, in `sgl-kernel/python/sgl_kernel/elementwise.py`, add:

```python
import torch

def scale(
    input: torch.Tensor,
    factor: float,
    out: torch.Tensor | None = None,
) -> torch.Tensor:
    """
    Element-wise scale: out = input * factor.

    Supported dtypes: torch.float16, torch.bfloat16, torch.float32.

    Parameters
    ----------
    input  : CUDA input tensor
    factor : scale factor (float)
    out    : optional pre-allocated CUDA output tensor (same shape/dtype as input)
    """
    if out is None:
        out = torch.empty_like(input)
    torch.ops.sgl_kernel.scale.default(out, input, factor)
    return out
```

Then re-export it from `sgl-kernel/python/sgl_kernel/__init__.py` following the existing import style used by other kernels.

---

## Step 6: Write tests (required)

Create `sgl-kernel/tests/test_scale.py`:

```python
import pytest

import torch
import sgl_kernel

@pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16, torch.float32])
@pytest.mark.parametrize("size", [128, 1024, 4096, 65536])
@pytest.mark.parametrize("factor", [0.5, 1.0, 2.0])
def test_scale_correctness(dtype, size, factor):
    input = torch.randn(size, dtype=dtype, device="cuda")
    out = torch.empty_like(input)

    result = sgl_kernel.scale(input, factor, out=out)
    assert result is out

    expected = input * factor
    rtol, atol = (1e-5, 1e-6) if dtype == torch.float32 else (1e-2, 1e-2)
    torch.testing.assert_close(out, expected, rtol=rtol, atol=atol)


def test_scale_shape_mismatch():
    input = torch.randn(128, dtype=torch.float16, device="cuda")
    out = torch.empty(256, dtype=torch.float16, device="cuda")
    with pytest.raises(RuntimeError, match="same shape"):
        sgl_kernel.scale(input, 2.0, out=out)


def test_scale_cpu_input():
    input = torch.randn(128, dtype=torch.float16)  # CPU
    out = torch.empty_like(input)
    with pytest.raises(RuntimeError, match="CUDA"):
        sgl_kernel.scale(input, 2.0, out=out)


if __name__ == "__main__":
    import sys

    sys.exit(pytest.main([__file__, "-q"]))
```

---

## Step 7: Add a benchmark (required)

Create `sgl-kernel/benchmark/bench_scale.py`:

```python
import itertools

import torch
import triton
import triton.testing

import sgl_kernel
from sglang.utils import is_in_ci

IS_CI = is_in_ci()

dtypes = [torch.float16] if IS_CI else [torch.float16, torch.bfloat16, torch.float32]
sizes = [4096] if IS_CI else [2**n for n in range(10, 20)]  # 1K … 512K
factors = [2.0]

configs = list(itertools.product(dtypes, sizes))


def torch_scale(input: torch.Tensor, factor: float) -> torch.Tensor:
    return input * factor


@triton.testing.perf_report(
    triton.testing.Benchmark(
        x_names=["dtype", "size"],
        x_vals=configs,
        line_arg="provider",
        line_vals=["sglang", "torch"],
        line_names=["SGL Kernel", "PyTorch"],
        styles=[("green", "-"), ("red", "--")],
        ylabel="µs (median)",
        plot_name="scale-performance",
        args={},
    )
)
def benchmark(dtype, size, provider):
    input = torch.randn(size, dtype=dtype, device="cuda")
    out = torch.empty_like(input)
    factor = 2.0

    if provider == "sglang":
        fn = lambda: sgl_kernel.scale(input, factor, out=out)
    else:
        fn = lambda: torch_scale(input, factor)

    # quantiles=[0.5, 0.2, 0.8] yields (median, 20th percentile, 80th percentile)
    ms, min_ms, max_ms = triton.testing.do_bench_cudagraph(
        fn, quantiles=[0.5, 0.2, 0.8]
    )
    # Report (value, min, max) in microseconds; these are latencies, so min stays min
    return 1000 * ms, 1000 * min_ms, 1000 * max_ms


if __name__ == "__main__":
    benchmark.run(print_data=True)
```
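
To make the reported numbers concrete: the benchmark asks for the 0.5, 0.2 and 0.8 quantiles of the measured times and converts milliseconds to microseconds. The stdlib-only sketch below computes the same kind of summary over a list of made-up timings. `summarize_us` is a hypothetical helper, and the interpolation method used here is an assumption that may not match what Triton does internally.

```python
import statistics


def summarize_us(samples_ms: list[float]) -> tuple[float, float, float]:
    """Return (median, p20, p80) in microseconds for timings given in milliseconds."""
    # With n=10, statistics.quantiles returns the nine cut points p10..p90.
    deciles = statistics.quantiles(samples_ms, n=10, method="inclusive")
    median = statistics.median(samples_ms)
    p20, p80 = deciles[1], deciles[7]
    return 1000 * median, 1000 * p20, 1000 * p80


if __name__ == "__main__":
    # Made-up timings in milliseconds, for illustration only.
    fake_timings_ms = [0.012, 0.011, 0.013, 0.011, 0.015, 0.012, 0.011, 0.014]
    median_us, p20_us, p80_us = summarize_us(fake_timings_ms)
    print(f"median={median_us:.1f}us p20={p20_us:.1f}us p80={p80_us:.1f}us")
```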

---

## Step 8: Build

Build:

```bash
cd sgl-kernel
make build -j16
```

If you need to limit host resource usage:

```bash
cd sgl-kernel
make build -j1 MAX_JOBS=2 CMAKE_ARGS="-DSGL_KERNEL_COMPILE_THREADS=1"
```

---

## Step 9: Validate

After building successfully, run the test and benchmark:

```bash
pytest sgl-kernel/tests/test_scale.py -q
python sgl-kernel/benchmark/bench_scale.py
```

---

## Troubleshooting

- **Async CUDA errors**: `CUDA_LAUNCH_BLOCKING=1`
- **Memory errors**: `compute-sanitizer --tool memcheck python ...`
- **Build is too slow / OOM**: reduce `MAX_JOBS` and `SGL_KERNEL_COMPILE_THREADS`
- **Binary bloat**: use `sgl-kernel/analyze_whl_kernel_sizes.py`
- **CMake sources list**: if your `.cu` file is missing from `SOURCES`, the symbol will be undefined at link time

---

## References

- `sgl-kernel/README.md`
- `sgl-kernel/include/sgl_kernel_ops.h`
- `sgl-kernel/csrc/common_extension.cc`
- `sgl-kernel/CMakeLists.txt`
- `sgl-kernel/include/utils.h`: `DISPATCH_PYTORCH_DTYPE_TO_CTYPE_FLOAT_FP16` macro and friends
- `sgl-kernel/csrc/elementwise/activation.cu`: reference for the FP16/BF16/FP32 dispatch pattern

## Summary of Files Created/Modified

```
sgl-kernel/csrc/elementwise/scale.cu          # NEW: CUDA kernel + launcher
sgl-kernel/include/sgl_kernel_ops.h           # MODIFIED: C++ declaration
sgl-kernel/csrc/common_extension.cc           # MODIFIED: schema + dispatch registration
sgl-kernel/CMakeLists.txt                     # MODIFIED: add source file (alphabetical)
sgl-kernel/python/sgl_kernel/elementwise.py   # MODIFIED: Python wrapper
sgl-kernel/python/sgl_kernel/__init__.py      # MODIFIED: re-export Python API
sgl-kernel/tests/test_scale.py                # NEW: tests
sgl-kernel/benchmark/bench_scale.py           # NEW: benchmark
```

386
third_party/sglang/.claude/skills/ci-workflow-guide/SKILL.md
vendored
Normal file
@@ -0,0 +1,386 @@

---
name: ci-workflow-guide
description: Guide to SGLang CI workflow orchestration (stage ordering, fast-fail, gating, partitioning, execution modes, and debugging CI failures). Use when modifying CI workflows, adding stages, debugging CI pipeline issues, or understanding how tests are dispatched and gated across stages.
---

# SGLang CI Workflow Orchestration Guide

This skill covers the CI **infrastructure** layer: how tests are dispatched, gated, and fast-failed across stages. For test authoring (templates, fixtures, registration, model selection), see the [write-sglang-test skill](../write-sglang-test/SKILL.md).

---

## Naming Conventions

- **Suite**: `stage-{a,b,c}-test-{gpu_count}-gpu-{hardware}` (e.g., `stage-b-test-1-gpu-small`)
- **CI runner**: `{gpu_count}-gpu-{hardware}` (e.g., `1-gpu-5090`, `4-gpu-h100`, `8-gpu-h200`)

---

## Key Files

| File | Role |
|------|------|
| `.github/workflows/pr-test.yml` | Main workflow: all stages, jobs, conditions, matrix definitions |
| `.github/workflows/pr-gate.yml` | PR gating: draft check, `run-ci` label, per-user rate limiting |
| `.github/actions/check-stage-health/action.yml` | Cross-job fast-fail: queries API for any failed job |
| `.github/actions/wait-for-jobs/action.yml` | Stage gating: polls API until stage jobs complete |
| `.github/actions/check-maintenance/action.yml` | Maintenance mode check |
| `test/run_suite.py` | Suite runner: collects, filters, partitions, executes tests |
| `python/sglang/test/ci/ci_register.py` | Test registration (AST-parsed markers), LPT auto-partition |
| `python/sglang/test/ci/ci_utils.py` | `run_unittest_files()`: execution, retry, continue-on-error |
| `scripts/ci/utils/slash_command_handler.py` | Handles slash commands from PR comments |

---

## Architecture Overview

```
┌──────────────┐
│ build kernel │
└──────┬───────┘
       │
       ├─ check-changes ──── detects which packages changed
       │                     (main_package, sgl_kernel, jit_kernel, multimodal_gen)
       │
       ├─ call-gate ──────── pr-gate.yml (draft? label? rate limit?)
       │
       ├─────────────────────────────────────────────────────┐
       │                                                     │
       ▼                                                     │
┌─────────────────────────────────────┐                      │
│ Stage A (~3 min)                    │                      │
│ pre-flight check                    │                      │
│                                     │                      │
│  ┌─────────────────────────────┐    │                      │
│  │ stage-a-test-1-gpu-small    │    │                      │
│  │ (small GPUs)                │    │                      │
│  └─────────────────────────────┘    │                      │
│  ┌─────────────────────────────┐    │                      │
│  │ stage-a-test-cpu            │    │                      │
│  │ (CPU)                       │    │                      │
│  └─────────────────────────────┘    │                      │
└──────┬──────────────────────────────┘                      │
       │                                                     │
       ▼                                                     ▼
┌─────────────────────────────────────┐         ┌──────────────────────────┐
│ Stage B (~30 min)                   │         │ kernel test              │
│ basic tests                         │         └──────────────────────────┘
│                                     │         ┌──────────────────────────┐
│  ┌─────────────────────────────┐    │         │ multimodal gen test      │
│  │ stage-b-test-1-gpu-small    │    │         └──────────────────────────┘
│  │ (small GPUs, e.g. 5090)     │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ stage-b-test-1-gpu-large    │    │
│  │ (large GPUs, e.g. H100)     │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ stage-b-test-2-gpu-large    │    │
│  │ (large GPUs, e.g. H100)     │    │
│  └─────────────────────────────┘    │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│ Stage C (~30 min)                   │
│ advanced tests                      │
│                                     │
│  ┌─────────────────────────────┐    │
│  │ stage-c-test-4-gpu-h100     │    │
│  │ (H100 GPUs)                 │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ stage-c-test-8-gpu-h200     │    │
│  │ (8 x H200 GPUs)             │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ stage-c-test-4-gpu-b200     │    │
│  │ (4 x B200 GPUs)             │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ Other advanced tests        │    │
│  │ (DeepEP, PD Disagg, GB300)  │    │
│  └─────────────────────────────┘    │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│ pr-test-finish                      │
│ aggregates all results, fails if    │
│ any job failed/cancelled            │
└─────────────────────────────────────┘
```

**Every stage test job** includes a `check-stage-health` step after checkout. If any job in the run has already failed, the job fast-fails (red X) with a root cause annotation.

**Scheduled runs** skip `wait-for-stage-*` jobs, running all stages in parallel. Fast-fail is also disabled.

---

## Fast-Fail Layers

4 layers of fast-fail, from fine to coarse:

| Layer | Mechanism | Granularity | Disabled on schedule? |
|-------|-----------|-------------|----------------------|
| **1. Test method → file** | `unittest -f` (failfast) | One test method fails → entire test file stops immediately | Yes |
| **2. File → suite** | `run_unittest_files()` default | One test file fails → entire suite stops (`--continue-on-error` off) | Yes |
| **3. Job → job (same stage)** | `check-stage-health` action | One job fails → other waiting jobs in same stage fast-fail (red X) | Yes |
| **4. Stage → stage (cross-stage)** | `wait-for-stage` + `needs` | Stage A fails → stage B/C jobs skip entirely (never get a runner) | Yes (wait jobs skipped) |

- **Layer 1**: `-f` flag appended to all `python3 -m pytest` / `unittest` invocations in `ci_utils.py`
- **Layer 2**: `--continue-on-error` flag in `run_suite.py`: off for PRs, on for scheduled runs
- **Layer 3**: `check-stage-health` auto-detects `schedule` event and skips; filters out cascade failures to show only root cause jobs
- **Layer 4**: `wait-for-stage-*` jobs are conditioned on `github.event_name == 'pull_request'` and are skipped for scheduled runs

---

## Execution Modes

| Aspect | PR (`pull_request`) | Scheduled (`cron`, every 6h) | `/rerun-stage` (`workflow_dispatch`) |
|--------|---------------------|------------------------------|--------------------------------------|
| **Stage ordering** | Sequential: A → B → C via `wait-for-stage-*` | Parallel (all at once) | Single target stage only |
| **Cross-job fast-fail** | Yes (`check-stage-health`) | No (`check-stage-health` skips on `schedule`) | Yes |
| **continue-on-error** | No (stop at first failure within suite) | Yes (run all tests) | No |
| **Retry** | Enabled | Enabled | Enabled |
| **max_parallel** | 3 (default), 14 if `high priority` label | 14 | 3 (default), 14 if `high priority` |
| **PR gate** | Yes (draft, label, rate limit) | Skipped | Skipped |
| **Concurrency** | `cancel-in-progress: true` per branch | Queue (no cancel) | Isolated per stage+SHA |


---

## Stage Gating (`wait-for-jobs` action)

`wait-for-stage-a` and `wait-for-stage-b` are lightweight `ubuntu-latest` jobs that poll the GitHub Actions API.

**How it works:**
1. Calls `listJobsForWorkflowRun` to list all jobs in the current run
2. Matches jobs by exact name or prefix (for matrix jobs, e.g., `stage-b-test-1-gpu-small (3)`)
3. If any matched job has `conclusion === 'failure'` → fail immediately (fast-fail)
4. If all matched jobs are completed and count matches `expected_count` → success
5. Otherwise → sleep `poll-interval-seconds` (default: 60s) and retry
6. Timeout after `max-wait-minutes` (240 min for stage-a, 480 min for stage-b)

**Job specs example** (stage-b):
```json
[
  {"prefix": "stage-b-test-1-gpu-small", "expected_count": 8},
  {"prefix": "stage-b-test-1-gpu-large", "expected_count": 14},
  {"prefix": "stage-b-test-2-gpu-large", "expected_count": 4},
  {"prefix": "stage-b-test-4-gpu-b200", "expected_count": 1}
]
```

> **Critical**: `expected_count` must match the matrix size. If you add/remove matrix entries, update the wait job's spec accordingly.

**PR only**: Condition `github.event_name == 'pull_request' && !inputs.target_stage`. Scheduled runs and `/rerun-stage` skip these entirely, allowing parallel execution.

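
The decision made in each polling round (steps 2 to 4 above) can be modelled in a few lines. This is a hedged Python sketch of the logic only: the real action is a GitHub Actions composite action, `evaluate_wait` is a hypothetical name, and the job dictionaries below are simplified stand-ins that use the `name`, `status` and `conclusion` fields of the jobs API response.

```python
def evaluate_wait(jobs: list[dict], specs: list[dict]) -> str:
    """Return 'failure', 'success', or 'pending' for one polling round."""
    all_done = True
    for spec in specs:
        # Prefix match also catches matrix jobs such as "<prefix> (3)".
        matched = [j for j in jobs if j["name"].startswith(spec["prefix"])]
        if any(j.get("conclusion") == "failure" for j in matched):
            return "failure"  # fast-fail as soon as any matched job failed
        completed = [j for j in matched if j.get("status") == "completed"]
        if len(completed) != len(matched) or len(matched) != spec["expected_count"]:
            all_done = False  # still running, or not every matrix job exists yet
    return "success" if all_done else "pending"


if __name__ == "__main__":
    specs = [{"prefix": "stage-b-test-2-gpu-large", "expected_count": 2}]
    jobs = [
        {"name": "stage-b-test-2-gpu-large (0)", "status": "completed", "conclusion": "success"},
        {"name": "stage-b-test-2-gpu-large (1)", "status": "in_progress", "conclusion": None},
    ]
    print(evaluate_wait(jobs, specs))
```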

---

## Cross-Job Fast-Fail (`check-stage-health` action)

Composite action called after checkout in every stage test job (21 jobs total across `pr-test.yml`, `pr-test-multimodal-gen.yml`, `pr-test-sgl-kernel.yml`, `pr-test-jit-kernel.yml`).

**How it works:**
1. Queries `listJobsForWorkflowRun` for the current workflow run
2. Filters for **root cause failures only**: jobs with `conclusion === 'failure'` whose failing step is NOT `check-stage-health` (excludes cascade failures)
3. If root cause failures found → calls `core.setFailed()` with the list of root cause job names
4. If none → does nothing (step succeeds)

**Cascade filtering**: When job A fast-fails due to health check, it also has `conclusion: failure`. Without filtering, job B would list both the original failure AND job A's fast-fail. The filter checks each failed job's `steps` array. If the failing step name contains `check-stage-health` or `Check stage health`, it's excluded from the root cause list.

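
The cascade filter can be illustrated with a small Python model. This is a sketch of the rule as described above, not the action's actual code: `root_cause_failures` is a hypothetical name, and the job and step dictionaries are simplified stand-ins for the API response.

```python
# Step-name fragments that mark a failure as a cascade (taken from the text above).
HEALTH_STEP_MARKERS = ("check-stage-health", "Check stage health")


def root_cause_failures(jobs: list[dict]) -> list[str]:
    """Names of failed jobs whose failing step is not the health check itself."""
    roots = []
    for job in jobs:
        if job.get("conclusion") != "failure":
            continue
        failed_steps = [
            step["name"]
            for step in job.get("steps", [])
            if step.get("conclusion") == "failure"
        ]
        is_cascade = any(
            marker in name for name in failed_steps for marker in HEALTH_STEP_MARKERS
        )
        if not is_cascade:
            roots.append(job["name"])
    return roots


if __name__ == "__main__":
    jobs = [
        {"name": "job-a", "conclusion": "failure",
         "steps": [{"name": "Run test", "conclusion": "failure"}]},
        {"name": "job-b", "conclusion": "failure",
         "steps": [{"name": "Check stage health", "conclusion": "failure"}]},
    ]
    print(root_cause_failures(jobs))
```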
**Usage pattern:**
```yaml
steps:
  - name: Checkout code
    uses: actions/checkout@v4
    ...

  - uses: ./.github/actions/check-stage-health
    id: stage-health

  - name: Install dependencies  # skipped automatically if health check failed
    ...                         # (default if: success() is false)

  - name: Run test  # also skipped
    ...
```

**Visual effect**: Job shows **red X** (failure) with error annotation showing root cause job names. Subsequent steps are naturally skipped (default `if: success()` is false after a failed step). No per-step `if` guards needed.

**No stage filtering**: Checks ALL jobs in the run, not just the current stage. Any failure anywhere triggers fast-fail.

**Error message example:**
```
Fast-fail: skipping — root cause job(s): stage-b-test-1-gpu-small (0), stage-b-test-1-gpu-small (1)
```

---

## Within-Suite Failure Handling

Controlled by `run_unittest_files()` in `python/sglang/test/ci/ci_utils.py`.

### Flags

| Flag | PR default | Scheduled default | Effect |
|------|------------|-------------------|--------|
| `--continue-on-error` | Off | On | Off: stop at first failure. On: run all files, report all failures at end |
| `--enable-retry` | On | On | Retry retriable failures (accuracy/perf assertions) |
| `--max-attempts` | 2 | 2 | Max attempts per file including initial run |

### Retry Classification

When a test fails and retry is enabled, the output is classified:

**Non-retriable** (checked first; real code errors):
`SyntaxError`, `ImportError`, `ModuleNotFoundError`, `NameError`, `TypeError`, `AttributeError`, `RuntimeError`, `CUDA out of memory`, `OOM`, `Segmentation fault`, `core dumped`, `ConnectionRefusedError`, `FileNotFoundError`

**Retriable** (accuracy/performance):
`AssertionError` with comparison patterns (`not greater than`, `not less than`, `not equal to`), `accuracy`, `score`, `latency`, `throughput`, `timeout`

**Default**: Unknown `AssertionError` → retriable. Other unknown failures → not retriable.

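
The classification order can be sketched in Python as follows. This is an illustrative model rather than the code in `ci_utils.py`: `is_retriable` is a hypothetical name, matching by plain substring is an assumption, and so is treating `accuracy`, `score`, `latency`, `throughput` and `timeout` as standalone keywords.

```python
# Patterns copied from the lists above; plain substring matching is an assumption.
NON_RETRIABLE = (
    "SyntaxError", "ImportError", "ModuleNotFoundError", "NameError",
    "TypeError", "AttributeError", "RuntimeError", "CUDA out of memory",
    "OOM", "Segmentation fault", "core dumped", "ConnectionRefusedError",
    "FileNotFoundError",
)
COMPARISON_PATTERNS = ("not greater than", "not less than", "not equal to")
RETRIABLE_KEYWORDS = ("accuracy", "score", "latency", "throughput", "timeout")


def is_retriable(output: str) -> bool:
    """Classify captured test output using the order described above."""
    # 1. Non-retriable patterns are checked first and always win.
    if any(pattern in output for pattern in NON_RETRIABLE):
        return False
    has_assertion = "AssertionError" in output
    # 2. Accuracy/performance style failures are retriable.
    if has_assertion and any(p in output for p in COMPARISON_PATTERNS):
        return True
    if any(keyword in output for keyword in RETRIABLE_KEYWORDS):
        return True
    # 3. Defaults: unknown AssertionError is retriable, anything else is not.
    return has_assertion


if __name__ == "__main__":
    print(is_retriable("AssertionError: 0.61 not greater than 0.65"))
    print(is_retriable("ImportError: No module named foo"))
```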
### How `continue_on_error` is set

In `pr-test.yml`'s `check-changes` job:
- `schedule` runs or `run_all_tests` flag → `continue_on_error = 'true'`
- PR runs → `continue_on_error = 'false'`

Each test job propagates via:
```yaml
env:
  CONTINUE_ON_ERROR_FLAG: ${{ needs.check-changes.outputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
run: |
  python3 run_suite.py --hw cuda --suite <name> $CONTINUE_ON_ERROR_FLAG
```

---

## Test Partitioning

Large suites are split across matrix jobs using the **LPT (Longest Processing Time) heuristic** in `ci_register.py:auto_partition()`:

1. Sort tests by `est_time` descending, filename as tie-breaker (deterministic)
2. Greedily assign each test to the partition with smallest cumulative time
3. Result: roughly equal total time per partition

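
The three steps above can be sketched with a min-heap of partition loads. This is a minimal illustration of the LPT heuristic, not the actual `auto_partition()` implementation: `lpt_partition` is a hypothetical name, the test names and times are made up, and breaking ties between equally loaded partitions by lowest partition id is an assumption.

```python
import heapq


def lpt_partition(tests: list[tuple[str, float]], size: int) -> list[list[str]]:
    """Split (filename, est_time) pairs into `size` partitions using LPT."""
    # Step 1: longest first, filename as deterministic tie-breaker.
    ordered = sorted(tests, key=lambda t: (-t[1], t[0]))
    # Heap of (cumulative_time, partition_id); the smallest load pops first.
    heap = [(0.0, pid) for pid in range(size)]
    partitions: list[list[str]] = [[] for _ in range(size)]
    for name, est_time in ordered:
        # Step 2: assign to the partition with the smallest cumulative time.
        load, pid = heapq.heappop(heap)
        partitions[pid].append(name)
        heapq.heappush(heap, (load + est_time, pid))
    return partitions


if __name__ == "__main__":
    # Made-up test files and estimated times, for illustration only.
    tests = [("a.py", 10), ("b.py", 8), ("c.py", 5), ("d.py", 4), ("e.py", 3)]
    for pid, files in enumerate(lpt_partition(tests, 2)):
        print(pid, files)
```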
**Partition table** (CUDA per-commit suites):

| Suite | Partitions | Runner | max_parallel |
|-------|-----------|--------|-------------|
| `stage-a-test-1-gpu-small` | 1 (no matrix) | `1-gpu-5090` | n/a |
| `stage-a-test-cpu` | 1 (no matrix) | `ubuntu-latest` | n/a |
| `stage-b-test-1-gpu-small` | 8 | `1-gpu-5090` | 8 |
| `stage-b-test-1-gpu-large` | 14 | `1-gpu-h100` | dynamic (3 or 14) |
| `stage-b-test-2-gpu-large` | 4 | `2-gpu-h100` | n/a |
| `stage-b-test-4-gpu-b200` | 1 (no matrix) | `4-gpu-b200` | n/a |
| `stage-b-kernel-unit-1-gpu-large` | 1 (no matrix) | `1-gpu-h100` | n/a |
| `stage-b-kernel-unit-8-gpu-h200` | 1 (no matrix) | `8-gpu-h200` | n/a |
| `stage-b-kernel-benchmark-1-gpu-large` | 1 (no matrix) | `1-gpu-h100` | n/a |
| `stage-c-test-4-gpu-h100` | 3 | `4-gpu-h100` | n/a |
| `stage-c-test-8-gpu-h200` | 4 | `8-gpu-h200` | n/a |
| `stage-c-test-8-gpu-h20` | 2 | `8-gpu-h20` | n/a |
| `stage-c-test-deepep-4-gpu-h100` | 1 (no matrix) | `4-gpu-h100` | n/a |
| `stage-c-test-deepep-8-gpu-h200` | 1 (no matrix) | `8-gpu-h200` | n/a |
| `stage-c-test-4-gpu-b200` | 4 | `4-gpu-b200` | n/a |
| `stage-c-test-4-gpu-gb200` | 1 (no matrix) | `4-gpu-gb200` | n/a |

> **Note**: Kernel suites (`stage-b-kernel-*`) run via `pr-test-jit-kernel.yml` and `pr-test-sgl-kernel.yml`, not the main `pr-test.yml`. Multimodal diffusion uses `python/sglang/multimodal_gen/test/run_suite.py`, not `test/run_suite.py`.

**Workflow usage:**
```yaml
strategy:
  matrix:
    partition: [0, 1, 2, 3, 4, 5, 6, 7]
steps:
  # Block scalar (|) keeps the newline, so the trailing backslash is a shell line continuation
  - run: |
      python3 run_suite.py --hw cuda --suite stage-b-test-1-gpu-small \
        --auto-partition-id ${{ matrix.partition }} --auto-partition-size 8
```

---

## check-changes Job

Determines which test suites to run based on file changes.

### Detection Methods

| Trigger | Method | Details |
|---------|--------|---------|
| `pull_request` | `dorny/paths-filter` | Detects changes via GitHub diff |
| `workflow_dispatch` (with `pr_head_sha`) | GitHub API | `repos/{repo}/compare/main...{sha}` |
| `schedule` / `run_all_tests` | Force all true | Runs everything |

### Output Flags

| Output | Triggers |
|--------|----------|
| `main_package` | Stage A/B/C test suites |
| `sgl_kernel` | Kernel wheel builds + kernel test suites |
| `jit_kernel` | JIT kernel test workflow |
| `multimodal_gen` | Multimodal-gen test workflow |

> **Note**: `sgl_kernel` is forced to `false` when `target_stage` is set, because `sgl-kernel-build-wheels` won't run and wheel artifacts won't be available.

---

## Concurrency Control

```
group: pr-test-{event_name}-{branch}-{pr_sha}-{stage}
```

| Segment | Source | Purpose |
|---------|--------|---------|
| `event_name` | `github.event_name` | Prevents scheduled runs colliding with fork PRs named `main` |
| `branch` | `github.head_ref \|\| github.ref_name` | Per-branch isolation |
| `pr_sha` | `inputs.pr_head_sha \|\| 'current'` | Isolates `/rerun-stage` from main runs |
| `stage` | `inputs.target_stage \|\| 'all'` | Allows parallel stage dispatches |

`cancel-in-progress: true` for `pull_request` events (new push cancels old run), `false` for `workflow_call`.
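
To make the fallback rules in the table concrete, here is a minimal Python sketch of how the group key is assembled. The function names are illustrative, not part of the workflow. Python's `or` stands in for the `||` fallbacks, and treating every event other than `pull_request` as non-cancelling is an assumption, since the text only names `pull_request` and `workflow_call`.

```python
from typing import Optional


def concurrency_group(event_name: str, head_ref: Optional[str], ref_name: str,
                      pr_head_sha: Optional[str] = None,
                      target_stage: Optional[str] = None) -> str:
    """Build the group key; empty or missing values fall back like `a || b`."""
    branch = head_ref or ref_name
    pr_sha = pr_head_sha or "current"
    stage = target_stage or "all"
    return f"pr-test-{event_name}-{branch}-{pr_sha}-{stage}"


def cancel_in_progress(event_name: str) -> bool:
    """True for pull_request; assumed False for every other event."""
    return event_name == "pull_request"
```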

---

## How To: Add a New Stage Job

1. Define the job in `pr-test.yml` with `needs: [check-changes, call-gate, wait-for-stage-X, ...]`
2. Copy the `if:` condition pattern from an existing same-stage job (handles `target_stage`, `schedule`, `main_package`)
3. Add `checkout` step
4. Add `check-stage-health` step (after checkout) — if any prior job failed, `core.setFailed()` fires and all subsequent steps auto-skip via default `if: success()`
5. Add `check-maintenance` step
6. Add `download-artifact` step if `sgl_kernel` changed
7. Add `install dependencies` step
8. Add `run test` step with `$CONTINUE_ON_ERROR_FLAG`
9. Add `upload-cuda-coredumps` step with `if: always()`
10. Register the suite name in `PER_COMMIT_SUITES` in `test/run_suite.py`
11. If using matrix, add `--auto-partition-id` and `--auto-partition-size` to the run command
12. **Update `wait-for-stage-X`** job spec with the new job name and `expected_count` (if matrix)
13. **Add the job to `pr-test-finish.needs`** list

---

## How To: Debug CI Failures

| Symptom | Likely cause | What to check |
|---------|-------------|---------------|
| All stage-B/C jobs green but steps skipped | Earlier job failed, `check-stage-health` triggered | Find the actual failed job (red X) |
| `wait-for-stage-b` timeout | `expected_count` doesn't match matrix size | Verify job spec counts match `matrix:` array length |
| `pr-test-finish` fails but all jobs green | A job was `cancelled` (counts as failure in finish) | Check concurrency cancellation |
| Tests pass locally but fail in CI | Partition assignment, runner GPU type, or `est_time` inaccuracy | Check which partition the test lands in; verify runner label |
| Flaky test retried and passed | Retriable failure (accuracy/perf) | Check `[CI Retry]` markers in job logs |
| Flaky test NOT retried | Matched non-retriable pattern | Check if error matches `NON_RETRIABLE_PATTERNS` in `ci_utils.py` |

---

## Slash Commands

| Command | Effect |
|---------|--------|
| `/tag-run-ci-label` | Adds `run-ci` label to PR |
| `/rerun-failed-ci` | Reruns failed jobs in the latest workflow run |
| `/tag-and-rerun-ci` | Adds label + reruns |
| `/rerun-stage <stage>` | Dispatches `pr-test.yml` with `target_stage=<stage>` |
| `/rerun-test <test-file>` | Reruns a specific test file via `rerun-test.yml` |

Handled by `scripts/ci/utils/slash_command_handler.py` → `.github/workflows/slash-command-handler.yml`.
657
third_party/sglang/.claude/skills/debug-cuda-crash/SKILL.md
vendored
Normal file
@@ -0,0 +1,657 @@
---
name: debug-cuda-crash
description: Call this skill when you need to debug CUDA crashes in SGLang using kernel API logging
---

# Tutorial: Debugging CUDA Crashes with Kernel API Logging

This tutorial shows you how to debug CUDA crashes and errors in SGLang using the `@debug_kernel_api` logging decorator.

## Goal

When your code crashes with CUDA errors such as illegal memory access, device-side assert, out-of-bounds, or NaN/Inf, use kernel API logging to:
- Capture input tensors BEFORE the crash occurs
- Understand what data caused the problem
- Track tensor shapes, dtypes, and values through the call boundary that triggered the crash
- Detect numerical issues such as NaN, Inf, or obviously wrong shapes

## Why Use Kernel API Logging?

**Problem**: CUDA errors often crash the program before normal debugging output is flushed.

**Solution**: SGLang's `@debug_kernel_api` decorator logs inputs before execution, so you can still see what caused the crash even after the program aborts.
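
The log-before-execute idea can be illustrated with a minimal sketch. This is not the implementation of `@debug_kernel_api`; `log_before_call` is a hypothetical stand-in that only shows why writing and flushing the record before the call matters.

```python
import functools
import sys


def log_before_call(fn, dest=sys.stderr):
    """Hypothetical stand-in for a logging decorator: write, flush, then call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        dest.write(f"API call: {fn.__qualname__}\n")
        for i, arg in enumerate(args):
            dest.write(f"  arg[{i}]={arg!r}\n")
        dest.flush()  # flush BEFORE running, so the record survives an abort
        return fn(*args, **kwargs)
    return wrapper


@log_before_call
def divide(a, b):
    return a / b
```

Calling `divide(1, 0)` still raises `ZeroDivisionError`, but the call record has already been written by then.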

## What Is Covered?

The current logging coverage focuses on the highest-value kernel boundaries in SGLang:
- Custom ops registered through `register_custom_op(...)`
- External custom ops registered through `register_custom_op_from_extern(...)`
- LLM attention, linear, quantization, and multi-platform wrapper entry points
- Diffusion attention impl, linear, rotary, and custom-op wrapper entry points
- Selected direct `torch.ops.sglang.*` hotspots and model-specific bypasses

This means the logging is useful for both LLM and diffusion kernel debugging, but it does not automatically cover every pure PyTorch call in the repository.

## Step 1: Enable Kernel API Logging

### Basic Logging (Function Names Only)

```bash
export SGLANG_KERNEL_API_LOGLEVEL=1
export SGLANG_KERNEL_API_LOGDEST=stdout

python my_script.py
```

Output:
```
================================================================================
[2026-03-19 00:47:06] SGLang Kernel API Call: RMSNorm.forward
================================================================================
[2026-03-19 00:47:06] SGLang Kernel API Call: sglang.quant_method.UnquantizedLinearMethod.apply
================================================================================
[2026-03-19 00:47:06] SGLang Kernel API Call: sglang.custom_op.fused_inplace_qknorm
```

This is a real level-1 excerpt captured from `Qwen/Qwen3-0.6B`.

### Detailed Logging (Inputs with Metadata)

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
export SGLANG_KERNEL_API_LOGDEST=debug.log

python my_script.py
```

Output in `debug.log`:
```
================================================================================
[2026-03-19 00:47:30] SGLang Kernel API Call: sglang.quant_method.UnquantizedLinearMethod.apply
Positional input arguments:
  arg[0]=QKVParallelLinear(
    repr=QKVParallelLinear(in_features=1024, output_features=4096, bias=False, tp_size=1, gather_output=False)
  )
  arg[1]=Tensor(
    shape=(1, 1024)
    dtype=torch.bfloat16
    device=cuda:0
    requires_grad=False
    is_contiguous=True
  )
  arg[2]=None
Output:
  return=Tensor(
    shape=(1, 4096)
    dtype=torch.bfloat16
    device=cuda:0
    requires_grad=False
    is_contiguous=True
  )
```

This is a real level-3 excerpt captured from `Qwen/Qwen3-0.6B`.

### Full Logging (With Tensor Statistics)

```bash
export SGLANG_KERNEL_API_LOGLEVEL=5
export SGLANG_KERNEL_API_LOGDEST=debug.log

python my_script.py
```

Additional output:
```
================================================================================
[2026-03-19 01:00:42] SGLang Kernel API Call: diffusion.quant_method.UnquantizedLinearMethod.apply
Positional input arguments:
  arg[1]=Tensor(
    shape=(1, 77, 768)
    dtype=torch.bfloat16
    device=cuda:0
    requires_grad=False
    is_contiguous=True
    min=-27.250000
    max=28.500000
    mean=0.011723
    nan_count=0
    inf_count=0
  )
Output:
  return=Tensor(
    shape=(1, 77, 2304)
    dtype=torch.bfloat16
    device=cuda:0
    requires_grad=False
    is_contiguous=True
    min=-8.937500
    max=9.375000
    mean=0.009460
    nan_count=0
    inf_count=0
  )
```

This is a real level-5 excerpt captured from `black-forest-labs/FLUX.1-dev`.

### Crash-Safe Dumps (Inputs Saved Before Execution)

```bash
export SGLANG_KERNEL_API_LOGLEVEL=10
export SGLANG_KERNEL_API_LOGDEST=debug.log
export SGLANG_KERNEL_API_DUMP_DIR=/tmp/sglang_kernel_api_dumps

python my_script.py
```

At level 10, SGLang saves the inputs before execution. If the kernel crashes, the dump directory still contains the inputs and exception metadata.

If CUDA graph capture is active, tensor dumps are skipped automatically to avoid capture-time CUDA errors. In that case, you still get the kernel API call log, but not `inputs.pt` / `outputs.pt`.

Level-10 dumps are best understood as crash-safe call snapshots. They always preserve the observed call boundary. They do not guarantee one-click replay for every method, because some methods depend on module state that is not serialized into the dump.

Real level-10 dump layout from `Qwen/Qwen3-0.6B`:

```text
/tmp/sglang_kernel_api_validation/qwen_qwen3_0_6b_level10_dumps
/tmp/sglang_kernel_api_validation/qwen_qwen3_0_6b_level10_dumps/20260319_004821_182_pid919286_RotaryEmbedding.forward_call0001
/tmp/sglang_kernel_api_validation/qwen_qwen3_0_6b_level10_dumps/20260319_004821_182_pid919286_RotaryEmbedding.forward_call0001/inputs.pt
/tmp/sglang_kernel_api_validation/qwen_qwen3_0_6b_level10_dumps/20260319_004821_182_pid919286_RotaryEmbedding.forward_call0001/metadata.json
/tmp/sglang_kernel_api_validation/qwen_qwen3_0_6b_level10_dumps/20260319_004821_182_pid919286_RotaryEmbedding.forward_call0001/outputs.pt
```

Real `metadata.json` excerpt:

```json
{
  "function_name": "RotaryEmbedding.forward",
  "timestamp": "20260319_004821_182",
  "process_id": 919286,
  "execution_status": "completed",
  "input_tensor_keys": ["arg_0", "arg_1", "arg_2"],
  "output_tensor_keys": ["result_0", "result_1"]
}
```

## Step 2: Reproduce an LLM CUDA Crash

Create a temporary reproducer:

```bash
python3 - <<'PY'
from pathlib import Path
Path("/tmp/sglang_llm_crash.py").write_text(
    "import torch\n"
    "import torch.nn.functional as F\n"
    "from sglang.srt.utils.custom_op import register_custom_op\n\n"
    "def _fake_embedding(indices, table):\n"
    "    return torch.empty((*indices.shape, table.shape[-1]), device=table.device, dtype=table.dtype)\n\n"
    "@register_custom_op(op_name='mock_llm_cuda_crash', fake_impl=_fake_embedding)\n"
    "def mock_llm_cuda_crash(indices, table):\n"
    "    out = F.embedding(indices, table)\n"
    "    torch.cuda.synchronize()\n"
    "    return out\n\n"
    "table = torch.randn(4, 8, device='cuda', dtype=torch.float16)\n"
    "indices = torch.tensor([0, 7], device='cuda', dtype=torch.long)\n"
    "mock_llm_cuda_crash(indices, table)\n"
)
PY

SGLANG_KERNEL_API_LOGLEVEL=1 \
SGLANG_KERNEL_API_LOGDEST=/tmp/sglang_llm_level1.log \
python3 /tmp/sglang_llm_crash.py
```

What to expect:
- The script exits with a CUDA `device-side assert`
- The log still contains the last API boundary before the crash

Try the same example at level 3:

```bash
SGLANG_KERNEL_API_LOGLEVEL=3 \
SGLANG_KERNEL_API_LOGDEST=/tmp/sglang_llm_level3.log \
python3 /tmp/sglang_llm_crash.py
```

Now the log shows tensor metadata before the crash.

Try level 10:

```bash
SGLANG_KERNEL_API_LOGLEVEL=10 \
SGLANG_KERNEL_API_LOGDEST=/tmp/sglang_llm_level10.log \
SGLANG_KERNEL_API_DUMP_DIR=/tmp/sglang_llm_level10_dumps \
python3 /tmp/sglang_llm_crash.py
```

Now you should see:
- A log entry for `sglang.custom_op.mock_llm_cuda_crash`
- A dump directory with `inputs.pt`
- `metadata.json` showing `execution_status: "exception"`
- No `outputs.pt`, because the kernel crashed before producing output
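
When a dump directory holds many calls, the crashing one can be located from the metadata alone. The sketch below assumes the layout shown earlier (one subdirectory per call, each with a `metadata.json`) and the `function_name` and `execution_status` fields from the excerpt. The helper `find_failed_calls` is illustrative and not part of SGLang.

```python
import json
from pathlib import Path
from typing import List


def find_failed_calls(dump_dir: str) -> List[dict]:
    """Return metadata for dumped calls whose status is not "completed"."""
    failed = []
    for meta_path in sorted(Path(dump_dir).glob("*/metadata.json")):
        meta = json.loads(meta_path.read_text())
        if meta.get("execution_status") != "completed":
            meta["dump_path"] = str(meta_path.parent)  # where inputs.pt lives
            failed.append(meta)
    return failed


if __name__ == "__main__":
    for meta in find_failed_calls("/tmp/sglang_llm_level10_dumps"):
        print(meta["function_name"], meta["execution_status"], meta["dump_path"])
```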

For real-model success-path level-10 dumps, it is often easier to temporarily disable CUDA graph and piecewise CUDA graph for the debug run.

## Step 3: Reproduce a Diffusion CUDA Crash

Create a temporary diffusion-side reproducer:

```bash
python3 - <<'PY'
from pathlib import Path
Path("/tmp/sglang_diffusion_crash.py").write_text(
    "import torch\n"
    "import torch.nn.functional as F\n"
    "from sglang.multimodal_gen.runtime.layers.utils import register_custom_op\n\n"
    "def _fake_embedding(positions, cache):\n"
    "    return torch.empty((*positions.shape, cache.shape[-1]), device=cache.device, dtype=cache.dtype)\n\n"
    "@register_custom_op(op_name='mock_diffusion_cuda_crash', fake_impl=_fake_embedding)\n"
    "def mock_diffusion_cuda_crash(positions, cache):\n"
    "    out = F.embedding(positions, cache)\n"
    "    torch.cuda.synchronize()\n"
    "    return out\n\n"
    "cache = torch.randn(4, 64, device='cuda', dtype=torch.float16)\n"
    "positions = torch.tensor([0, 9], device='cuda', dtype=torch.long)\n"
    "mock_diffusion_cuda_crash(positions, cache)\n"
)
PY

SGLANG_KERNEL_API_LOGLEVEL=1 \
SGLANG_KERNEL_API_LOGDEST=/tmp/sglang_diffusion_level1.log \
python3 /tmp/sglang_diffusion_crash.py
```

Try level 3:

```bash
SGLANG_KERNEL_API_LOGLEVEL=3 \
SGLANG_KERNEL_API_LOGDEST=/tmp/sglang_diffusion_level3.log \
python3 /tmp/sglang_diffusion_crash.py
```

Try level 10:

```bash
SGLANG_KERNEL_API_LOGLEVEL=10 \
SGLANG_KERNEL_API_LOGDEST=/tmp/sglang_diffusion_level10.log \
SGLANG_KERNEL_API_DUMP_DIR=/tmp/sglang_diffusion_level10_dumps \
python3 /tmp/sglang_diffusion_crash.py
```

If your local environment has unrelated FlashInfer import issues, resolve them in the shell before running the example. The example itself does not set any `FLASHINFER_*` environment variable.

## Step 4: Multi-Process Debugging

When running with multiple GPUs or worker processes, use `%i` in the log path:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
export SGLANG_KERNEL_API_LOGDEST=debug_rank_%i.log

torchrun --nproc_per_node=4 my_script.py
```

This creates separate logs such as:
- `debug_rank_12345.log`
- `debug_rank_12346.log`
- `debug_rank_12347.log`
- `debug_rank_12348.log`

Real multi-process example from a 2-GPU `Qwen/Qwen2.5-0.5B-Instruct` run:

```text
/tmp/sglang_kernel_api_validation_multi/qwen_qwen2_5_0_5b_instruct_level3_950201.log
/tmp/sglang_kernel_api_validation_multi/qwen_qwen2_5_0_5b_instruct_level3_950349.log
/tmp/sglang_kernel_api_validation_multi/qwen_qwen2_5_0_5b_instruct_level3_950350.log
/tmp/sglang_kernel_api_validation_multi/qwen_qwen2_5_0_5b_instruct_level3_950351.log
```

You should usually do the same for level-10 dump directories:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=10
export SGLANG_KERNEL_API_LOGDEST=debug_rank_%i.log
export SGLANG_KERNEL_API_DUMP_DIR=/tmp/sglang_kernel_api_dumps_%i
```

This avoids multiple ranks writing into the same dump directory tree.
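
As an illustration of the `%i` convention (the document states that it expands to the process ID), a per-process path can be derived as follows. `expand_pid_placeholder` is a hypothetical helper, not SGLang's own code.

```python
import os
from typing import Optional


def expand_pid_placeholder(template: str, pid: Optional[int] = None) -> str:
    """Replace every `%i` in a path template with a process ID."""
    return template.replace("%i", str(os.getpid() if pid is None else pid))


print(expand_pid_placeholder("debug_rank_%i.log", 12345))  # debug_rank_12345.log
```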

## Step 5: Filter Level-10 Dumps

If level 10 is too noisy, restrict dumps to specific APIs:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=10
export SGLANG_KERNEL_API_LOGDEST=debug.log
export SGLANG_KERNEL_API_DUMP_DIR=/tmp/sglang_kernel_api_dumps
export SGLANG_KERNEL_API_DUMP_INCLUDE='sglang.custom_op.*'
export SGLANG_KERNEL_API_DUMP_EXCLUDE='*.fake_impl'
```

`SGLANG_KERNEL_API_DUMP_INCLUDE` and `SGLANG_KERNEL_API_DUMP_EXCLUDE` use shell-style wildcard matching.
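
Shell-style wildcard matching behaves like Python's `fnmatch` module, so a filter can be tried out before a long run. The sketch below is only a model of the filtering: `should_dump` is a hypothetical name, and two behaviors are assumptions, namely that an empty include list means "dump everything" and that exclude patterns take precedence.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def should_dump(api_name: str, include: Sequence[str] = (),
                exclude: Sequence[str] = ()) -> bool:
    """Decide whether a call is dumped under include/exclude wildcard lists."""
    if include and not any(fnmatchcase(api_name, p) for p in include):
        return False  # an include list exists and nothing matched
    return not any(fnmatchcase(api_name, p) for p in exclude)


print(should_dump("sglang.custom_op.mock_llm_cuda_crash",
                  include=["sglang.custom_op.*"], exclude=["*.fake_impl"]))  # True
```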

## Step 6: Common CUDA Errors and What to Check

### Illegal Memory Access or Device-Side Assert

**Typical errors**:
```
RuntimeError: CUDA error: an illegal memory access was encountered
torch.AcceleratorError: CUDA error: device-side assert triggered
```

Use:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
```

Check in the logs:
- ✅ Tensor shapes
- ✅ Tensor dtypes
- ✅ CUDA vs CPU device placement
- ✅ Tensor stride / contiguity
- ✅ Whether the failing call has inputs logged but no outputs logged

Typical shape-mismatch pattern:

```text
SGLang Kernel API Call: ...
arg[0]=Tensor(shape=(..., 128), ...)  # ✅ expected dimension
arg[1]=Tensor(shape=(..., 64), ...)   # ❌ mismatch
```

This often points to head-dim, hidden-dim, or cache-layout mismatch rather than a random CUDA failure.

### NaN or Inf

Use:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=5
```

Check:
- `min`
- `max`
- `mean`
- `nan_count`
- `inf_count`

Typical bad pattern:

```text
Tensor(
  ...
  min=-1234567.000000   # ❌ suspiciously large
  max=9876543.000000    # ❌ suspiciously large
  mean=nan              # ❌ bad
  nan_count=128         # ❌ found NaNs
  inf_count=0           # ✅ no Infs here
)
```

This usually means the bad values were already present before the crashing kernel.
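
For long level-5 logs, the first call that carries bad values can be found with a small scan. This sketch relies only on the `SGLang Kernel API Call:` header and the `nan_count=` / `inf_count=` fields shown in the excerpts; `find_bad_stats` is an illustrative helper, not an SGLang tool.

```python
import re
from typing import List, Tuple

CALL_RE = re.compile(r"SGLang Kernel API Call: (\S+)")
COUNT_RE = re.compile(r"\b(nan_count|inf_count)=(\d+)")


def find_bad_stats(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (api_name, counter, value) for every non-zero nan/inf counter."""
    hits = []
    current = "<unknown>"  # counters seen before any call header
    for line in log_text.splitlines():
        call = CALL_RE.search(line)
        if call:
            current = call.group(1)
            continue
        for counter, value in COUNT_RE.findall(line):
            if int(value) > 0:
                hits.append((current, counter, int(value)))
    return hits
```

Because results are returned in log order, the first entry points at the earliest covered call whose inputs or outputs already contained NaN or Inf.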

### Out of Memory

Use:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
```

Check:
- Unexpectedly large tensor shapes
- Batch size
- Sequence length
- Frame count or image resolution in diffusion workloads

Also check whether a supposedly per-token or per-frame tensor accidentally became full-sequence or full-image sized.

Typical bad pattern:

```text
Tensor(
  shape=(1024, 8192, 128, 128)  # ❌ way too large
  ...
)
```
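
To judge whether a logged shape is "way too large", multiply the element count by the bytes per element. The sketch below is plain arithmetic; the dtype of the tensor above is elided, so `bfloat16` (2 bytes, as in the earlier excerpts) is an assumption.

```python
import math

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2, "int64": 8}


def tensor_gib(shape, dtype: str) -> float:
    """Size of a dense tensor in GiB: element count times bytes per element."""
    return math.prod(shape) * BYTES_PER_ELEMENT[dtype] / 2**30


# 1024 * 8192 * 128 * 128 = 2**37 elements, times 2 bytes = 2**38 bytes
print(tensor_gib((1024, 8192, 128, 128), "bfloat16"))  # 256.0
```

At 256 GiB, this single tensor is larger than the memory of any single GPU, so the shape itself is the bug.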

### Example: Spot a Shape Bug from the Log

Suppose the failing API log looks like this:

```text
[2026-03-19 00:47:30] SGLang Kernel API Call: RotaryEmbedding.forward
Positional input arguments:
  arg[0]=Tensor(shape=(1, 8), dtype=torch.int64, ...)
  arg[1]=Tensor(shape=(1, 8, 8, 256), dtype=torch.bfloat16, ...)  # ✅ query
  arg[2]=Tensor(shape=(1, 8, 4, 64), dtype=torch.bfloat16, ...)   # ❌ key head_dim mismatch
```

What this tells you:
- ✅ positions look reasonable
- ✅ query looks plausible
- ❌ key last dimension is inconsistent with the expected rotary/head dimension

That usually means the bug is in projection layout, head packing, or cache format rather than in the rotary kernel itself.
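
The same comparison can be automated once shapes have been copied out of the log. The helper below is a hypothetical sketch: it treats the first tensor as the reference and reports every other tensor whose last dimension differs.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def last_dim_mismatches(shapes: Dict[str, Shape]) -> List[str]:
    """Name the tensors whose last dimension differs from the first entry's."""
    names = list(shapes)
    expected = shapes[names[0]][-1]
    return [n for n in names[1:] if shapes[n][-1] != expected]


print(last_dim_mismatches({"query": (1, 8, 8, 256), "key": (1, 8, 4, 64)}))  # ['key']
```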

## Step 7: Combine with compute-sanitizer

For harder bugs, combine kernel API logging with CUDA memory checking:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
export SGLANG_KERNEL_API_LOGDEST=debug.log

compute-sanitizer --tool memcheck python3 /tmp/sglang_llm_crash.py
```

Use `debug.log` to see the exact inputs that reached the crashing API boundary.

Typical `compute-sanitizer` output:

```text
========= COMPUTE-SANITIZER
========= Invalid __global__ write of size 4 bytes
========= at 0x1234 in SomeKernel
========= by thread (256,0,0) in block (10,0,0)
========= Address 0x... is out of bounds
```

Use the sanitizer output to identify the failing kernel and use `debug.log` to identify the exact tensors that reached the API boundary right before it.

If you need more synchronous host-side error reporting, you can try `CUDA_LAUNCH_BLOCKING=1` as a separate follow-up experiment. It is not part of the default workflow because it changes execution timing and can hide concurrency-related behavior.

## Step 8: Combine with cuda-gdb

For crashes that need a stack trace instead of only memory diagnostics:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
export SGLANG_KERNEL_API_LOGDEST=debug.log

cuda-gdb --args python3 /tmp/sglang_llm_crash.py
```

Inside `cuda-gdb`:

```text
(cuda-gdb) run
(cuda-gdb) where
```

Then correlate the backtrace with `debug.log`.

## Step 9: Kernel-Level Debugging with printf()

When you own the CUDA kernel, `printf()` is still useful for narrowing down bad indices, bad launch geometry, or broken state propagation.

Basic pattern:

```cpp
__global__ void MyKernel(const float* input, float* output, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;

  if (threadIdx.x == 0 && blockIdx.x == 0) {
    printf("n=%d input0=%f\n", n, input[0]);
  }

  if (idx < n) {
    output[idx] = input[idx] * 2.0f;
  }
}
```

After launch, force the output to flush:

```python
my_kernel(...)
torch.cuda.synchronize()
```

For warp-specialized kernels, do not blindly print only on `threadIdx.x == 0`. Pick one representative thread per warp or per specialization group instead.

### Warp-Specialized Kernels: Choosing the Right Print Thread

Problem:
- `threadIdx.x == 0` only prints from the first warp in the block
- for warp-specialized kernels, that often misses the warp or group that is actually wrong

Better pattern:

```cpp
__global__ void WarpSpecializedKernel(...) {
  // Example: first lane of each warp
  if ((threadIdx.x % 32) == 0) {
    printf("warp=%d\n", threadIdx.x / 32);
  }
}
```

Or, if the kernel is organized in larger specialization groups, print once per group instead of once per block.

Common mistake:

```cpp
// Only warp 0 prints
if (threadIdx.x == 0) {
  printf("warp=%d\n", threadIdx.x / 32);
}
```

### Quick Reference

| Kernel Type | Print Condition | Notes |
|-------------|-----------------|-------|
| Simple kernel | `threadIdx.x == 0` | One thread per block is usually enough |
| Warp-specialized kernel | one representative lane per warp | e.g. `threadIdx.x % 32 == 0` |
| Group-specialized kernel | one representative lane per group | choose based on the kernel's scheduling layout |

### Other Kernel Debugging Tools

```cpp
assert(value >= 0.0f && "value must be non-negative");
static_assert(BLOCK_SIZE % 32 == 0, "BLOCK_SIZE must be warp aligned");
```

## Environment Variables Reference

| Variable | Values | Description |
|----------|--------|-------------|
| `SGLANG_KERNEL_API_LOGLEVEL` | `0` | No logging (default) |
| | `1` | Function names only |
| | `3` | Inputs and outputs with metadata |
| | `5` | Level 3 plus tensor statistics |
| | `10` | Level 5 plus crash-safe tensor dumps |
| `SGLANG_KERNEL_API_LOGDEST` | `stdout` | Log to stdout |
| | `stderr` | Log to stderr |
| | `<path>` | Log to file |
| | `log_%i.txt` | `%i` expands to process ID |
| `SGLANG_KERNEL_API_DUMP_DIR` | `<path>` | Directory for level-10 dumps |
| `SGLANG_KERNEL_API_DUMP_INCLUDE` | wildcard list | Only dump matching API names |
| `SGLANG_KERNEL_API_DUMP_EXCLUDE` | wildcard list | Skip matching API names |

## Best Practices

### 1. Start with Level 3

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
```

Level 3 is usually enough to catch wrong shapes, wrong dtypes, and wrong devices.

### 2. Use Level 5 for Numerical Issues

```bash
export SGLANG_KERNEL_API_LOGLEVEL=5
```

Use it when you suspect NaN or Inf values.

### 3. Use Level 10 for Crash Reproduction

```bash
export SGLANG_KERNEL_API_LOGLEVEL=10
```

This is the most useful mode when the process crashes before you can inspect live tensors.

If you need successful input/output dumps from a real model run, temporarily disable CUDA graph for that debug session.

When level 10 is too noisy, pair it with `SGLANG_KERNEL_API_DUMP_INCLUDE` / `SGLANG_KERNEL_API_DUMP_EXCLUDE` instead of dumping every covered API.

### 4. Log to File for Crashes

```bash
export SGLANG_KERNEL_API_LOGDEST=crash.log
```

File logs are safer than stdout when the process aborts.

### 5. Disable Logging in Production

```bash
unset SGLANG_KERNEL_API_LOGLEVEL
```

When disabled, the decorator returns the original callable and adds no runtime logging overhead.

## Troubleshooting

### No Logs Appear

Check:
1. `echo $SGLANG_KERNEL_API_LOGLEVEL`
2. `echo $SGLANG_KERNEL_API_LOGDEST`
3. Whether the failing path goes through a covered API boundary

### Too Much Output

Reduce the level:

```bash
export SGLANG_KERNEL_API_LOGLEVEL=3
```

### Statistics Are Skipped During CUDA Graph Capture

If you see:
```text
statistics=[skipped: CUDA graph capture in progress]
```

That is expected. Level-5 statistics are intentionally skipped during CUDA graph capture to avoid synchronization side effects.

### Tensor Dumps Are Skipped During CUDA Graph Capture

If you see:
```text
Tensor dump skipped: CUDA graph capture in progress
```

That is also expected. Level-10 dumps require copying tensors to CPU, which is not allowed during CUDA graph capture.
141
third_party/sglang/.claude/skills/generate-profile/SKILL.md
vendored
Normal file
@@ -0,0 +1,141 @@
|
|||||||
|
---
|
||||||
|
name: generate-profile
|
||||||
|
description: Generate an e2e profiling trace of an SGLang server run. Launches a server, validates accuracy, captures a Chrome-compatible trace, and returns the profile path.
|
||||||
|
---
|
||||||
|
|
||||||
|
# Generate an E2E Profile of an SGLang Server Run
|
||||||
|
|
||||||
|
This skill launches an SGLang server, validates it with a quick accuracy test, generates a profiling trace, and returns the profile file path.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- A working SGLang installation (`pip install -e .` or equivalent)
|
||||||
|
- At least one available CUDA GPU
|
||||||
|
|
||||||
|
## Step-by-step Workflow
|
||||||
|
|
||||||
|
### Step 1: Launch the server
|
||||||
|
|
||||||
|
```bash
|
||||||
|
CUDA_VISIBLE_DEVICES=<gpu_id> sglang serve --model-path <model> --port <port> &
|
||||||
|
```
|
||||||
|
|
||||||
|
- Default model: `Qwen/Qwen3-8B` (good balance of speed and quality)
|
||||||
|
- Default port: `30000`
|
||||||
|
- The server runs in the background. Save the PID for cleanup.
|
||||||
|
- Use the GPU specified by the user's preferences (check memory files for GPU preferences).
|
||||||
|
|
||||||
|
### Step 2: Wait for server readiness
|
||||||
|
|
||||||
|
Poll the health endpoint until the server is ready:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for i in $(seq 1 120); do
|
||||||
|
if curl -s http://127.0.0.1:<port>/health 2>/dev/null | grep -q "ok\|healthy"; then
|
||||||
|
echo "Server ready"
|
||||||
|
break
|
||||||
|
fi
|
||||||
|
sleep 5
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
The server prints **"The server is fired up and ready to roll!"** to stdout when ready. The health endpoint returns 200 once the server can accept requests.
|
||||||
|
|
||||||
|
Typical startup time: 30-90 seconds depending on model size and whether CUDA graphs are being compiled.
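The shell loop above matches on the response body. Since the text states that the health endpoint returns 200 once the server can accept requests, a status-code check is a possible alternative. The following is a minimal sketch under that assumption; `http_status` and `wait_until_healthy` are illustrative names, not part of SGLang.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout_s=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503 while loading).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or reset: the server is not up yet.
        return None


def wait_until_healthy(url, attempts=120, interval_s=5.0,
                       probe=http_status, sleep=time.sleep):
    """Poll url until it returns 200; give up after `attempts` tries."""
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        sleep(interval_s)
    return False
```

With the defaults (120 attempts, 5 seconds apart) this mirrors the shell loop's ten-minute budget, e.g. `wait_until_healthy("http://127.0.0.1:30000/health")`.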
|
||||||
|
|
||||||
|
### Step 3: Validate accuracy (sanity check)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 -m sglang.test.few_shot_gsm8k --num-q 20
|
||||||
|
```
|
||||||
|
|
||||||
|
- Expected accuracy: **> 0.8** for capable models (Qwen3-8B, Llama-3.1-8B-Instruct, etc.)
|
||||||
|
- This is a quick sanity check, not a rigorous benchmark.
|
||||||
|
- If accuracy is unexpectedly low, something is wrong — do not proceed to profiling.
|
||||||
|
|
||||||
|
### Step 4: Generate the profile
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 -m sglang.test.send_one --profile
|
||||||
|
```
|
||||||
|
|
||||||
|
This command:
|
||||||
|
1. Sends a request to the server
|
||||||
|
2. Triggers the profiler for 5 steps (default)
|
||||||
|
3. Generates a trace file under `/tmp/<timestamp>/`
|
||||||
|
4. The trace directory contains:
|
||||||
|
- `<timestamp>-TP-0.trace.json.gz` — Chrome trace format (open in `chrome://tracing` or Perfetto)
|
||||||
|
- `server_args.json` — the server configuration used
|
||||||
|
|
||||||
|
**Output format:**
|
||||||
|
```
|
||||||
|
Dump profiling traces to /tmp/<timestamp>
|
||||||
|
```
|
||||||
|
|
||||||
|
The profile path is printed to stdout. Parse it from the output.
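One way to parse the path, assuming the output line has exactly the form shown under "Output format" above. The helper name `parse_profile_dir` is hypothetical.

```python
import re

# Matches the documented line: "Dump profiling traces to <directory>".
_TRACE_DIR_RE = re.compile(r"Dump profiling traces to (\S+)")


def parse_profile_dir(stdout_text):
    """Return the trace directory printed by the profiling command, or None."""
    match = _TRACE_DIR_RE.search(stdout_text)
    return match.group(1) if match else None
```

Returning `None` instead of raising lets the caller decide whether a missing line means the profile failed or the output format changed.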
|
||||||
|
|
||||||
|
**Optional flags:**
|
||||||
|
- `--profile-steps N` — number of profiling steps (default: 5)
|
||||||
|
- `--profile-by-stage` — profile by stage (prefill/decode separately)
|
||||||
|
- `--profile-prefix <path>` — custom output prefix
|
||||||
|
|
||||||
|
### Step 5: Kill the server
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pkill -9 -f "sglang.launch_server|sglang serve|sglang.srt"
|
||||||
|
```
|
||||||
|
|
||||||
|
Wait a moment and verify no sglang processes remain:
|
||||||
|
```bash
|
||||||
|
sleep 2 && pgrep -af "sglang serve" || echo "Server killed"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 6: Report the profile path
|
||||||
|
|
||||||
|
Return the profile directory path (e.g., `/tmp/1773999986.4769795`) and list its contents so the user knows what files were generated.
|
||||||
|
|
||||||
|
## Example Full Run
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Launch server
|
||||||
|
source cleanup/bin/activate
|
||||||
|
CUDA_VISIBLE_DEVICES=1 sglang serve --model-path Qwen/Qwen3-8B --port 30000 &
|
||||||
|
|
||||||
|
# 2. Wait for ready
|
||||||
|
for i in $(seq 1 120); do
|
||||||
|
curl -s http://127.0.0.1:30000/health | grep -q "ok" && break
|
||||||
|
sleep 5
|
||||||
|
done
|
||||||
|
|
||||||
|
# 3. Accuracy check
|
||||||
|
python3 -m sglang.test.few_shot_gsm8k --num-q 20
|
||||||
|
# Expected: Accuracy > 0.8
|
||||||
|
|
||||||
|
# 4. Profile
|
||||||
|
python3 -m sglang.test.send_one --profile
|
||||||
|
# Output: "Dump profiling traces to /tmp/1773999986.4769795"
|
||||||
|
|
||||||
|
# 5. Cleanup
|
||||||
|
pkill -9 -f "sglang.launch_server|sglang serve|sglang.srt"
|
||||||
|
sleep 2
|
||||||
|
|
||||||
|
# 6. Check output
|
||||||
|
ls -la /tmp/1773999986.4769795/
|
||||||
|
# 1773999986.4851577-TP-0.trace.json.gz (Chrome trace)
|
||||||
|
# server_args.json (server config)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Customization
|
||||||
|
|
||||||
|
- **Different port**: Pass `--port <port>` to `sglang serve`, and pass `--host 127.0.0.1 --port <port>` to the test commands so they target the same server
|
||||||
|
- **Multi-GPU**: Use `--tp <N>` for tensor parallelism; trace files will be generated per TP rank
|
||||||
|
- **Longer profile**: Use `--profile-steps 10` for more steps in the trace
|
||||||
|
- **Stage profiling**: Use `--profile-by-stage` to separate prefill and decode phases
|
||||||
|
|
||||||
|
## Viewing the Profile
|
||||||
|
|
||||||
|
Open the `.trace.json.gz` file in:
|
||||||
|
- **Perfetto UI**: https://ui.perfetto.dev/ (drag and drop the file)
|
||||||
|
- **Chrome tracing**: `chrome://tracing` (load the file)
|
||||||
|
|
||||||
|
Both support the gzipped Chrome trace format natively.
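For a quick look without a UI, the trace can also be read programmatically. This is a rough sketch that assumes the file follows the Chrome Trace Event Format, where events sit either in a top-level JSON array or under a `traceEvents` key; `summarize_trace` is a hypothetical helper.

```python
import gzip
import json
from collections import Counter


def summarize_trace(path, top_n=5):
    """Return (event_count, most_common_event_names) for a gzipped Chrome trace."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # Object form keeps events under "traceEvents"; array form is the list itself.
    events = data["traceEvents"] if isinstance(data, dict) else data
    names = Counter(
        event.get("name", "<unnamed>") for event in events if isinstance(event, dict)
    )
    return len(events), names.most_common(top_n)
```

Large traces can take a while to parse, so this is best suited to a sanity check that the file is non-empty and contains the expected kernel or operator names.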
|
||||||
219
third_party/sglang/.claude/skills/sglang-bisect-ci-regression/SKILL.md
vendored
Normal file
@@ -0,0 +1,219 @@
|
|||||||
|
# SGLang Bisect CI Regression
|
||||||
|
|
||||||
|
Investigate a consistently failing CI test to find the root cause: a code regression from a specific PR, a hardware- or runner-specific issue, or an environment change. Optionally reproduce the failure on a remote GPU server.
|
||||||
|
|
||||||
|
## Slash Command
|
||||||
|
|
||||||
|
`/sglang-bisect-ci-regression <test_name_or_ci_url> [ssh_target] [docker_container]`
|
||||||
|
|
||||||
|
## When to Use This Skill
|
||||||
|
|
||||||
|
- A CI test is failing consistently on main (scheduled runs)
|
||||||
|
- You need to find which PR introduced a regression
|
||||||
|
- You suspect a runner-specific or GPU-specific issue
|
||||||
|
- You want to reproduce a CI failure on a remote server
|
||||||
|
|
||||||
|
## Arguments
|
||||||
|
|
||||||
|
- **First argument (required)**: Test file name (e.g. `test_lora_tp.py`) or a GitHub Actions job URL
|
||||||
|
- **Second argument (optional)**: SSH target for remote reproduction (e.g. `user@host`)
|
||||||
|
- **Third argument (optional)**: Docker container name on the SSH target (e.g. `sglang_dev`)
|
||||||
|
|
||||||
|
If SSH target and docker container are not provided, the skill will only perform the CI log analysis and bisection, without remote reproduction. **Ask the user** for these if reproduction is needed and they weren't provided.
|
||||||
|
|
||||||
|
## Background: Scheduled CI Runs
|
||||||
|
|
||||||
|
SGLang uses the `pr-test.yml` workflow with **scheduled runs** (cron-triggered) to periodically test the `main` branch. These runs are the primary data source for detecting regressions:
|
||||||
|
|
||||||
|
- **Workflow**: `pr-test.yml` with `event: schedule`
|
||||||
|
- **Branch**: `main`
|
||||||
|
- **Dashboard**: https://github.com/sgl-project/sglang/actions/workflows/pr-test.yml?query=event%3Aschedule
|
||||||
|
- **Frequency**: Runs multiple times daily, each pinned to the HEAD of `main` at trigger time
|
||||||
|
- **Purpose**: Catches regressions that slip through PR-level CI (e.g., interaction bugs between merged PRs, hardware-specific issues)
|
||||||
|
|
||||||
|
Always use these scheduled runs (not PR-triggered runs) when bisecting regressions on `main`. The `--event schedule` filter in `gh run list` ensures you only see these periodic main-branch runs.
|
||||||
|
|
||||||
|
## Workflow
|
||||||
|
|
||||||
|
### Phase 1: Extract the Failure Signature
|
||||||
|
|
||||||
|
1. **Get the failing test details from CI logs.** If given a URL, fetch logs directly. If given a test name, find recent scheduled runs of `pr-test.yml` on `main` that failed:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# List recent scheduled runs targeting main (the primary source of truth for regressions)
|
||||||
|
# These are cron-triggered runs visible at:
|
||||||
|
# https://github.com/sgl-project/sglang/actions/workflows/pr-test.yml?query=event%3Aschedule
|
||||||
|
gh run list --repo sgl-project/sglang --workflow="pr-test.yml" --event schedule --branch main --limit 20 --json databaseId,conclusion,createdAt,headSha
|
||||||
|
|
||||||
|
# Find the job containing the test
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name, conclusion, databaseId}'
|
||||||
|
|
||||||
|
# Get the failure details
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --job {JOB_ID} --log 2>&1 | grep -E -B 5 -A 30 "AssertionError|FAIL|Error|{TEST_NAME}"
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Record the failure signature:**
|
||||||
|
- Exact error message and assertion
|
||||||
|
- Affected test method name
|
||||||
|
- Model/config involved
|
||||||
|
- Numeric values (e.g., tolerance diffs, scores)
|
||||||
|
- Whether the failure is deterministic (same values across runs)
|
||||||
|
|
||||||
|
### Phase 2: Temporal Bisection
|
||||||
|
|
||||||
|
3. **Find the boundary between passing and failing runs.** Walk through the scheduled run history (from the `pr-test.yml` schedule runs on `main`) to identify:
|
||||||
|
- Last known PASSING run (sha + date)
|
||||||
|
- First known FAILING run (sha + date)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# For each scheduled run, check the specific partition/job status
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --json jobs --jq '.jobs[] | select(.name == "{JOB_NAME}") | {conclusion, databaseId}'
|
||||||
|
|
||||||
|
# Verify a specific test passed or failed in a run
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --job {JOB_ID} --log 2>&1 | grep -E "{TEST_NAME}|PASSED|FAILED|logprobs mismatch" | head -10
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **List commits between the boundary:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git log --oneline {LAST_PASS_SHA}..{FIRST_FAIL_SHA}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Filter for relevant commits** that touch files related to the failing test (model layers, kernels, test utilities, etc.):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git log --oneline {LAST_PASS_SHA}..{FIRST_FAIL_SHA} -- {relevant_paths}
|
||||||
|
```
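The boundary search in step 3 can be sketched in a few lines. This assumes you have already collected, for each scheduled run, the commit SHA and the conclusion of the specific job (`"success"`, `"failure"`, or something else such as `"cancelled"`), ordered oldest to newest. `find_boundary` is an illustrative name.

```python
def find_boundary(runs):
    """Find (last_pass_sha, first_fail_sha) in runs ordered oldest to newest.

    Walks backwards from the newest run: the unbroken streak of failures at the
    end of the history defines the first failing run, and the success just
    before that streak is the last passing run. Runs with any other conclusion
    (cancelled, skipped) carry no signal and are ignored.
    """
    first_fail = None
    for sha, conclusion in reversed(runs):
        if conclusion == "failure":
            first_fail = sha
        elif conclusion == "success":
            return (sha, first_fail) if first_fail is not None else None
    return None  # no passing run in the window; fetch more history
```

The two SHAs returned are the `{LAST_PASS_SHA}` and `{FIRST_FAIL_SHA}` placeholders used in the `git log` commands of steps 4 and 5.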
|
||||||
|
|
||||||
|
### Phase 3: Runner/Hardware Analysis
|
||||||
|
|
||||||
|
6. **Check if the failure is runner-specific.** Extract the runner identity from each failing and passing run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Get runner name and machine
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --job {JOB_ID} --log 2>&1 | grep -E "Runner name|Machine name" | head -5
|
||||||
|
|
||||||
|
# Get GPU/driver info
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --job {JOB_ID} --log 2>&1 | grep -i -E "NVIDIA-SMI|Driver Version|CUDA Version" | head -5
|
||||||
|
|
||||||
|
# Get package versions
|
||||||
|
gh run view {RUN_ID} --repo sgl-project/sglang --job {JOB_ID} --log 2>&1 | grep -E "sgl.kernel.*==|flashinfer.*==" | head -5
|
||||||
|
```
|
||||||
|
|
||||||
|
7. **Correlate runners with pass/fail outcomes.** Build a table:
|
||||||
|
|
||||||
|
| Run ID | Date | Runner | GPU Type | Driver | Result |
|
||||||
|
|--------|------|--------|----------|--------|--------|
|
||||||
|
|
||||||
|
If all failures map to a specific runner type/GPU and all passes map to another, the issue is **hardware-specific**, not a code regression.
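The decision rule above can be expressed as a small check over the table rows. This is only a sketch: the row shape `(gpu_type, result)` and the returned labels are assumptions made for illustration.

```python
def classify_by_runner(rows):
    """Classify pass/fail rows of (gpu_type, result), result being "PASS" or "FAIL"."""
    rows = list(rows)  # allow generators; the rows are scanned twice
    passing = {gpu for gpu, result in rows if result == "PASS"}
    failing = {gpu for gpu, result in rows if result == "FAIL"}
    if not failing:
        return "no-failures"
    if not passing:
        # Everything fails everywhere: hardware cannot be the distinguishing factor.
        return "all-failing"
    if passing.isdisjoint(failing):
        # Failures and passes never share a GPU type.
        return "hardware-specific"
    return "not-hardware-specific"
```

A `hardware-specific` result still deserves a look at driver and package versions, since those often differ along the same runner boundary.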
|
||||||
|
|
||||||
|
### Phase 4: Code Analysis
|
||||||
|
|
||||||
|
8. **If a code regression is suspected** (failures not runner-specific), examine the candidate commits:
|
||||||
|
- Read the changed files
|
||||||
|
- Understand how the changes could affect the failing test
|
||||||
|
- Look for prefill-vs-decode differences, TP-specific paths, kernel changes
|
||||||
|
|
||||||
|
9. **If a hardware issue is suspected**, analyze:
|
||||||
|
- Kernel compatibility (CUDA compute capability)
|
||||||
|
- Driver version differences
|
||||||
|
- All-reduce / NCCL behavior differences
|
||||||
|
- CUDA graph capture differences across GPU architectures
|
||||||
|
|
||||||
|
### Phase 5: Remote Reproduction (Optional)
|
||||||
|
|
||||||
|
Only if SSH target and docker container were provided.
|
||||||
|
|
||||||
|
10. **Verify the remote environment:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh {SSH_TARGET} "docker exec {CONTAINER} nvidia-smi --query-gpu=name,driver_version --format=csv"
|
||||||
|
ssh {SSH_TARGET} "docker exec {CONTAINER} pip show sgl-kernel sglang flashinfer-python 2>&1 | grep -E 'Name:|Version:'"
|
||||||
|
```
|
||||||
|
|
||||||
|
11. **Ensure latest code is installed.** If the container is stale, update:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Try fetching latest main
|
||||||
|
ssh {SSH_TARGET} "docker exec {CONTAINER} bash -c 'cd /path/to/sglang && git fetch origin main && git checkout origin/main'"
|
||||||
|
# Or download and install from tarball if git auth fails
|
||||||
|
ssh {SSH_TARGET} "docker exec {CONTAINER} bash -c 'cd /tmp && curl -L https://github.com/sgl-project/sglang/archive/refs/heads/main.tar.gz | tar xz && cd sglang-main && pip install -e \"python[all]\"'"
|
||||||
|
# Reinstall (after git fetch)
|
||||||
|
ssh {SSH_TARGET} "docker exec {CONTAINER} bash -c 'cd /path/to/sglang && pip install -e \"python[all]\"'"
|
||||||
|
# Install test dependencies if needed
|
||||||
|
ssh {SSH_TARGET} "docker exec {CONTAINER} pip install peft rouge-score"
|
||||||
|
```
|
||||||
|
|
||||||
|
12. **Create a minimal reproduction script** that:
|
||||||
|
- Uses `if __name__ == '__main__'` with `mp.set_start_method("spawn")`
|
||||||
|
- Runs the specific failing test configuration
|
||||||
|
- Prints key metrics (diffs, scores, outputs)
|
||||||
|
- Exits with code 1 on failure
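A skeleton with the shape described by the bullets above might look like the following. Everything inside `run_case` is a placeholder, and `TOLERANCE` is a hypothetical value; the real script would launch the failing configuration and compare its actual outputs.

```python
import multiprocessing as mp
import sys

TOLERANCE = 1e-2  # hypothetical threshold; copy the real one from the failing test


def run_case():
    """Placeholder: run the failing configuration and return (expected, actual)."""
    return [0.10, 0.20, 0.30], [0.10, 0.20, 0.30]


def max_abs_diff(expected, actual):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(e - a) for e, a in zip(expected, actual))


def main():
    expected, actual = run_case()
    diff = max_abs_diff(expected, actual)
    # Print the key metric so it shows up in the remote log.
    print(f"max_abs_diff={diff:.6f} tolerance={TOLERANCE}")
    return 0 if diff <= TOLERANCE else 1


if __name__ == "__main__":
    # Must be set before any worker process is created.
    mp.set_start_method("spawn", force=True)
    exit_code = main()
    if exit_code != 0:
        sys.exit(exit_code)  # non-zero exit signals failure to the caller
```

The `spawn` start method matters here because CUDA generally cannot be re-initialized in a forked child process.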
|
||||||
|
|
||||||
|
13. **Copy and run the reproduction script:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
scp /tmp/repro_script.py {SSH_TARGET}:/tmp/
|
||||||
|
ssh {SSH_TARGET} "docker cp /tmp/repro_script.py {CONTAINER}:/tmp/"
|
||||||
|
ssh {SSH_TARGET} "docker exec -e CUDA_VISIBLE_DEVICES=0,1 {CONTAINER} python3 /tmp/repro_script.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
14. **Run control experiments** to isolate the variable:
|
||||||
|
- If suspecting TP issue: run with TP=1 as control
|
||||||
|
- If suspecting GPU issue: compare same code on different GPU
|
||||||
|
- If suspecting a specific commit: test before/after that commit
|
||||||
|
|
||||||
|
### Phase 6: Report
|
||||||
|
|
||||||
|
15. **Produce a structured report:**
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## CI Regression Bisection Report
|
||||||
|
|
||||||
|
### Failure Signature
|
||||||
|
- **Test**: {test_file}::{test_method}
|
||||||
|
- **Error**: {exact error message}
|
||||||
|
- **Key metrics**: {numeric values}
|
||||||
|
- **Deterministic**: Yes/No
|
||||||
|
|
||||||
|
### Root Cause Classification
|
||||||
|
One of:
|
||||||
|
- **Code Regression**: PR #{number} introduced the bug
|
||||||
|
- **Hardware-Specific**: Fails on {GPU_TYPE}, passes on others
|
||||||
|
- **Environment Change**: New runner/driver/package version
|
||||||
|
- **Pre-existing Flakiness**: Intermittent, not a new regression
|
||||||
|
|
||||||
|
### Evidence
|
||||||
|
| Condition | Result |
|
||||||
|
|-----------|--------|
|
||||||
|
| {condition1} | PASS/FAIL |
|
||||||
|
| {condition2} | PASS/FAIL |
|
||||||
|
|
||||||
|
### Timeline
|
||||||
|
- {date}: Last known pass ({sha}, {runner})
|
||||||
|
- {date}: First known fail ({sha}, {runner})
|
||||||
|
- {date}: Confirmed reproduction on {server}
|
||||||
|
|
||||||
|
### Recommended Fix
|
||||||
|
- **Short-term**: {workaround}
|
||||||
|
- **Long-term**: {proper fix}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Key Patterns to Recognize
|
||||||
|
|
||||||
|
| Pattern | Diagnosis |
|
||||||
|
|---------|-----------|
|
||||||
|
| Same SHA passes on runner A, fails on runner B | Hardware/runner-specific |
|
||||||
|
| All runners fail after commit X | Code regression from commit X |
|
||||||
|
| Intermittent - same runner sometimes passes/fails | Flaky test or race condition |
|
||||||
|
| Prefill OK but decode fails | TP/all-reduce issue in decode path |
|
||||||
|
| Works with TP=1, fails with TP>1 | Tensor parallelism bug |
|
||||||
|
| Exact same numeric diff every time | Deterministic bug, not flakiness |
|
||||||
|
|
||||||
|
## Important Notes
|
||||||
|
|
||||||
|
- **Always check runner identity** before concluding it's a code regression. Many "consistent" failures are actually runner-specific.
|
||||||
|
- **Test partition assignments change over time** as tests are added/removed. A test may move between partitions, landing on different runner types.
|
||||||
|
- **H200 runners** use `/root/actions-runner/` path and machine names like `gpu-h200-worker-*`. Non-H200 runners use `/public_sglang_ci/runner-*` paths.
|
||||||
|
- When running remote reproduction, use `run_in_background` for long-running tests and check output with `TaskOutput`.
|
||||||
|
- Container environments may be stale - always verify package versions match CI before drawing conclusions.
|
||||||
444
third_party/sglang/.claude/skills/write-sglang-test/SKILL.md
vendored
Normal file
@@ -0,0 +1,444 @@
|
|||||||
|
---
|
||||||
|
name: write-sglang-test
|
||||||
|
description: Guide for writing SGLang CI/UT tests. Covers CustomTestCase, CI registration, server fixtures, model selection, mock testing, and test placement. Always read test/README.md for the full CI layout, how to run tests, and extra tips. Use when creating new tests, adding CI test cases, writing unit tests, or when the user asks to add tests for SGLang features.
|
||||||
|
---
|
||||||
|
|
||||||
|
# Writing SGLang CI / UT Tests
|
||||||
|
|
||||||
|
This skill covers **how to write and register tests**. For CI pipeline internals (stage ordering, fast-fail, gating, partitioning, debugging CI failures), see the [CI workflow guide](../ci-workflow-guide/SKILL.md).
|
||||||
|
|
||||||
|
## Core Rules
|
||||||
|
|
||||||
|
1. **Always use `CustomTestCase`** — never raw `unittest.TestCase`. It ensures `tearDownClass` runs even when `setUpClass` fails, preventing resource leaks in CI.
|
||||||
|
2. **`tearDownClass` must be defensive** — use `hasattr`/null checks before accessing resources (e.g. `cls.process`) that `setUpClass` may not have finished allocating.
|
||||||
|
3. **Place tests in `test/registered/<category>/`** — except JIT kernel tests and benchmarks, which live in `python/sglang/jit_kernel/tests/` and `python/sglang/jit_kernel/benchmark/` (nested subfolders are allowed)
|
||||||
|
4. **Reuse server fixtures** — inherit from `DefaultServerBase` or write `setUpClass`/`tearDownClass` with `popen_launch_server`
|
||||||
|
5. **Prefer mock over real server** — when testing logic that doesn't need a server / engine launch (middleware, request routing, config validation, argument parsing), use `unittest.mock.patch` / `MagicMock` and place tests in `test/registered/unit/`. Only launch a real server when the test genuinely needs inference results or server lifecycle behavior.
|
||||||
|
|
||||||
|
JIT kernel exception:
|
||||||
|
- If the task is adding or updating code under `python/sglang/jit_kernel/`, prefer the `add-jit-kernel` skill first.
|
||||||
|
- JIT kernel correctness tests use `python/sglang/jit_kernel/tests/**/test_*.py`.
|
||||||
|
- JIT kernel benchmarks use `python/sglang/jit_kernel/benchmark/**/bench_*.py`.
|
||||||
|
- Those files are still executed by `test/run_suite.py`, but through dedicated kernel suites rather than `test/registered/`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Model & Backend Selection
|
||||||
|
|
||||||
|
| Scenario | Model | CI Registration | Suite |
|
||||||
|
|----------|-------|-----------------|-------|
|
||||||
|
| **Unit tests** (no server / engine launch) | None | `register_cpu_ci` (preferred) or `register_cuda_ci` | `stage-a-test-cpu` or `stage-b-test-1-gpu-small` |
|
||||||
|
| **Common / backend-independent** (middleware, abort, routing, config, arg parsing) | `DEFAULT_SMALL_MODEL_NAME_FOR_TEST` (1B) | `register_cuda_ci` only | `stage-b-test-1-gpu-small` |
|
||||||
|
| **Model-agnostic functionality** (sampling, session, OpenAI API features) | `DEFAULT_SMALL_MODEL_NAME_FOR_TEST` (1B) | `register_cuda_ci` (+ AMD if relevant) | `stage-b-test-1-gpu-small` |
|
||||||
|
| **General performance** (single node, no spec/DP/parallelism) | `DEFAULT_MODEL_NAME_FOR_TEST` (8B) | `register_cuda_ci` | `stage-b-test-1-gpu-large` |
|
||||||
|
| **Bigger features** (spec, DP, TP, disaggregation) | Case by case | Case by case | See suite table below |
|
||||||
|
|
||||||
|
**Key principle for E2E tests**: Do NOT add `register_amd_ci` unless the test specifically exercises AMD/ROCm code paths. Common E2E tests just need any GPU to run — duplicating across backends wastes CI time with no extra coverage.
|
||||||
|
|
||||||
|
### All model constants
|
||||||
|
|
||||||
|
Defined in `python/sglang/test/test_utils.py`:
|
||||||
|
|
||||||
|
| Constant | Model | When to use |
|
||||||
|
|----------|-------|-------------|
|
||||||
|
| `DEFAULT_SMALL_MODEL_NAME_FOR_TEST` | Llama-3.2-1B-Instruct | Common features, model-agnostic tests |
|
||||||
|
| `DEFAULT_SMALL_MODEL_NAME_FOR_TEST_BASE` | Llama-3.2-1B | Base (non-instruct) model tests |
|
||||||
|
| `DEFAULT_MODEL_NAME_FOR_TEST` | Llama-3.1-8B-Instruct | General performance (single node) |
|
||||||
|
| `DEFAULT_MOE_MODEL_NAME_FOR_TEST` | Mixtral-8x7B-Instruct | MoE-specific tests |
|
||||||
|
| `DEFAULT_SMALL_EMBEDDING_MODEL_NAME_FOR_TEST` | — | Embedding tests |
|
||||||
|
| `DEFAULT_SMALL_VLM_MODEL_NAME_FOR_TEST` | — | Vision-language tests |
|
||||||
|
|
||||||
|
### Naming Conventions
|
||||||
|
|
||||||
|
- **Suite**: `stage-{a,b,c}-test-{gpu_count}-gpu-{hardware}` (e.g., `stage-b-test-1-gpu-small`)
|
||||||
|
- **CI runner**: `{gpu_count}-gpu-{hardware}` (e.g., `1-gpu-5090`, `4-gpu-h100`, `8-gpu-h200`)
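To make the suite naming pattern concrete, here is a small parser sketch. The regular expression is an assumption derived from the pattern above, not code from `test/run_suite.py`; suites that deviate from the pattern (for example `stage-a-test-cpu` or the `deepep` and kernel suites) simply do not match.

```python
import re

# stage-{a,b,c}-test-{gpu_count}-gpu-{hardware}
_SUITE_RE = re.compile(
    r"^stage-(?P<stage>[abc])-test-(?P<gpus>\d+)-gpu-(?P<hardware>[a-z0-9-]+)$"
)


def parse_suite(name):
    """Return (stage, gpu_count, hardware) for a conforming suite name, else None."""
    match = _SUITE_RE.match(name)
    if match is None:
        return None
    return match.group("stage"), int(match.group("gpus")), match.group("hardware")
```

For AMD suites the trailing qualifier ends up in the hardware field, so `stage-b-test-1-gpu-small-amd` yields the hardware string `small-amd`.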
|
||||||
|
|
||||||
|
### All CI Suites
|
||||||
|
|
||||||
|
#### Per-commit (CUDA)
|
||||||
|
|
||||||
|
| Suite | Runner (label) | Description |
|
||||||
|
|-------|----------------|-------------|
|
||||||
|
| `stage-a-test-1-gpu-small` | `1-gpu-5090` | Quick checks on a small NVIDIA GPU before heavier stages |
|
||||||
|
| `stage-a-test-cpu` | `ubuntu-latest` | CPU-only unit tests |
|
||||||
|
| `stage-b-test-1-gpu-small` | `1-gpu-5090` | Core engine tests that fit a 5090-class card |
|
||||||
|
| `stage-b-test-1-gpu-large` | `1-gpu-h100` | Tests that need H100-class memory or kernels (e.g. FA3) |
|
||||||
|
| `stage-b-test-2-gpu-large` | `2-gpu-h100` | Two-GPU correctness and parallelism (TP/PP) on H100 |
|
||||||
|
| `stage-b-test-4-gpu-b200` | `4-gpu-b200` | Early Blackwell coverage (SM100+ paths) on four GPUs |
|
||||||
|
| `stage-b-kernel-unit-1-gpu-large` | `1-gpu-h100` | JIT kernel correctness tests under `python/sglang/jit_kernel/tests/` |
|
||||||
|
| `stage-b-kernel-unit-8-gpu-h200` | `8-gpu-h200` | Multi-GPU JIT kernel correctness tests under `python/sglang/jit_kernel/tests/` |
|
||||||
|
| `stage-b-kernel-benchmark-1-gpu-large` | `1-gpu-h100` | JIT kernel benchmark files under `python/sglang/jit_kernel/benchmark/` |
|
||||||
|
| `stage-c-test-4-gpu-h100` | `4-gpu-h100` | Large 4-GPU H100 integration and scaling tests |
|
||||||
|
| `stage-c-test-8-gpu-h200` | `8-gpu-h200` | Large 8-GPU H200 runs for big models and parallelism |
|
||||||
|
| `stage-c-test-8-gpu-h20` | `8-gpu-h20` | Large 8-GPU H20 runs for big models |
|
||||||
|
| `stage-c-test-deepep-4-gpu-h100` | `4-gpu-h100` | DeepEP expert-parallel and networking on four H100s |
|
||||||
|
| `stage-c-test-deepep-8-gpu-h200` | `8-gpu-h200` | DeepEP at 8-GPU H200 scale |
|
||||||
|
| `stage-c-test-8-gpu-b200` | `8-gpu-b200` | 8-GPU B200 suite (registered but not yet wired to a workflow) |
|
||||||
|
| `stage-c-test-4-gpu-b200` | `4-gpu-b200` | 4-GPU B200 suite for large models on Blackwell |
|
||||||
|
| `stage-c-test-4-gpu-gb200` | `4-gpu-gb200` | 4-GPU GB200 suite for large models on Grace Blackwell |
|
||||||
|
|
||||||
|
#### Per-commit (AMD)
|
||||||
|
|
||||||
|
| Suite | Runner (label) | Description |
|
||||||
|
|-------|----------------|-------------|
|
||||||
|
| `stage-a-test-1-gpu-small-amd` | `linux-mi325-1gpu-sglang` | Quick checks on one MI325-class GPU |
|
||||||
|
| `stage-b-test-1-gpu-small-amd` | `linux-mi325-1gpu-sglang` | Core 1-GPU AMD tests (14 partitions) |
|
||||||
|
| `stage-b-test-1-gpu-small-amd-nondeterministic` | `linux-mi325-1gpu-sglang` | Non-deterministic 1-GPU AMD tests |
|
||||||
|
| `stage-b-test-1-gpu-small-amd-mi35x` | `linux-mi35x-gpu-1` | 1-GPU tests on MI35x hardware |
|
||||||
|
| `stage-b-test-1-gpu-large-amd` | `linux-mi325-1gpu-sglang` | Large 1-GPU AMD tests (2 partitions) |
|
||||||
|
| `stage-b-test-2-gpu-large-amd` | `linux-mi325-2gpu-sglang` | 2-GPU ROCm correctness and parallel setups |
|
||||||
|
| `stage-b-test-large-8-gpu-35x-disaggregation-amd` | `linux-mi35x-gpu-8.fabric` | PD disaggregation and RDMA on 8×MI35x fabric |
|
||||||
|
| `stage-c-test-4-gpu-amd` | `linux-mi325-4gpu-sglang` | 4-GPU AMD integration (2 partitions) |
|
||||||
|
| `stage-c-test-large-8-gpu-amd` | `linux-mi325-8gpu-sglang` | 8-GPU MI325 scaling and integration |
|
||||||
|
| `stage-c-test-large-8-gpu-amd-mi35x` | `linux-mi35x-gpu-8` | 8-GPU MI35x scaling (2 partitions) |
|
||||||
|
|
||||||
|
|
||||||
|
#### Per-commit (Ascend NPU)
|
||||||
|
|
||||||
|
| Suite | Runner (label) | Description |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| `per-commit-1-npu-a2` | `linux-aarch64-a2-1` | 1-NPU LLM CI machine |
|
||||||
|
| `per-commit-2-npu-a2` | `linux-aarch64-a2-2` | 2-NPU LLM CI machine |
|
||||||
|
| `per-commit-4-npu-a3` | `linux-aarch64-a3-4` | 4-NPU LLM CI machine |
|
||||||
|
| `per-commit-16-npu-a3` | `linux-aarch64-a3-16` | 16-NPU LLM CI machine |
|
||||||
|
| `multimodal-gen-test-1-npu-a3` | `linux-aarch64-a3-2` | 1-NPU multimodal CI machine |
|
||||||
|
| `multimodal-gen-test-2-npu-a3` | `linux-aarch64-a3-16` | 2-NPU multimodal CI machine |
|
||||||
|
| `multimodal-gen-test-8-npu-a3` | `linux-aarch64-a3-16` | 8-NPU multimodal CI machine |
|
||||||
|
|
||||||
|
#### Nightly
|
||||||
|
|
||||||
|
Nightly suites are listed in `NIGHTLY_SUITES` in [`test/run_suite.py`](../../../test/run_suite.py). They run via `nightly-test-nvidia.yml`, `nightly-test-amd.yml`, and `nightly-test-npu.yml`, not `pr-test.yml`. Examples:
|
||||||
|
|
||||||
|
- `nightly-1-gpu` (CUDA)
|
||||||
|
- `nightly-kernel-1-gpu` (CUDA, JIT kernel full grids)
|
||||||
|
- `nightly-kernel-8-gpu-h200` (CUDA, multi-GPU JIT kernel nightly)
|
||||||
|
- `nightly-8-gpu-h200` (CUDA)
|
||||||
|
- `nightly-eval-vlm-2-gpu` (CUDA)
|
||||||
|
- `nightly-amd` (AMD)
|
||||||
|
- `nightly-amd-8-gpu-mi35x` (AMD)
|
||||||
|
- `nightly-1-npu-a3` (NPU)
|
||||||
|
- `nightly-2-npu-a3` (NPU)
|
||||||
|
- `nightly-4-npu-a3` (NPU)
|
||||||
|
- `nightly-8-npu-a3` (NPU)
|
||||||
|
- `nightly-16-npu-a3` (NPU)
|
||||||
|
|
||||||
|
> **Note**: Multimodal diffusion uses `python/sglang/multimodal_gen/test/run_suite.py`, not `test/run_suite.py`.
|
||||||
|
|
||||||
|
### Choosing a Suite
|
||||||
|
|
||||||
|
Use the lightest suite that meets your test's needs:
|
||||||
|
|
||||||
|
- **No GPU required** → `stage-a-test-cpu`
|
||||||
|
- **Most small GPU tests** → `stage-b-test-1-gpu-small` (default choice)
|
||||||
|
- **Need H100 memory or Hopper features** → `stage-b-test-1-gpu-large`
|
||||||
|
- **JIT kernel correctness** → `stage-b-kernel-unit-1-gpu-large`
|
||||||
|
- **JIT kernel benchmarks** → `stage-b-kernel-benchmark-1-gpu-large`
|
||||||
|
- **Multi-GPU** → only when the test actually needs multiple GPUs
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test File Templates
|
||||||
|
|
||||||
|
### Unit Tests (no server / engine launch)
|
||||||
|
|
||||||
|
See `test/registered/unit/README.md` for quick-start and rules. Unit tests live in `test/registered/unit/`, mirroring `python/sglang/srt/`:
|
||||||
|
|
||||||
|
```python
"""Unit tests for srt/<module>"""

import unittest
from unittest.mock import MagicMock, patch

from sglang.srt.<module> import TargetClass
from sglang.test.ci.ci_register import register_cpu_ci
from sglang.test.test_utils import CustomTestCase

register_cpu_ci(est_time=5, suite="stage-a-test-cpu")
# Prefer CPU. Only use register_cuda_ci when the test truly needs a GPU.

class TestTargetClass(CustomTestCase):
    def test_basic_behavior(self):
        obj = TargetClass(...)
        self.assertEqual(obj.method(), expected)

    @patch("sglang.srt.<module>.some_dependency")
    def test_with_mock(self, mock_dep):
        mock_dep.return_value = MagicMock()
        # test logic with dependency mocked
        ...


if __name__ == "__main__":
    unittest.main()
```
|
||||||
|
|
||||||
|
Use `unittest.mock.patch` / `MagicMock` to mock dependencies and isolate the logic under test. If the module transitively imports GPU-only packages (e.g. `sgl_kernel`), they can be stubbed so the test runs on CPU CI. See `test/registered/unit/README.md` for details and examples.
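One common way to stub a GPU-only package is to place a mock in `sys.modules` for the duration of the import. The sketch below shows the general technique only; it is not necessarily the approach used in `test/registered/unit/README.md`, and `stubbed_modules` is a hypothetical helper.

```python
import sys
from unittest.mock import MagicMock, patch


def stubbed_modules(*names):
    """Context manager that temporarily registers a MagicMock for each module name."""
    return patch.dict(sys.modules, {name: MagicMock(name=name) for name in names})


with stubbed_modules("sgl_kernel"):
    # Inside the block, the import resolves to the stub instead of the real package.
    import sgl_kernel

    assert isinstance(sgl_kernel, MagicMock)
```

On exit, `patch.dict` restores `sys.modules` to its previous contents. Dotted submodules (for example a hypothetical `sgl_kernel.ops`) may need their own entries, since a plain mock is not treated as a package by the import system.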
|
||||||
|
|
||||||
|
**Quality bar** — test real logic (validation boundaries, state transitions, error paths, branching, etc.). Skip tests that just verify Python itself works (e.g., "does calling an abstract method raise `NotImplementedError`?", "does a dataclass store the field I assigned?"). Consolidate repetitive patterns into parameterized tests. No production code changes in test PRs.
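As an illustration of consolidating repetitive cases, the sketch below uses `subTest` to cover validation boundaries in two test methods. `validate_port` is a hypothetical function under test, and plain `unittest.TestCase` is used only to keep the example self-contained; in SGLang the class would inherit from `CustomTestCase`.

```python
import unittest


def validate_port(port):
    """Hypothetical function under test: accept TCP ports 1..65535."""
    if not isinstance(port, int) or isinstance(port, bool):
        raise TypeError("port must be an int")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class TestValidatePort(unittest.TestCase):  # CustomTestCase in the SGLang tree
    def test_accepts_boundaries(self):
        for port in (1, 30000, 65535):
            with self.subTest(port=port):
                self.assertEqual(validate_port(port), port)

    def test_rejects_out_of_range(self):
        for port in (0, -1, 65536):
            with self.subTest(port=port):
                with self.assertRaises(ValueError):
                    validate_port(port)
```

Each `subTest` reports its own failure with the offending parameter, so one method can replace several near-identical tests without hiding which case broke.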
|
||||||
|
|
||||||
|
### E2E test (small model, server needed)
|
||||||
|
|
||||||
|
```python
import unittest

import requests

from sglang.srt.utils import kill_process_tree
from sglang.test.ci.ci_register import register_cuda_ci
from sglang.test.test_utils import (
    DEFAULT_SMALL_MODEL_NAME_FOR_TEST,
    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
    DEFAULT_URL_FOR_TEST,
    CustomTestCase,
    popen_launch_server,
)

register_cuda_ci(est_time=60, suite="stage-b-test-1-gpu-small")


class TestMyFeature(CustomTestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_SMALL_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
            other_args=["--arg1", "value1"],  # feature-specific args
        )

    @classmethod
    def tearDownClass(cls):
        if hasattr(cls, "process") and cls.process:
            kill_process_tree(cls.process.pid)

    def test_basic_functionality(self):
        response = requests.post(
            self.base_url + "/generate",
            json={"text": "Hello", "sampling_params": {"max_new_tokens": 32}},
        )
        self.assertEqual(response.status_code, 200)


if __name__ == "__main__":
    unittest.main(verbosity=3)
```
|
||||||
|
|
||||||
|
### E2E test (8B model, server needed, performance)
|
||||||
|
|
||||||
|
```python
import time
import unittest

import requests

from sglang.srt.utils import kill_process_tree
from sglang.test.ci.ci_register import register_cuda_ci
from sglang.test.test_utils import (
    DEFAULT_MODEL_NAME_FOR_TEST,
    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
    DEFAULT_URL_FOR_TEST,
    CustomTestCase,
    popen_launch_server,
)

register_cuda_ci(est_time=300, suite="stage-b-test-1-gpu-large")


class TestMyFeaturePerf(CustomTestCase):
    @classmethod
    def setUpClass(cls):
        cls.model = DEFAULT_MODEL_NAME_FOR_TEST
        cls.base_url = DEFAULT_URL_FOR_TEST
        cls.process = popen_launch_server(
            cls.model,
            cls.base_url,
            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
        )

    @classmethod
    def tearDownClass(cls):
        if hasattr(cls, "process") and cls.process:
            kill_process_tree(cls.process.pid)

    def test_latency(self):
        start = time.perf_counter()
        response = requests.post(
            self.base_url + "/generate",
            json={"text": "Hello", "sampling_params": {"max_new_tokens": 128}},
        )
        elapsed = time.perf_counter() - start
        self.assertEqual(response.status_code, 200)
        self.assertLess(elapsed, 5.0, "Latency exceeded threshold")


if __name__ == "__main__":
    unittest.main(verbosity=3)
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Server Fixture Reuse
|
||||||
|
|
||||||
|
For tests that only need a standard server, inherit from `DefaultServerBase` and override class attributes:
|
||||||
|
|
||||||
|
```python
from sglang.test.server_fixtures.default_fixture import DefaultServerBase
from sglang.test.test_utils import DEFAULT_SMALL_MODEL_NAME_FOR_TEST


class TestMyFeature(DefaultServerBase):
    model = DEFAULT_SMALL_MODEL_NAME_FOR_TEST
    other_args = ["--enable-my-feature"]

    def test_something(self):
        ...
```
|
||||||
|
|
||||||
|
Available fixtures in `python/sglang/test/server_fixtures/`:
|
||||||
|
|
||||||
|
| Fixture | Use case |
|---------|----------|
| `DefaultServerBase` | Standard single-server tests |
| `EagleServerBase` | EAGLE speculative decoding |
| `PDDisaggregationServerBase` | Disaggregated prefill/decode |
| `MMMUServerBase` | Multimodal VLM tests |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CI Registration
|
||||||
|
|
||||||
|
Every CI-discovered test file must call a registration function at module level:
|
||||||
|
|
||||||
|
```python
from sglang.test.ci.ci_register import (
    register_cuda_ci,
    register_amd_ci,
    register_cpu_ci,
    register_npu_ci,
)

# Per-commit test (small 1-gpu, runs on 5090)
register_cuda_ci(est_time=80, suite="stage-b-test-1-gpu-small")

# Per-commit test (large 1-gpu, runs on H100)
register_cuda_ci(est_time=120, suite="stage-b-test-1-gpu-large")

# Nightly-only test
register_cuda_ci(est_time=200, suite="nightly-1-gpu", nightly=True)

# Multi-backend test (only when testing backend-specific code paths)
register_cuda_ci(est_time=80, suite="stage-a-test-1-gpu-small")
register_amd_ci(est_time=120, suite="stage-a-test-1-gpu-small-amd")
register_npu_ci(est_time=400, suite="nightly-8-npu-a3", nightly=True)

# Temporarily disabled test
register_cuda_ci(est_time=80, suite="stage-b-test-1-gpu-small", disabled="flaky - see #12345")
```
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- `est_time`: estimated runtime in seconds (used for CI partitioning)
|
||||||
|
- `suite`: which CI suite to run in (see suite tables above)
|
||||||
|
- `nightly=True`: for nightly-only tests (default `False` = per-commit)
|
||||||
|
- `disabled="reason"`: temporarily disable with explanation
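To make the role of `est_time` concrete, here is a minimal sketch of how estimated runtimes could be used to balance tests across CI shards. This is an assumption for illustration only: the function name `partition_by_est_time` and the greedy longest-first strategy are hypothetical and are not taken from `run_suite.py`.

```python
import heapq


def partition_by_est_time(tests, num_shards):
    """Greedy longest-first assignment of (name, est_time) pairs to shards.

    Hypothetical illustration; the real CI partitioning logic may differ.
    """
    # Min-heap of (accumulated_seconds, shard_index).
    heap = [(0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Place the longest tests first, always on the currently lightest shard.
    for name, est_time in sorted(tests, key=lambda t: t[1], reverse=True):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + est_time, idx))
    return shards
```

Under this sketch, an `est_time` that is far too low would let a slow test land on an already busy shard, which is one reason to measure the value locally.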
|
||||||
|
|
||||||
|
**Key principle**: Only add `register_amd_ci` / `register_npu_ci` when the test exercises backend-specific code paths. Common E2E tests just need `register_cuda_ci` — duplicating across backends wastes CI time.
|
||||||
|
|
||||||
|
### JIT Kernel Registration
|
||||||
|
|
||||||
|
JIT kernel files live outside `test/registered/` but still use registration:
|
||||||
|
|
||||||
|
```python
from sglang.test.ci.ci_register import register_cuda_ci

# Correctness tests in python/sglang/jit_kernel/tests/
register_cuda_ci(est_time=30, suite="stage-b-kernel-unit-1-gpu-large")
register_cuda_ci(est_time=120, suite="stage-b-kernel-unit-8-gpu-h200")

# Benchmarks in python/sglang/jit_kernel/benchmark/
register_cuda_ci(est_time=6, suite="stage-b-kernel-benchmark-1-gpu-large")

# Optional nightly registration
register_cuda_ci(est_time=120, suite="nightly-kernel-1-gpu", nightly=True)
register_cuda_ci(est_time=120, suite="nightly-kernel-8-gpu-h200", nightly=True)
```
|
||||||
|
|
||||||
|
Keep `est_time` and `suite` as **literal values**, because `run_suite.py` collects them by AST parsing.
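The following sketch shows why literals matter when registrations are read from the syntax tree rather than by running the file. It is not the actual `run_suite.py` code: the helper name `collect_registrations` is hypothetical, and it relies only on the standard-library `ast` module.

```python
import ast


def collect_registrations(source):
    """Return (function_name, kwargs) for module-level register_*_ci(...) calls.

    Hypothetical helper for illustration; not the real run_suite.py logic.
    """
    registrations = []
    for node in ast.parse(source).body:
        # Only consider bare call statements at module level.
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            continue
        func = node.value.func
        if not isinstance(func, ast.Name):
            continue
        if not (func.id.startswith("register_") and func.id.endswith("_ci")):
            continue
        # literal_eval accepts only literal nodes; a variable raises ValueError.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.value.keywords}
        registrations.append((func.id, kwargs))
    return registrations
```

With this approach, `register_cuda_ci(est_time=80, suite="stage-b-test-1-gpu-small")` yields its keyword values directly, while `register_cuda_ci(est_time=EST, suite="x")` fails because `EST` is a name and not a literal.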
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test Placement
|
||||||
|
|
||||||
|
```
test/
├── registered/                # CI tests (auto-discovered by run_suite.py)
│   ├── unit/                  # No server / engine launch (see test/registered/unit/README.md)
│   ├── kernels/               # CUDA kernel correctness (no server, GPU required)
│   ├── sampling/              # test_penalty.py, test_sampling_params.py ...
│   ├── sessions/              # test_session_control.py ...
│   ├── openai_server/         # basic/, features/, validation/ ...
│   ├── spec/                  # eagle/, utils/ ...
│   ├── models/                # model-specific accuracy tests
│   ├── perf/                  # performance benchmarks
│   └── <category>/            # create new category if needed
├── manual/                    # Non-CI: debugging, one-off, manual verification
└── run_suite.py               # CI runner (scans registered/ plus jit_kernel test/benchmark files)

python/sglang/jit_kernel/
├── tests/                     # JIT kernel correctness tests (CI-discovered by test/run_suite.py)
└── benchmark/                 # JIT kernel benchmarks (CI-discovered by test/run_suite.py)
```
|
||||||
|
|
||||||
|
**Decision rule** (see also `test/registered/README.md`):
|
||||||
|
- Component logic, no server → `registered/unit/`
|
||||||
|
- JIT kernel correctness / benchmarks → `python/sglang/jit_kernel/tests/` or `python/sglang/jit_kernel/benchmark/`
|
||||||
|
- Other kernel correctness → `registered/kernels/`
|
||||||
|
- Server needed → `registered/<category>/`
|
||||||
|
- Local debugging → `manual/`
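The decision rule above can be restated as a small helper. This is only a sketch: the function `suggest_test_location` and its parameters are hypothetical, and the order in which the conditions are checked is an assumption, not something the repository defines.

```python
def suggest_test_location(
    needs_server,
    is_jit_kernel=False,
    is_kernel=False,
    is_benchmark=False,
    local_only=False,
    category="<category>",
):
    """Map test characteristics to a directory, following the decision rule.

    Hypothetical helper; the precedence of the checks is an assumption.
    """
    if local_only:
        return "test/manual/"
    if is_jit_kernel:
        sub = "benchmark" if is_benchmark else "tests"
        return f"python/sglang/jit_kernel/{sub}/"
    if is_kernel:
        return "test/registered/kernels/"
    if needs_server:
        return f"test/registered/{category}/"
    # Component logic that needs no server or engine launch.
    return "test/registered/unit/"
```

For example, a sampling test that launches a server would map to `test/registered/sampling/`, while a JIT kernel benchmark would map to `python/sglang/jit_kernel/benchmark/`.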
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Eval Accuracy Mixins
|
||||||
|
|
||||||
|
**Design philosophy**: Most test files don't care about eval logic — they only need a "does this feature break model output quality?" sanity check. The mixin pattern separates **what to test** (threshold) from **how to test** (run_eval, assertions, CI summary). Test classes declare thresholds as class attributes; the mixin provides the `test_*` method. Override when you need extra assertions (e.g. EAGLE accept length).
|
||||||
|
|
||||||
|
Available mixins in `python/sglang/test/kits/eval_accuracy_kit.py`: `MMLUMixin`, `HumanEvalMixin`, `MGSMEnMixin`, `GSM8KMixin`. Can be combined freely. Read the source for attrs and defaults.
|
||||||
|
|
||||||
|
```python
from sglang.test.kits.eval_accuracy_kit import MMLUMixin
from sglang.test.test_utils import CustomTestCase


class TestMyFeature(CustomTestCase, MMLUMixin):
    mmlu_score_threshold = 0.65
    mmlu_num_examples = 64
    mmlu_num_threads = 32
    # test_mmlu is inherited; no code needed
```
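To illustrate the pattern itself, independent of SGLang, here is a self-contained sketch of a threshold-declaring mixin. The names `ScoreThresholdMixin`, `score_threshold`, and `run_eval` are hypothetical stand-ins, and the returned score is a placeholder; the real mixins in `eval_accuracy_kit.py` have their own attributes and defaults.

```python
import unittest


class ScoreThresholdMixin:
    """Hypothetical stand-in for an eval mixin; not the SGLang implementation."""

    score_threshold = 0.5  # "what to test": subclasses override this attribute

    def run_eval(self):
        # "how to test": a real mixin would query the server and score the output.
        raise NotImplementedError

    def test_score(self):
        # The test method lives in the mixin, so subclasses inherit it as-is.
        score = self.run_eval()
        self.assertGreaterEqual(score, self.score_threshold)


class TestMyFeature(ScoreThresholdMixin, unittest.TestCase):
    score_threshold = 0.65

    def run_eval(self):
        return 0.7  # placeholder score for the sketch
```

The test class only declares a threshold. A subclass that needs extra assertions can override `test_score`, call `super().test_score()`, and then add its own checks.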
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Utilities
|
||||||
|
|
||||||
|
```python
from sglang.test.test_utils import (
    CustomTestCase,                     # base class with retry logic
    popen_launch_server,                # launch server subprocess
    DEFAULT_URL_FOR_TEST,               # auto-configured base URL
    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,  # 600s default
    run_bench_serving,                  # benchmark helper (launch + bench)
)
from sglang.srt.utils import kill_process_tree  # cleanup server
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Checklist
|
||||||
|
|
||||||
|
Before submitting a test:
|
||||||
|
|
||||||
|
- [ ] Inherits from `CustomTestCase` (not `unittest.TestCase`)
|
||||||
|
- [ ] Has `register_*_ci(...)` call at module level
|
||||||
|
- [ ] Placed in `test/registered/<category>/`, unless this is a JIT kernel test/benchmark
|
||||||
|
- [ ] JIT kernel work: files live in `python/sglang/jit_kernel/tests/` or `python/sglang/jit_kernel/benchmark/`
|
||||||
|
- [ ] Backend-independent tests: `register_cuda_ci` only + smallest model
|
||||||
|
- [ ] Logic that doesn't need a server / engine launch → unit test in `registered/unit/` (see Unit Tests section)
|
||||||
|
- [ ] `setUpClass` launches server, `tearDownClass` kills it (if server-based)
|
||||||
|
- [ ] `tearDownClass` is defensive — uses `hasattr`/null checks before accessing resources that may not have been allocated
|
||||||
|
- [ ] Has `if __name__ == "__main__": unittest.main()`
|
||||||
|
- [ ] `est_time` is reasonable (measure locally)
|
||||||
3
third_party/sglang/.codespellrc
vendored
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
[codespell]
|
||||||
|
ignore-words-list = ans, als, hel, boostrap, childs, te, vas, hsa, ment, cann, thi, makro, wil, rouge, PRIS
|
||||||
|
skip = *.json,*.jsonl,*.patch,*.txt
|
||||||
16
third_party/sglang/.coveragerc
vendored
Normal file
@@ -0,0 +1,16 @@
|
|||||||
|
[run]
source = python/sglang/srt
omit =
    */test/*
    */__pycache__/*

[report]
show_missing = true
exclude_lines =
    pragma: no cover
    if __name__ == .__main__.:
    raise NotImplementedError
    if TYPE_CHECKING

[html]
directory = htmlcov
|
||||||
35
third_party/sglang/.devcontainer/Dockerfile
vendored
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
FROM lmsysorg/sglang:dev

# Create non-root user with specified UID and GID
# NOTE: Replace with your own UID and GID. This is a workaround from https://github.com/microsoft/vscode-remote-release/issues/49#issuecomment-489060908.
ARG HOST_UID=1003
ARG HOST_GID=1003
RUN groupadd -g $HOST_GID devuser && \
    useradd -m -u $HOST_UID -g $HOST_GID -s /bin/zsh devuser

# Give devuser sudo access
RUN apt-get update && apt-get install -y sudo && \
    echo "devuser ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/devuser && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean

# Set up oh-my-zsh for devuser
RUN cp -r /root/.oh-my-zsh /home/devuser/.oh-my-zsh && \
    cp /root/.zshrc /home/devuser/.zshrc && \
    cp /root/.vimrc /home/devuser/.vimrc && \
    cp /root/.tmux.conf /home/devuser/.tmux.conf && \
    sed -i 's|/root/.oh-my-zsh|/home/devuser/.oh-my-zsh|g' /home/devuser/.zshrc && \
    chown -R devuser:devuser /home/devuser/

# Set workspace directory and ownership
WORKDIR /sgl-workspace/sglang
RUN chown -R devuser:devuser /sgl-workspace

# Switch to devuser
USER devuser

# Install uv
RUN curl -LsSf https://astral.sh/uv/install.sh | sh

# Install rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
|
||||||
30
third_party/sglang/.devcontainer/devcontainer.json
vendored
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
{
    "name": "sglang",
    "build": {
        "dockerfile": "Dockerfile"
    },
    "remoteUser": "devuser",
    "customizations": {
        "vscode": {
            "extensions": [
                // Python development
                "ms-python.python",
                "charliermarsh.ruff",
                // Rust development
                "rust-lang.rust-analyzer",
                "tamasfe.even-better-toml"
            ]
        }
    },
    "forwardPorts": [],
    "runArgs": [
        "--gpus",
        "all"
    ],
    // The two lines below ensure that your local changes in the sglang
    // repo are automatically synced to the sglang pip package installed
    // in the dev docker container. You can remove / comment out these
    // two lines if you prefer to sync code changes manually.
    "workspaceMount": "source=${localWorkspaceFolder},target=/sgl-workspace/sglang,type=bind",
    "workspaceFolder": "/sgl-workspace/sglang"
}
|
||||||
1
third_party/sglang/.dockerignore
vendored
Symbolic link
@@ -0,0 +1 @@
|
|||||||
|
.gitignore
|
||||||
1283
third_party/sglang/.github/CI_PERMISSIONS.json
vendored
Normal file
File diff suppressed because it is too large
Load Diff
74
third_party/sglang/.github/CODEOWNERS
vendored
Normal file
@@ -0,0 +1,74 @@
|
|||||||
|
.github @merrymercy @Fridge003 @ispobock @Kangyan-Zhou @bingxche
|
||||||
|
/docker @Fridge003 @ispobock @HaiShaw @ishandhanani @yctseng0211
|
||||||
|
/docker/npu.Dockerfile @ping1jing2 @iforgetmyname
|
||||||
|
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
|
||||||
|
/python/sglang/jit_kernel @DarkSharpness @BBuf @celve @HydraQYH @yuan-luo
|
||||||
|
/python/sglang/jit_kernel/diffusion @yingluosanqian @BBuf @mickqian
|
||||||
|
/python/sglang/multimodal_gen @mickqian @yhyang201 @ping1jing2
|
||||||
|
/python/sglang/multimodal_gen/runtime/cache @DefTruth
|
||||||
|
/python/sglang/multimodal_gen/runtime/layers @mickqian @yhyang201 @BBuf @yingluosanqian @ping1jing2
|
||||||
|
/python/sglang/multimodal_gen/runtime/models/dits @mickqian @yhyang201 @BBuf @yingluosanqian @ping1jing2
|
||||||
|
/python/sglang/srt/batch_invariant_ops @Fridge003 @hebiao064
|
||||||
|
/python/sglang/srt/compilation @hebiao064 @Oasis-Git
|
||||||
|
/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
|
||||||
|
/python/sglang/srt/disaggregation @ByronHsu @hnyls2002 @ShangmingCai
|
||||||
|
/python/sglang/srt/disaggregation/ascend @ping1jing2 @iforgetmyname
|
||||||
|
/python/sglang/srt/distributed @yizhang2077 @merrymercy @ch-wan
|
||||||
|
/python/sglang/srt/distributed/device_communicators/mooncake_transfer_engine.py @ShangmingCai @stmatengss
|
||||||
|
/python/sglang/srt/dllm @ClawSeven @btw616
|
||||||
|
/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy @JustinTong0323
|
||||||
|
/python/sglang/srt/entrypoints/engine_score_mixin.py @sundar24295s @chanh @fortunecookiee
|
||||||
|
/python/sglang/srt/entrypoints/grpc_server.py @CatherineSue @slin1237
|
||||||
|
/python/sglang/srt/entrypoints/openai/serving_score.py @sundar24295s @chanh @fortunecookiee
|
||||||
|
/python/sglang/srt/eplb @fzyzcjy @ch-wan
|
||||||
|
/python/sglang/srt/function_call @CatherineSue @JustinTong0323
|
||||||
|
/python/sglang/srt/grpc @CatherineSue @slin1237
|
||||||
|
/python/sglang/srt/hardware_backend/npu @ping1jing2 @iforgetmyname
|
||||||
|
/python/sglang/srt/hardware_backend/npu/quantization @OrangeRedeng @TamirBaydasov @iforgetmyname
|
||||||
|
/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @Edwardf0t1
|
||||||
|
/python/sglang/srt/layers/attention @merrymercy @Fridge003 @ispobock @Qiaolin-Yu @hebiao064 @HaiShaw
|
||||||
|
/python/sglang/srt/layers/attention/fla @yizhang2077 @hebiao064 @yuan-luo
|
||||||
|
/python/sglang/srt/layers/attention/hybrid_linear_attn_backend.py @yizhang2077 @hebiao064 @hanming-lu @yuan-luo
|
||||||
|
/python/sglang/srt/layers/attention/mamba @yizhang2077 @hebiao064
|
||||||
|
/python/sglang/srt/layers/attention/nsa @1am9trash @hubertlu-tw @kkHuang-amd @HaiShaw @Fridge003 @hlu1 @rainj-me
|
||||||
|
/python/sglang/srt/layers/attention/vision.py @mickqian @yuan-luo @yhyang201
|
||||||
|
/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg @AniZpZ @HaiShaw @b8zhong
|
||||||
|
/python/sglang/srt/layers/quantization/quark @kkHuang-amd @yichiche @hubertlu-tw @1am9trash @BowenBao
|
||||||
|
/python/sglang/srt/lora @Ying1123 @Fridge003 @lifuhuang @yushengsu-thu
|
||||||
|
/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
|
||||||
|
/python/sglang/srt/managers/scheduler_pp_mixin.py @ShangmingCai @XucSh
|
||||||
|
/python/sglang/srt/managers/tokenizer_manager_score_mixin.py @sundar24295s @chanh @fortunecookiee
|
||||||
|
/python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @hanming-lu @yizhang2077 @hzh0425 @ispobock
|
||||||
|
/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @Fridge003 @ispobock
|
||||||
|
/python/sglang/srt/model_executor/piecewise_cuda_graph_runner.py @hebiao064
|
||||||
|
/python/sglang/srt/models/deepseek_common @Fridge003 @ispobock @fzyzcjy @ch-wan
|
||||||
|
/python/sglang/srt/models/deepseek_v2.py @fzyzcjy @zhyncs @ispobock @ch-wan @merrymercy @Fridge003
|
||||||
|
/python/sglang/srt/models/transformers.py @adarshxs
|
||||||
|
/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201 @yuan-luo
|
||||||
|
/python/sglang/srt/observability @merrymercy @fzyzcjy @sufeng-buaa
|
||||||
|
/python/sglang/srt/ray @Qiaolin-Yu @xyuzh
|
||||||
|
/python/sglang/srt/speculative @Ying1123 @merrymercy @hnyls2002
|
||||||
|
/sgl-kernel @ispobock @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
|
||||||
|
/sgl-model-gateway @slin1237 @CatherineSue
|
||||||
|
/sgl-model-gateway/benches @slin1237
|
||||||
|
/sgl-model-gateway/bindings/python @CatherineSue @key4ng @slin1237
|
||||||
|
/sgl-model-gateway/e2e_test @CatherineSue @key4ng
|
||||||
|
/sgl-model-gateway/examples/wasm @slin1237
|
||||||
|
/sgl-model-gateway/src/config @slin1237
|
||||||
|
/sgl-model-gateway/src/core @slin1237
|
||||||
|
/sgl-model-gateway/src/data_connector @key4ng
|
||||||
|
/sgl-model-gateway/src/grpc_client @CatherineSue @slin1237
|
||||||
|
/sgl-model-gateway/src/mcp @key4ng @slin1237
|
||||||
|
/sgl-model-gateway/src/policies @slin1237 @ByronHsu
|
||||||
|
/sgl-model-gateway/src/proto @CatherineSue @slin1237
|
||||||
|
/sgl-model-gateway/src/protocols @CatherineSue @key4ng
|
||||||
|
/sgl-model-gateway/src/reasoning_parser @CatherineSue
|
||||||
|
/sgl-model-gateway/src/routers @CatherineSue @key4ng @slin1237
|
||||||
|
/sgl-model-gateway/src/tokenizer @slin1237 @CatherineSue
|
||||||
|
/sgl-model-gateway/src/tool_parser @slin1237 @CatherineSue
|
||||||
|
/sgl-model-gateway/src/wasm @slin1237
|
||||||
|
/sgl-model-gateway/examples/wasm @slin1237
|
||||||
|
/test/registered/core/test_score_api.py @sundar24295s @chanh @fortunecookiee
|
||||||
|
/benchmark/prefill_only/bench_score.py @sundar24295s @chanh @fortunecookiee
|
||||||
|
/test/srt/ascend @ping1jing2 @iforgetmyname
|
||||||
|
/test/srt/test_modelopt* @Edwardf0t1
|
||||||
12
third_party/sglang/.github/FOLDER_README.md
vendored
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
# Maintenance Tools
|
||||||
|
|
||||||
|
This folder contains tools and workflows for automating maintenance tasks.
|
||||||
|
|
||||||
|
## CI Permissions
|
||||||
|
|
||||||
|
`CI_PERMISSIONS.json` defines the CI permissions granted to each user.
|
||||||
|
Maintainers can directly edit the file to add entries with `"reason": "custom override"`.
|
||||||
|
Maintainers can also run `update_ci_permission.py` to update it with some auto rules (e.g., top contributors in the last 90 days get full permissions).
|
||||||
|
|
||||||
|
## Others
|
||||||
|
- `MAINTAINER.md` defines the code maintenance model.
|
||||||
35
third_party/sglang/.github/ISSUE_TEMPLATE/1-bug-report.yml
vendored
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
name: 🐞 Bug report
description: Report a bug to help us reproduce and fix it.
title: "[Bug] "
labels: ['Bug']

body:
- type: checkboxes
  attributes:
    label: Checklist
    options:
    - label: I searched related issues but found no solution.
    - label: The bug persists in the latest version.
    - label: Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
    - label: If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
    - label: Please use English. Otherwise, it will be closed.
- type: textarea
  attributes:
    label: Describe the bug
    description: A clear, concise description of the bug.
  validations:
    required: true
- type: textarea
  attributes:
    label: Reproduction
    description: Command/script run and model used.
    placeholder: Paste the command here.
  validations:
    required: true
- type: textarea
  attributes:
    label: Environment
    description: Run `python3 -m sglang.check_env` and paste output here. Issues without this will be closed.
    placeholder: Paste environment output here.
  validations:
    required: true
|
||||||
23
third_party/sglang/.github/ISSUE_TEMPLATE/2-feature-request.yml
vendored
Normal file
@@ -0,0 +1,23 @@
|
|||||||
|
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "

body:
- type: checkboxes
  attributes:
    label: Checklist
    options:
    - label: If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
    - label: Please use English. Otherwise, it will be closed.
- type: textarea
  attributes:
    label: Motivation
    description: |
      Clearly and concisely describe the feature's motivation.
  validations:
    required: true
- type: textarea
  attributes:
    label: Related resources
    description: |
      Provide official releases or third-party implementations if available.
|
||||||
154
third_party/sglang/.github/MAINTAINER.md
vendored
Normal file
@@ -0,0 +1,154 @@
|
|||||||
|
# SGLang Code Maintenance Model
|
||||||
|
This document describes the code maintenance model for the SGLang project.
|
||||||
|
Since SGLang is a large project involving multiple organizations and hardware platforms, we designed this model with the following goals:
|
||||||
|
- Ensure a responsive and smooth review process.
|
||||||
|
- Allow for fast iteration, so maintainers can sometimes bypass flaky CI tests for important PRs.
|
||||||
|
|
||||||
|
## Role Descriptions
|
||||||
|
There are four roles in this maintenance model. Some are custom roles, while others are predefined by GitHub.
|
||||||
|
|
||||||
|
- **Merge Oncall**: The person who drives the PR merge process. They have strong area-specific expertise and uphold a high bar for code quality.
  - Permission: Merge PRs. Bypass branch protection rules if needed.
  - Responsibility: Shepherd the merge of PRs assigned to their area. Revert or hotfix any issues related to their merge (especially if they bypass).
- **Codeowner**: The person who protects critical code. Without a bypass, each PR needs at least one Codeowner approval for each modified file protected by [CODEOWNERS](./CODEOWNERS). Please note that this role is not an honor but a significant responsibility because PRs cannot be merged without your approval (except when bypassed by a Merge Oncall).
  - Permission: Approve PRs, allowing them to be merged without a bypass.
  - Responsibility: Review PRs in a timely manner.
- **Write**: A person with write permission to the SGLang repo.
  - Permission: Merge PRs if they have passed required tests and been approved by Codeowners. This role cannot bypass branch protection rules.
  - Responsibility: Review and merge PRs in a timely manner.
- **CI Oncall**: A person who manages CI runners for specific hardware platforms.
  - Permission: Add CI runners.
  - Responsibility: Keep the CI runners up and running.
|
||||||
|
|
||||||
|
__Note__: Difference between Merge Oncall and Codeowner
|
||||||
|
- The Merge Oncall is an active role held by someone who actively tries to help merge PRs and can bypass CI if needed.
|
||||||
|
- The Codeowner is a passive protection role provided by GitHub; it prevents accidental changes to critical code.
|
||||||
|
- The list of Merge Oncalls is attached below. The list of Codeowners is in the [CODEOWNERS](./CODEOWNERS) file.
|
||||||
|
|
||||||
|
__Note__: The permissions to trigger CI tests are defined separately according to these [rules](https://docs.sglang.io/developer_guide/contribution_guide.html#how-to-trigger-ci-tests).
|
||||||
|
|
||||||
|
|
||||||
|
## Pull Request Merge Process
|
||||||
|
1. The author submits a pull request (PR) and fills out the PR checklist.
|
||||||
|
2. A bot assigns this PR to a Merge Oncall and @-mentions them. At the same time, GitHub will automatically request reviews from Codeowners.
|
||||||
|
3. Someone tags the PR with a `run-ci` label ([help](https://docs.sglang.io/developer_guide/contribution_guide.html#how-to-trigger-ci-tests)). Then the author can trigger CI by pushing new commits.
|
||||||
|
4. The Merge Oncall coordinates the review (e.g., asking people to review) and approves the PR; the Codeowners also approve the PR. If the assigned Merge Oncall is not responsive, the author can ping other related Merge Oncalls and Reviewers in the list below.
|
||||||
|
5. The code can now be merged:
   - **Ideal case:** For each modified file, one Codeowner has approved the PR. The PR has also passed the required CI tests. Then, anyone with write permission can merge the PR.
   - **Exception:** In cases where it is difficult to meet all requirements (due to flaky CI or slow responses), a Merge Oncall can bypass branch protection to merge the PR.
|
||||||
|
|
||||||
|
If you run into any issues during the merge, you can discuss them in the [Slack channels](https://slack.sglang.io/): #pull-request, #ci-cd-build-release, #dev.
|
||||||
|
|
||||||
|
## The List of Merge Oncalls and Reviewers
|
||||||
|
This section lists the oncalls for each module or feature.
|
||||||
|
The format is @github-username (Slack username).
|
||||||
|
|
||||||
|
### Scheduler
|
||||||
|
[@merrymercy](https://github.com/merrymercy) (Lianmin Zheng), [@hnyls2002](https://github.com/hnyls2002) (Liangsheng Yin), [@cctry](https://github.com/cctry) (Shiyang Chen)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/managers
|
||||||
|
- python/sglang/srt/model_executor
|
||||||
|
|
||||||
|
### Diffusion
|
||||||
|
[@mickqian](https://github.com/mickqian) (Mick), [@BBuf](https://github.com/BBuf) (BBuf)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/multimodal_gen
|
||||||
|
|
||||||
|
### PD disaggregation
|
||||||
|
[@ByronHsu](https://github.com/ByronHsu) (Byron Hsu), [@cctry](https://github.com/cctry) (Shiyang Chen), [@ShangmingCai](https://github.com/ShangmingCai) (Shangming Cai)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/disaggregation
|
||||||
|
|
||||||
|
### KV Cache
|
||||||
|
[@ispobock](https://github.com/ispobock) (Ke Bao), [@xiezhq-hermann](https://github.com/xiezhq-hermann) (Zhiqiang Xie)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/mem_cache
|
||||||
|
|
||||||
|
### Parallelism
|
||||||
|
[@ch-wan](https://github.com/ch-wan) (Cheng Wan), [@fzyzcjy](https://github.com/fzyzcjy) (Tom)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/eplb
|
||||||
|
- python/sglang/srt/distributed
|
||||||
|
- python/sglang/srt/layers/dp_attention.py
|
||||||
|
|
||||||
|
### Kernel
|
||||||
|
[@BBuf](https://github.com/BBuf) (BBuf)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/jit_kernel
|
||||||
|
- sgl-kernel
|
||||||
|
|
||||||
|
### Speculative decoding
|
||||||
|
[@hnyls2002](https://github.com/hnyls2002) (Liangsheng Yin), [@Qiaolin-Yu](https://github.com/Qiaolin-Yu) (Qiaolin Yu)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/speculative
|
||||||
|
|
||||||
|
### NV and model-specific optimizations
|
||||||
|
[@Fridge003](https://github.com/Fridge003) (Baizhou Zhang), [@ishandhanani](https://github.com/ishandhanani) (Ishan Dhanani), [@Qiaolin-Yu](https://github.com/Qiaolin-Yu) (Qiaolin Yu)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/models
|
||||||
|
- python/sglang/srt/layers/attention
|
||||||
|
|
||||||
|
### AMD optimizations
|
||||||
|
[@HaiShaw](https://github.com/HaiShaw) (Henry HAI)
|
||||||
|
|
||||||
|
### NPU optimizations
|
||||||
|
[@iforgetmyname](https://github.com/iforgetmyname) (Even Zhou)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- python/sglang/srt/hardware_backend/npu
|
||||||
|
|
||||||
|
### CI, Release, Package
|
||||||
|
[@Kangyan-Zhou](https://github.com/Kangyan-Zhou) (Kangyan Zhou), [@Fridge003](https://github.com/Fridge003) (Baizhou Zhang)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- .github/workflows
|
||||||
|
|
||||||
|
### Router, API
|
||||||
|
[@slin1237](https://github.com/slin1237) (Simo Lin)
|
||||||
|
|
||||||
|
related files
|
||||||
|
- sgl-model-gateway
|
||||||
|
- python/sglang/srt/grpc
|
||||||
|
- python/sglang/srt/entrypoints
|
||||||
|
|
||||||
|
### Other Notes
|
||||||
|
|
||||||
|
We currently have many Merge Oncalls, mainly because the CI is flaky and CODEOWNERS is too coarse-grained.
In the future, we hope the CI improves so that bypasses are rarely needed. After that, most Merge Oncalls can be converted back to the Write and Codeowner roles.
|
||||||
|
|
||||||
|
This list is based on the current situation. If you or someone you know would like to take on more responsibility and are qualified, please ping [Lianmin Zheng](https://github.com/merrymercy) and [Ying Sheng](https://github.com/Ying1123) in the Slack channel. They will start a nomination and internal review process.
|
||||||
|
|
||||||
|
## The List of CI Oncalls
|
||||||
|
This section lists the oncalls for each hardware platform. The format is @github-username (Slack username).
|
||||||
|
|
||||||
|
### NVIDIA GPUs
|
||||||
|
[@Kangyan-Zhou](https://github.com/Kangyan-Zhou) (Kangyan Zhou), [@ch-wan](https://github.com/ch-wan) (Cheng Wan), [@HanHan009527](https://github.com/HanHan009527) (hanhan), [@ishandhanani](https://github.com/ishandhanani) (Ishan Dhanani), [@ShangmingCai](https://github.com/ShangmingCai) (Shangming Cai), [@alisonshao](https://github.com/alisonshao) (Alison Shao).
|
||||||
|
|
||||||
|
### AMD GPUs
|
||||||
|
[@saienduri](https://github.com/saienduri) (Sai Enduri), [@HaiShaw](https://github.com/HaiShaw) (Henry HAI)
|
||||||
|
|
||||||
|
### Intel CPU and XPU
|
||||||
|
[@mingfeima](https://github.com/mingfeima) (Mingfei Ma), [@DiweiSun](https://github.com/DiweiSun) (Diwei Sun)
|
||||||
|
|
||||||
|
### Ascend NPUs
|
||||||
|
[@iforgetmyname](https://github.com/iforgetmyname) (Even Zhou)
|
||||||
|
|
||||||
|
This list is based on the current situation. If you or someone you know would like to donate machines for CI, they can serve as the CI oncalls for their machines. Please ping [Lianmin Zheng](https://github.com/merrymercy) and [Ying Sheng](https://github.com/Ying1123) in the Slack channel. They will start a nomination and internal review process.
|
||||||
|
|
||||||
|
## CI Maintenance Mode
|
||||||
|
When the CI is unhealthy (e.g., the scheduled pr-test on `main` is broken for consecutive runs), the project enters **CI Maintenance Mode** by opening [issue #21065](https://github.com/sgl-project/sglang/issues/21065). While active:
|
||||||
|
- All PR CI runs are paused. Resources are allocated to PRs that fix the CI.
|
||||||
|
- **Merging non-CI-fix PRs is prohibited.** Only PRs that fix the CI may be merged. In severe cases, merge permissions may be revoked.
|
||||||
|
|
||||||
|
Maintenance mode ends when `pr-test.yml` is all green on `main` and the issue is closed.
|
||||||
|
|
||||||
|
## Suspending Permissions
|
||||||
|
If a Merge Oncall bypasses checks to merge a PR that breaks the `main` branch, merges a non-CI-fix PR during CI Maintenance Mode, or repeatedly breaks the CI, their privileges will be suspended for at least two days, depending on the severity of the incident.
|
||||||
63
third_party/sglang/.github/actions/check-maintenance/action.yml
vendored
Normal file
@@ -0,0 +1,63 @@
|
|||||||
|
name: Check Maintenance Mode
|
||||||
|
description: Blocks CI when maintenance mode is active (issue #21065 is open), unless the PR has the bypass-maintenance label, or env SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN=true (PR Test workflow on main only). Merging non-CI-fix PRs is prohibited during maintenance mode; in severe cases, merge permissions may be revoked.
|
||||||
|
|
||||||
|
inputs:
|
||||||
|
github-token:
|
||||||
|
description: GitHub token for API access
|
||||||
|
required: false
|
||||||
|
default: ${{ github.token }}
|
||||||
|
|
||||||
|
runs:
|
||||||
|
using: composite
|
||||||
|
steps:
|
||||||
|
- name: Check maintenance mode
|
||||||
|
shell: bash
|
||||||
|
env:
|
||||||
|
GH_TOKEN: ${{ inputs.github-token }}
|
||||||
|
run: |
|
||||||
|
MAINTENANCE_ISSUE=21065
|
||||||
|
REPO="${{ github.repository }}"
|
||||||
|
PR_NUMBER="${{ github.event.pull_request.number }}"
|
||||||
|
|
||||||
|
# PR Test workflow only: scheduled runs and runs on main (dispatch / workflow_call) set this env
|
||||||
|
if [[ "${SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN:-}" == "true" ]]; then
|
||||||
|
echo "✅ PR Test on main branch; bypassing maintenance gate."
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if maintenance issue is open (fail-open: if API errors, allow CI to proceed)
|
||||||
|
ISSUE_STATE=$(gh issue view "$MAINTENANCE_ISSUE" --repo "$REPO" --json state --jq '.state' 2>/dev/null || echo "UNKNOWN")
|
||||||
|
|
||||||
|
if [[ "$ISSUE_STATE" != "OPEN" ]]; then
|
||||||
|
echo "✅ Maintenance mode is OFF. Proceeding with CI."
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# For PRs, check if bypass-maintenance label is present
|
||||||
|
if [[ -n "$PR_NUMBER" ]]; then
|
||||||
|
HAS_BYPASS=$(gh pr view "$PR_NUMBER" --repo "$REPO" --json labels --jq '[.labels[].name] | map(select(. == "bypass-maintenance")) | length' 2>/dev/null || echo "0")
|
||||||
|
if [[ "$HAS_BYPASS" -gt 0 ]]; then
|
||||||
|
echo "✅ PR #$PR_NUMBER has 'bypass-maintenance' label. Bypassing maintenance mode."
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
MSG=$(printf "%s\n" \
|
||||||
|
"## ⚠️ CI Maintenance Mode is Active" \
|
||||||
|
"The CI infrastructure is currently under maintenance." \
|
||||||
|
"All PR CI runs are paused until maintenance is complete." \
|
||||||
|
"**Merging non-CI-fix PRs is prohibited during maintenance mode.** In severe cases, merge permissions may be revoked." \
|
||||||
|
"You might also experience unexpected failures during this period." \
|
||||||
|
"The team is working on the issue and will update the status as soon as possible." \
|
||||||
|
"" \
|
||||||
|
"What should you do?" \
|
||||||
|
"- **Do NOT merge non-CI-fix PRs** until maintenance mode is lifted" \
|
||||||
|
"- Check back later (~12 hours)" \
|
||||||
|
"- Follow CI Maintenance Mode issue: https://github.com/$REPO/issues/$MAINTENANCE_ISSUE for status updates")
|
||||||
|
|
||||||
|
echo "$MSG" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
while IFS= read -r line; do
|
||||||
|
echo "::error::$line"
|
||||||
|
done <<< "$MSG"
|
||||||
|
|
||||||
|
exit 1
|
||||||
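The gate above reduces to three early exits followed by a block. As a minimal sketch of that decision order, assuming the issue state and PR labels have already been fetched, the logic could be expressed in Python as follows. The function name `maintenance_gate` and its parameters are hypothetical and are not part of the action.

```python
from typing import Optional, Sequence


def maintenance_gate(
    issue_state: str,
    pr_labels: Optional[Sequence[str]] = None,
    bypass_on_main: bool = False,
) -> bool:
    """Return True if CI may proceed, False if the gate blocks it.

    Hypothetical helper mirroring the shell logic; `pr_labels` is None
    when the run is not associated with a pull request.
    """
    # Mirrors SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN == "true".
    if bypass_on_main:
        return True
    # Fail-open: anything other than "OPEN" (including "UNKNOWN" after an
    # API error) is treated as maintenance mode being off.
    if issue_state != "OPEN":
        return True
    # Pull requests may opt out with the bypass-maintenance label.
    if pr_labels is not None and "bypass-maintenance" in pr_labels:
        return True
    return False
```

Keeping the checks in this order means an API failure never blocks CI, which matches the fail-open comment in the script.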
50
third_party/sglang/.github/actions/check-stage-health/action.yml
vendored
Normal file
@@ -0,0 +1,50 @@
|
|||||||
|
name: Check Stage Health
|
||||||
|
description: Fail fast if any job in the current workflow run has already failed. Auto-skips for scheduled runs.
|
||||||
|
|
||||||
|
inputs:
|
||||||
|
github-token:
|
||||||
|
description: 'GitHub token for API calls'
|
||||||
|
required: false
|
||||||
|
default: ${{ github.token }}
|
||||||
|
|
||||||
|
runs:
|
||||||
|
using: composite
|
||||||
|
steps:
|
||||||
|
- name: Check stage health
|
||||||
|
uses: actions/github-script@v7
|
||||||
|
env:
|
||||||
|
SKIP_STAGE_HEALTH_CHECK: ${{ env.SKIP_STAGE_HEALTH_CHECK }}
|
||||||
|
with:
|
||||||
|
github-token: ${{ inputs.github-token }}
|
||||||
|
script: |
|
||||||
|
// Skip when explicitly requested via env var (e.g. release branch cut)
|
||||||
|
if (process.env.SKIP_STAGE_HEALTH_CHECK === 'true') {
|
||||||
|
core.info('Skipping health check (SKIP_STAGE_HEALTH_CHECK=true)');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Skip for scheduled runs — they should collect all failures, not fast-fail
|
||||||
|
if (context.eventName === 'schedule') {
|
||||||
|
core.info('Skipping health check for scheduled run');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const jobs = await github.paginate(github.rest.actions.listJobsForWorkflowRun, {
|
||||||
|
owner: context.repo.owner,
|
||||||
|
repo: context.repo.repo,
|
||||||
|
run_id: context.runId,
|
||||||
|
per_page: 100,
|
||||||
|
});
|
||||||
|
// Find jobs that failed from a real error, not from fast-fail cascade
|
||||||
|
const rootCauseFailures = jobs.filter(j => {
|
||||||
|
if (j.status !== 'completed' || j.conclusion !== 'failure') return false;
|
||||||
|
// If the failing step is the health check, it's a cascade — skip it
|
||||||
|
const failedStep = (j.steps || []).find(s => s.conclusion === 'failure');
|
||||||
|
if (failedStep && (failedStep.name.includes('check-stage-health') || failedStep.name.includes('Check stage health'))) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
});
|
||||||
|
if (rootCauseFailures.length > 0) {
|
||||||
|
core.setFailed(`Fast-fail: skipping — root cause job(s): ${rootCauseFailures.map(j => j.name).join(', ')}`);
|
||||||
|
}
|
||||||
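The root-cause filter in the script above separates jobs that failed on their own from jobs that failed only because the health check itself tripped. A rough Python rendering of the same filter, assuming job dictionaries shaped like the GitHub REST API job objects used above, might look like this. The name `root_cause_failures` is illustrative only.

```python
from typing import Any, Dict, List

# Step names that identify the health check itself (taken from the script).
HEALTH_STEP_MARKERS = ("check-stage-health", "Check stage health")


def root_cause_failures(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return names of jobs that failed for a reason other than fast-fail."""
    names = []
    for job in jobs:
        if job.get("status") != "completed" or job.get("conclusion") != "failure":
            continue
        # First failed step, or None when the job reports no failed step.
        failed_step = next(
            (s for s in job.get("steps") or [] if s.get("conclusion") == "failure"),
            None,
        )
        # A job whose failing step is the health check is a cascade, not a cause.
        if failed_step and any(
            marker in failed_step.get("name", "") for marker in HEALTH_STEP_MARKERS
        ):
            continue
        names.append(job["name"])
    return names
```

A failed job with no recorded failed step is still counted as a root cause, as in the original filter.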
27
third_party/sglang/.github/actions/upload-cuda-coredumps/action.yml
vendored
Normal file
@@ -0,0 +1,27 @@
|
|||||||
|
name: Upload CUDA Coredumps
|
||||||
|
description: Upload CUDA coredump files as artifacts and clean up the directory.
|
||||||
|
|
||||||
|
inputs:
|
||||||
|
artifact-suffix:
|
||||||
|
description: Suffix appended to the artifact name (e.g. matrix partition id)
|
||||||
|
required: false
|
||||||
|
default: ""
|
||||||
|
retention-days:
|
||||||
|
description: Number of days to retain the artifact
|
||||||
|
required: false
|
||||||
|
default: "7"
|
||||||
|
|
||||||
|
runs:
|
||||||
|
using: composite
|
||||||
|
steps:
|
||||||
|
- name: Upload CUDA coredumps
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: cuda-coredumps-${{ github.job }}${{ inputs.artifact-suffix && format('-{0}', inputs.artifact-suffix) }}
|
||||||
|
path: ${{ env.SGLANG_CUDA_COREDUMP_DIR || '/tmp/sglang_cuda_coredumps' }}/
|
||||||
|
retention-days: ${{ inputs.retention-days }}
|
||||||
|
if-no-files-found: ignore
|
||||||
|
|
||||||
|
- name: Cleanup CUDA coredumps
|
||||||
|
shell: bash
|
||||||
|
run: rm -rf "${{ env.SGLANG_CUDA_COREDUMP_DIR || '/tmp/sglang_cuda_coredumps' }}"
|
||||||
177
third_party/sglang/.github/actions/wait-for-jobs/action.yml
vendored
Normal file
@@ -0,0 +1,177 @@
|
|||||||
|
name: Wait for Jobs
|
||||||
|
description: Poll and wait for specified jobs in the current workflow run to complete
|
||||||
|
|
||||||
|
inputs:
|
||||||
|
stage-name:
|
||||||
|
description: 'Human-readable stage name for log messages (e.g. "stage-a")'
|
||||||
|
required: true
|
||||||
|
jobs:
|
||||||
|
description: |
|
||||||
|
JSON array of job specs to wait for. Each element is either:
|
||||||
|
- a string: exact job name (e.g. "stage-a-test-1-gpu-small")
|
||||||
|
- an object { "prefix": "...", "expected_count": N }: for matrix jobs
|
||||||
|
required: true
|
||||||
|
max-wait-minutes:
|
||||||
|
description: 'Maximum time to wait before timing out'
|
||||||
|
required: false
|
||||||
|
default: '240'
|
||||||
|
poll-interval-seconds:
|
||||||
|
description: 'Seconds between polling attempts'
|
||||||
|
required: false
|
||||||
|
default: '60'
|
||||||
|
github-token:
|
||||||
|
description: 'GitHub token for API calls'
|
||||||
|
required: false
|
||||||
|
default: ${{ github.token }}
|
||||||
|
|
||||||
|
outputs:
|
||||||
|
result:
|
||||||
|
description: 'Overall result: success, failure, or timeout'
|
||||||
|
value: ${{ steps.wait.outputs.result }}
|
||||||
|
|
||||||
|
runs:
|
||||||
|
using: composite
|
||||||
|
steps:
|
||||||
|
- name: Wait for jobs to complete
|
||||||
|
id: wait
|
||||||
|
uses: actions/github-script@v7
|
||||||
|
env:
|
||||||
|
INPUT_STAGE_NAME: ${{ inputs.stage-name }}
|
||||||
|
INPUT_JOBS: ${{ inputs.jobs }}
|
||||||
|
INPUT_MAX_WAIT_MINUTES: ${{ inputs.max-wait-minutes }}
|
||||||
|
INPUT_POLL_INTERVAL_SECONDS: ${{ inputs.poll-interval-seconds }}
|
||||||
|
with:
|
||||||
|
github-token: ${{ inputs.github-token }}
|
||||||
|
script: |
|
||||||
|
const stageName = process.env.INPUT_STAGE_NAME;
|
||||||
|
const jobSpecs = JSON.parse(process.env.INPUT_JOBS);
|
||||||
|
const maxWaitMinutes = parseInt(process.env.INPUT_MAX_WAIT_MINUTES);
|
||||||
|
const pollIntervalSeconds = parseInt(process.env.INPUT_POLL_INTERVAL_SECONDS);
|
||||||
|
const maxAttempts = (maxWaitMinutes * 60) / pollIntervalSeconds;
|
||||||
|
|
||||||
|
// Normalize job specs into a uniform format
|
||||||
|
const normalizedSpecs = jobSpecs.map(spec => {
|
||||||
|
if (typeof spec === 'string') {
|
||||||
|
return { prefix: spec, expected_count: 1, exact: true };
|
||||||
|
}
|
||||||
|
return { ...spec, exact: false };
|
||||||
|
});
|
||||||
|
|
||||||
|
const totalExpectedJobs = normalizedSpecs.reduce((sum, s) => sum + s.expected_count, 0);
|
||||||
|
|
||||||
|
const matchesSpec = (jobName, spec) => {
|
||||||
|
if (spec.exact) {
|
||||||
|
return jobName === spec.prefix;
|
||||||
|
}
|
||||||
|
return jobName === spec.prefix || jobName.startsWith(spec.prefix + ' (');
|
||||||
|
};
|
||||||
|
|
||||||
|
// Use ETag conditional requests to avoid consuming rate limit when nothing changed.
|
||||||
|
// GitHub returns 304 Not Modified for unchanged data, which is FREE (no rate limit cost).
|
||||||
|
let lastEtag = '';
|
||||||
|
let lastJobs = null;
|
||||||
|
let apiCalls = 0;
|
||||||
|
let cachedCalls = 0;
|
||||||
|
|
||||||
|
async function fetchJobs() {
|
||||||
|
const url = `GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`;
|
||||||
|
const params = {
|
||||||
|
owner: context.repo.owner,
|
||||||
|
repo: context.repo.repo,
|
||||||
|
run_id: context.runId,
|
||||||
|
per_page: 100,
|
||||||
|
headers: {},
|
||||||
|
};
|
||||||
|
if (lastEtag) {
|
||||||
|
params.headers['if-none-match'] = lastEtag;
|
||||||
|
}
|
||||||
|
|
||||||
|
try {
|
||||||
|
const response = await github.request(url, params);
|
||||||
|
apiCalls++;
|
||||||
|
const rateRemaining = response.headers['x-ratelimit-remaining'] || '?';
|
||||||
|
const rateLimit = response.headers['x-ratelimit-limit'] || '?';
|
||||||
|
console.log(`[rate-limit] ${rateRemaining}/${rateLimit} remaining (ETag: ${lastEtag ? 'sent' : 'none'}) | this session: ${apiCalls} paid, ${cachedCalls} free`);
|
||||||
|
lastEtag = response.headers.etag || '';
|
||||||
|
const jobs = response.data.jobs;
|
||||||
|
|
||||||
|
// Handle pagination if >100 jobs
|
||||||
|
// ETag only covers page 1, so invalidate it to avoid stale cache
|
||||||
|
// when later pages change but page 1 doesn't.
|
||||||
|
if (response.data.total_count > 100) {
|
||||||
|
lastEtag = '';
|
||||||
|
for (let page = 2; page <= Math.ceil(response.data.total_count / 100); page++) {
|
||||||
|
const { data: pageData } = await github.request(url, {
|
||||||
|
...params,
|
||||||
|
page,
|
||||||
|
headers: {},
|
||||||
|
});
|
||||||
|
jobs.push(...pageData.jobs);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
lastJobs = jobs;
|
||||||
|
return { jobs, cached: false };
|
||||||
|
} catch (err) {
|
||||||
|
if (err.status === 304 && lastJobs) {
|
||||||
|
cachedCalls++;
|
||||||
|
console.log(`[rate-limit] 304 Not Modified | this session: ${apiCalls} paid, ${cachedCalls} free`);
|
||||||
|
return { jobs: lastJobs, cached: true };
|
||||||
|
}
|
||||||
|
throw err;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (let attempt = 0; attempt < maxAttempts; attempt++) {
|
||||||
|
const { jobs, cached } = await fetchJobs();
|
||||||
|
|
||||||
|
let allCompleted = true;
|
||||||
|
let failedJobs = [];
|
||||||
|
let completedCount = 0;
|
||||||
|
let totalCount = 0;
|
||||||
|
|
||||||
|
for (const spec of normalizedSpecs) {
|
||||||
|
const matchingJobs = jobs.filter(job => matchesSpec(job.name, spec));
|
||||||
|
|
||||||
|
for (const job of matchingJobs) {
|
||||||
|
totalCount++;
|
||||||
|
if (!cached) {
|
||||||
|
console.log(`${job.name}: status=${job.status}, conclusion=${job.conclusion}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (job.status === 'completed') {
|
||||||
|
completedCount++;
|
||||||
|
if (job.conclusion !== 'success' && job.conclusion !== 'skipped') {
|
||||||
|
failedJobs.push(job.name);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
allCompleted = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (matchingJobs.length < spec.expected_count) {
|
||||||
|
console.log(`${spec.prefix}: found ${matchingJobs.length}/${spec.expected_count} jobs (waiting for more)`);
|
||||||
|
allCompleted = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`[${stageName}] Progress: ${completedCount}/${totalCount} jobs completed (expected ${totalExpectedJobs})${cached ? ' (cached, no rate limit cost)' : ''}`);
|
||||||
|
|
||||||
|
// Fail fast if any jobs failed
|
||||||
|
if (failedJobs.length > 0) {
|
||||||
|
core.setOutput('result', 'failure');
|
||||||
|
core.setFailed(`${stageName} jobs failed: ${failedJobs.join(', ')}`);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (allCompleted && totalCount >= totalExpectedJobs) {
|
||||||
|
core.setOutput('result', 'success');
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
console.log(`Waiting ${pollIntervalSeconds}s... (attempt ${attempt + 1}/${maxAttempts})`);
|
||||||
|
await new Promise(resolve => setTimeout(resolve, pollIntervalSeconds * 1000));
|
||||||
|
}
|
||||||
|
|
||||||
|
core.setFailed(`Timeout waiting for ${stageName} jobs`);
|
||||||
|
core.setOutput('result', 'timeout');
|
||||||
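The job-spec handling in the script above accepts either an exact job name or a prefix with an expected count for matrix jobs, whose names appear as `prefix (matrix values)`. The sketch below restates that normalization and matching rule in Python to make the edge cases easier to see. The function names are hypothetical, and the sample job names in the usage are made up for illustration.

```python
from typing import Any, Dict, Union

Spec = Union[str, Dict[str, Any]]


def normalize_spec(spec: Spec) -> Dict[str, Any]:
    """Turn a string or object spec into one uniform dictionary shape."""
    if isinstance(spec, str):
        # A bare string means exactly one job with exactly this name.
        return {"prefix": spec, "expected_count": 1, "exact": True}
    return {**spec, "exact": False}


def matches_spec(job_name: str, spec: Dict[str, Any]) -> bool:
    """Check whether a job name belongs to a normalized spec."""
    if spec["exact"]:
        return job_name == spec["prefix"]
    # Matrix jobs are matched as "<prefix> (<matrix values>)"; requiring the
    # " (" suffix keeps "unit-test" from matching "unit-test-extra (0)".
    return job_name == spec["prefix"] or job_name.startswith(spec["prefix"] + " (")


exact = normalize_spec("stage-a-test-1-gpu-small")
matrix = normalize_spec({"prefix": "unit-test", "expected_count": 4})
print(matches_spec("unit-test (2)", matrix))        # matrix member
print(matches_spec("unit-test-extra (0)", matrix))  # different job
```

Requiring the literal ` (` after the prefix is what keeps similarly named jobs from being counted toward the wrong spec.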
411
third_party/sglang/.github/audit_permission.py
vendored
Normal file
@@ -0,0 +1,411 @@
|
|||||||
|
"""
|
||||||
|
Audit GitHub repository collaborators with elevated access.
|
||||||
|
|
||||||
|
This script will:
|
||||||
|
1. Fetch all collaborators with elevated (triage or higher) permission on this repo.
|
||||||
|
2. Show their GitHub username, nickname, and role (e.g., admin, maintain,
|
||||||
|
custom org role, write, triage).
|
||||||
|
3. Show their last activity related to this repo (last commit, last issue,
|
||||||
|
last pull request). Format the dates as YYYY-MM-DD. Add a "last activity date" column to the CSV, before these three breakdown columns.
|
||||||
|
4. Show activity on other repos: repos touched via public events in the last 90 days (Push, PR, Issues, etc.). Sort the repos by the number of activities.
|
||||||
|
5. Write results to a CSV sorted by the roles (admin, maintain, custom org role, write, triage) and the last activity date (most recent first).
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
export GH_TOKEN="your_github_token"
|
||||||
|
python3 audit_permission.py [--output path] [--repo owner/name]
|
||||||
|
|
||||||
|
Requires: requests, and a token with permission to list collaborators (push+
|
||||||
|
access to the repo).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import csv
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
from collections import Counter
|
||||||
|
from datetime import datetime, timedelta, timezone
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
try:
|
||||||
|
import requests
|
||||||
|
except ImportError:
|
||||||
|
requests = None # type: ignore
|
||||||
|
|
||||||
|
DEFAULT_OWNER = "sgl-project"
|
||||||
|
DEFAULT_NAME = "sglang"
|
||||||
|
|
||||||
|
HEADERS: dict[str, str] = {}
|
||||||
|
|
||||||
|
|
||||||
|
def _request(
|
||||||
|
method: str,
|
||||||
|
url: str,
|
||||||
|
*,
|
||||||
|
params: dict[str, Any] | None = None,
|
||||||
|
max_retries: int = 3,
|
||||||
|
) -> requests.Response:
|
||||||
|
if requests is None:
|
||||||
|
raise RuntimeError("Install the requests package: pip install requests")
|
||||||
|
for attempt in range(max_retries):
|
||||||
|
r = requests.request(method, url, headers=HEADERS, params=params, timeout=60)
|
||||||
|
if r.status_code == 403 and "rate limit" in (r.text or "").lower():
|
||||||
|
reset = r.headers.get("X-RateLimit-Reset")
|
||||||
|
wait = 60
|
||||||
|
if reset:
|
||||||
|
try:
|
||||||
|
wait = max(1, int(reset) - int(time.time()) + 2)
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
print(f"Rate limited; sleeping {wait}s...", file=sys.stderr)
|
||||||
|
time.sleep(min(wait, 3600))
|
||||||
|
continue
|
||||||
|
return r
|
||||||
|
return r
|
||||||
|
|
||||||
|
|
||||||
|
def paginate_list(url: str, params: dict[str, Any] | None = None) -> list[Any]:
|
||||||
|
out: list[Any] = []
|
||||||
|
next_url: str | None = url
|
||||||
|
next_params = params
|
||||||
|
while next_url:
|
||||||
|
r = _request("GET", next_url, params=next_params)
|
||||||
|
next_params = None
|
||||||
|
if r.status_code != 200:
|
||||||
|
print(
|
||||||
|
f"Error {r.status_code} GET {next_url}: {r.text[:500]}",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
break
|
||||||
|
data = r.json()
|
||||||
|
if isinstance(data, list):
|
||||||
|
out.extend(data)
|
||||||
|
else:
|
||||||
|
break
|
||||||
|
next_url = None
|
||||||
|
link = r.headers.get("Link", "")
|
||||||
|
for part in link.split(", "):
|
||||||
|
if 'rel="next"' in part:
|
||||||
|
start = part.find("<") + 1
|
||||||
|
end = part.find(">")
|
||||||
|
if start > 0 and end > start:
|
||||||
|
next_url = part[start:end]
|
||||||
|
break
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def collaborator_role(collab: dict[str, Any]) -> str:
|
||||||
|
role_name = collab.get("role_name")
|
||||||
|
if isinstance(role_name, str) and role_name.strip():
|
||||||
|
return role_name.strip()
|
||||||
|
perms = collab.get("permissions") or {}
|
||||||
|
if perms.get("admin"):
|
||||||
|
return "admin"
|
||||||
|
if perms.get("maintain"):
|
||||||
|
return "maintain"
|
||||||
|
if perms.get("push"):
|
||||||
|
return "write"
|
||||||
|
if perms.get("triage"):
|
||||||
|
return "triage"
|
||||||
|
return "read"
|
||||||
|
|
||||||
|
|
||||||
|
def has_write_plus(collab: dict[str, Any]) -> bool:
|
||||||
|
perms = collab.get("permissions") or {}
|
||||||
|
return bool(
|
||||||
|
perms.get("admin")
|
||||||
|
or perms.get("maintain")
|
||||||
|
or perms.get("push")
|
||||||
|
or perms.get("triage")
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def role_sort_tier(collab: dict[str, Any]) -> int:
|
||||||
|
"""Sort order: admin (0), maintain (1), custom org role (2), write (3), triage (4)."""
|
||||||
|
rn = collab.get("role_name")
|
||||||
|
if isinstance(rn, str) and rn.strip():
|
||||||
|
k = rn.strip().lower()
|
||||||
|
if k == "admin":
|
||||||
|
return 0
|
||||||
|
if k == "maintain":
|
||||||
|
return 1
|
||||||
|
if k == "write":
|
||||||
|
return 3
|
||||||
|
if k == "triage":
|
||||||
|
return 4
|
||||||
|
if k == "read":
|
||||||
|
return 5
|
||||||
|
return 2
|
||||||
|
perms = collab.get("permissions") or {}
|
||||||
|
if perms.get("admin"):
|
||||||
|
return 0
|
||||||
|
if perms.get("maintain"):
|
||||||
|
return 1
|
||||||
|
if perms.get("push"):
|
||||||
|
return 3
|
||||||
|
if perms.get("triage"):
|
||||||
|
return 4
|
||||||
|
return 5
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_display_name(login: str) -> str:
|
||||||
|
url = f"https://api.github.com/users/{login}"
|
||||||
|
r = _request("GET", url)
|
||||||
|
if r.status_code != 200:
|
||||||
|
return ""
|
||||||
|
data = r.json()
|
||||||
|
if not isinstance(data, dict):
|
||||||
|
return ""
|
||||||
|
n = data.get("name")
|
||||||
|
return n.strip() if isinstance(n, str) else ""
|
||||||
|
|
||||||
|
|
||||||
|
def parse_github_ts(s: str) -> datetime | None:
|
||||||
|
if not s:
|
||||||
|
return None
|
||||||
|
s = s.replace("Z", "+00:00")
|
||||||
|
try:
|
||||||
|
return datetime.fromisoformat(s)
|
||||||
|
except ValueError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def iso_timestamp_to_ymd(iso: str | None) -> str:
|
||||||
|
if not iso:
|
||||||
|
return ""
|
||||||
|
p = parse_github_ts(iso)
|
||||||
|
if not p:
|
||||||
|
return ""
|
||||||
|
return p.date().isoformat()
|
||||||
|
|
||||||
|
|
||||||
|
def max_date_ymd(*iso_dates: str | None) -> str:
|
||||||
|
best: datetime | None = None
|
||||||
|
for d in iso_dates:
|
||||||
|
p = parse_github_ts(d or "")
|
||||||
|
if p and (best is None or p > best):
|
||||||
|
best = p
|
||||||
|
return best.date().isoformat() if best else ""
|
||||||
|
|
||||||
|
|
||||||
|
def parse_ymd(s: str) -> datetime | None:
|
||||||
|
if not s:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
return datetime.strptime(s, "%Y-%m-%d").replace(tzinfo=timezone.utc)
|
||||||
|
except ValueError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def last_commit_date(owner: str, repo: str, login: str) -> str | None:
|
||||||
|
url = f"https://api.github.com/repos/{owner}/{repo}/commits"
|
||||||
|
r = _request("GET", url, params={"author": login, "per_page": 1})
|
||||||
|
if r.status_code != 200:
|
||||||
|
return None
|
||||||
|
data = r.json()
|
||||||
|
if not isinstance(data, list) or not data:
|
||||||
|
return None
|
||||||
|
commit = data[0].get("commit") or {}
|
||||||
|
c = commit.get("committer") or commit.get("author") or {}
|
||||||
|
d = c.get("date")
|
||||||
|
return d if isinstance(d, str) else None
|
||||||
|
|
||||||
|
|
||||||
|
def search_repo_item(
|
||||||
|
owner: str, repo: str, login: str, kind: str
|
||||||
|
) -> dict[str, Any] | None:
|
||||||
|
q = f"repo:{owner}/{repo} is:{kind} author:{login}"
|
||||||
|
url = "https://api.github.com/search/issues"
|
||||||
|
r = _request(
|
||||||
|
"GET",
|
||||||
|
url,
|
||||||
|
params={"q": q, "sort": "updated", "order": "desc", "per_page": 1},
|
||||||
|
)
|
||||||
|
if r.status_code != 200:
|
||||||
|
return None
|
||||||
|
payload = r.json()
|
||||||
|
items = payload.get("items")
|
||||||
|
if not items:
|
||||||
|
return None
|
||||||
|
return items[0] if isinstance(items[0], dict) else None
|
||||||
|
|
||||||
|
|
||||||
|
def last_issue_pr_dates(
|
||||||
|
owner: str, repo: str, login: str
|
||||||
|
) -> tuple[str | None, str | None]:
|
||||||
|
issue = search_repo_item(owner, repo, login, "issue")
|
||||||
|
pr = search_repo_item(owner, repo, login, "pr")
|
||||||
|
issue_dt = None
|
||||||
|
pr_dt = None
|
||||||
|
if issue:
|
||||||
|
issue_dt = issue.get("updated_at") or issue.get("created_at")
|
||||||
|
if not isinstance(issue_dt, str):
|
||||||
|
issue_dt = None
|
||||||
|
if pr:
|
||||||
|
pr_dt = pr.get("updated_at") or pr.get("created_at")
|
||||||
|
if not isinstance(pr_dt, str):
|
||||||
|
pr_dt = None
|
||||||
|
return issue_dt, pr_dt
|
||||||
|
|
||||||
|
|
||||||
|
def other_repos_activity_column(
|
||||||
|
login: str, owner: str, repo: str, days: int = 90
|
||||||
|
) -> str:
|
||||||
|
"""Repos other than this one touched in the window, sorted by event count (desc)."""
|
||||||
|
cutoff = datetime.now(timezone.utc) - timedelta(days=days)
|
||||||
|
full = f"{owner}/{repo}"
|
||||||
|
counts: Counter[str] = Counter()
|
||||||
|
url: str | None = f"https://api.github.com/users/{login}/events/public"
|
||||||
|
params: dict[str, Any] = {"per_page": 100}
|
||||||
|
|
||||||
|
while url:
|
||||||
|
r = _request("GET", url, params=params)
|
||||||
|
params = {}
|
||||||
|
if r.status_code != 200:
|
||||||
|
break
|
||||||
|
events = r.json()
|
||||||
|
if not isinstance(events, list):
|
||||||
|
break
|
||||||
|
oldest_in_page: datetime | None = None
|
||||||
|
for ev in events:
|
||||||
|
if not isinstance(ev, dict):
|
||||||
|
continue
|
||||||
|
created = parse_github_ts(ev.get("created_at") or "")
|
||||||
|
if created:
|
||||||
|
if oldest_in_page is None or created < oldest_in_page:
|
||||||
|
oldest_in_page = created
|
||||||
|
if created and created < cutoff:
|
||||||
|
continue
|
||||||
|
rinfo = ev.get("repo")
|
||||||
|
name = None
|
||||||
|
if isinstance(rinfo, dict):
|
||||||
|
name = rinfo.get("name")
|
||||||
|
if isinstance(name, str) and name and name != full:
|
||||||
|
counts[name] += 1
|
||||||
|
next_url = None
|
||||||
|
link = r.headers.get("Link", "")
|
||||||
|
for part in link.split(", "):
|
||||||
|
if 'rel="next"' in part:
|
||||||
|
s, e = part.find("<") + 1, part.find(">")
|
||||||
|
if s > 0 and e > s:
|
||||||
|
next_url = part[s:e]
|
||||||
|
break
|
||||||
|
if oldest_in_page and oldest_in_page < cutoff:
|
||||||
|
break
|
||||||
|
url = next_url
|
||||||
|
if not events:
|
||||||
|
break
|
||||||
|
|
||||||
|
ordered = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
|
||||||
|
return ";".join(f"{n}:{c}" for n, c in ordered)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description="Audit repo collaborator permissions.")
|
||||||
|
parser.add_argument(
|
||||||
|
"--repo",
|
||||||
|
default=f"{DEFAULT_OWNER}/{DEFAULT_NAME}",
|
||||||
|
help=f"owner/name (default: {DEFAULT_OWNER}/{DEFAULT_NAME})",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--output",
|
||||||
|
"-o",
|
||||||
|
default=os.path.join(os.path.dirname(__file__), "permission_audit.csv"),
|
||||||
|
help="Output CSV path",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--events-days",
|
||||||
|
type=int,
|
||||||
|
default=90,
|
||||||
|
help="Window for other-repo activity via public events",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if "/" not in args.repo:
|
||||||
|
print("Error: --repo must be owner/name", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
owner, name = args.repo.split("/", 1)
|
||||||
|
|
||||||
|
gh_token = os.getenv("GH_TOKEN")
|
||||||
|
if not gh_token:
|
||||||
|
print("Error: GH_TOKEN environment variable is not set.", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
global HEADERS
|
||||||
|
HEADERS = {
|
||||||
|
"Authorization": f"Bearer {gh_token}",
|
||||||
|
"Accept": "application/vnd.github+json",
|
||||||
|
"X-GitHub-Api-Version": "2022-11-28",
|
||||||
|
}
|
||||||
|
|
||||||
|
collab_url = f"https://api.github.com/repos/{owner}/{name}/collaborators"
|
||||||
|
print(f"Fetching collaborators for {owner}/{name}...", file=sys.stderr)
|
||||||
|
collaborators = paginate_list(
|
||||||
|
collab_url, params={"per_page": 100, "affiliation": "all"}
|
||||||
|
)
|
||||||
|
|
||||||
|
rows: list[dict[str, Any]] = []
|
||||||
|
elevated = [c for c in collaborators if isinstance(c, dict) and has_write_plus(c)]
|
||||||
|
print(
|
||||||
|
f"Found {len(elevated)} collaborators with admin/maintain/write/triage.",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
|
||||||
|
for i, col in enumerate(elevated, start=1):
|
||||||
|
login = col.get("login")
|
||||||
|
if not isinstance(login, str):
|
||||||
|
continue
|
||||||
|
print(f" [{i}/{len(elevated)}] {login}", file=sys.stderr)
|
||||||
|
|
||||||
|
role = collaborator_role(col)
|
||||||
|
nickname = fetch_display_name(login)
|
||||||
|
cd = last_commit_date(owner, name, login)
|
||||||
|
issue_dt, pr_dt = last_issue_pr_dates(owner, name, login)
|
||||||
|
last_act_ymd = max_date_ymd(cd, issue_dt, pr_dt)
|
||||||
|
others = other_repos_activity_column(login, owner, name, days=args.events_days)
|
||||||
|
rows.append(
|
||||||
|
{
|
||||||
|
"_role_tier": role_sort_tier(col),
|
||||||
|
"github_username": login,
|
||||||
|
"nickname": nickname,
|
||||||
|
"role": role,
|
||||||
|
"last_activity_date": last_act_ymd,
|
||||||
|
"last_commit_date": iso_timestamp_to_ymd(cd),
|
||||||
|
"last_issue_date": iso_timestamp_to_ymd(issue_dt),
|
||||||
|
"last_pr_date": iso_timestamp_to_ymd(pr_dt),
|
||||||
|
"other_repos_90d": others,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
def sort_key(r: dict[str, Any]) -> tuple[int, float]:
|
||||||
|
tier = r["_role_tier"]
|
||||||
|
act = parse_ymd(r.get("last_activity_date") or "")
|
||||||
|
ts = act.timestamp() if act else 0.0
|
||||||
|
return (tier, -ts)
|
||||||
|
|
||||||
|
rows.sort(key=sort_key)
|
||||||
|
|
||||||
|
fieldnames = [
|
||||||
|
"github_username",
|
||||||
|
"nickname",
|
||||||
|
"role",
|
||||||
|
"last_activity_date",
|
||||||
|
"last_commit_date",
|
||||||
|
"last_issue_date",
|
||||||
|
"last_pr_date",
|
||||||
|
"other_repos_90d",
|
||||||
|
]
|
||||||
|
for r in rows:
|
||||||
|
del r["_role_tier"]
|
||||||
|
with open(args.output, "w", newline="", encoding="utf-8") as f:
|
||||||
|
w = csv.DictWriter(f, fieldnames=fieldnames)
|
||||||
|
w.writeheader()
|
||||||
|
w.writerows(rows)
|
||||||
|
|
||||||
|
print(f"Wrote {len(rows)} rows to {args.output}", file=sys.stderr)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
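Both `paginate_list` and `other_repos_activity_column` above follow pagination by scanning the `Link` response header for the `rel="next"` entry. As a small illustration, the same parsing can be pulled into a standalone helper. The name `next_link` and the sample header are assumptions made for this sketch and do not appear in the script.

```python
from typing import Optional


def next_link(link_header: str) -> Optional[str]:
    """Extract the rel="next" URL from a GitHub-style Link header, if any."""
    for part in link_header.split(", "):
        if 'rel="next"' in part:
            # The URL sits between angle brackets: <https://...>; rel="next"
            start = part.find("<") + 1
            end = part.find(">")
            if start > 0 and end > start:
                return part[start:end]
            return None
    return None


header = (
    '<https://api.github.com/repositories/1/collaborators?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/collaborators?page=5>; rel="last"'
)
print(next_link(header))
```

The `requests` library also exposes parsed link headers through `Response.links`, which could replace the manual scan if a dependency on that behavior is acceptable.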
122
third_party/sglang/.github/labeler.yml
vendored
Normal file
@@ -0,0 +1,122 @@
# Configuration for the GitHub Labeler action
# Automatically adds labels to PRs based on the files changed

# Router specific (Rust code in sgl-model-gateway)
model-gateway:
  - changed-files:
      - any-glob-to-any-file: 'sgl-model-gateway/**/*'

# Kernel specific
sgl-kernel:
  - changed-files:
      - any-glob-to-any-file: 'sgl-kernel/**/*'

# JIT kernel specific
jit-kernel:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/jit_kernel/**/*'

# Documentation
documentation:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*.md'
          - 'docs/**/*'
          - 'README*'

# Dependencies
dependencies:
  - changed-files:
      - any-glob-to-any-file:
          - '**/requirements*.txt'
          - '**/Cargo.toml'
          - '**/Cargo.lock'
          - '**/pyproject*.toml'
          - '**/setup.py'
          - '**/poetry.lock'
          - '**/package.json'
          - '**/package-lock.json'

# Multi-modal
Multi-modal:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*multimodal*'
          - '**/*vision*'
          - '**/*vlm*'

# Diffusion
diffusion:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/multimodal_gen/**/*'

# LoRA
lora:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*lora*'

# Quantization
quant:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*quant*'
          - '**/*quantization*'

# Speculative decoding
speculative-decoding:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*speculative*'

# AMD specific
amd:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*amd*'
          - '**/*rocm*'

# NPU specific
npu:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*npu*'
          - '**/*ascend*'

# Blackwell
blackwell:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*nvfp4*'
          - 'sgl-kernel/csrc/attention/cutlass_sm100_mla/**/*'
          - 'python/sglang/srt/layers/attention/trtllm_mla_backend.py'
          - 'python/sglang/srt/layers/attention/trtllm_mha_backend.py'

# DeepSeek specific
deepseek:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*deepseek*'

# HiCache
hicache:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*hicache*'

# Deterministic
deterministic:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/srt/batch_invariant_ops/**/*'

# Piecewise CUDA Graph
piecewise-cuda-graph:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/srt/compilation/**/*'

# Moore Threads specific
mthreads:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*mthreads*'
          - '**/*musa*'
42
third_party/sglang/.github/linters/lychee-ci.toml
vendored
Normal file
@@ -0,0 +1,42 @@
no_progress = true
verbose = "warn"
timeout = 20
max_concurrency = 8
retry_wait_time = 2
max_retries = 2

# CI should validate external links over the network.
offline = false
scheme = ["http", "https"]

exclude_path = [
  # Exclude generated Sphinx build artifacts.
  # - "(\\./)?" allows both "docs/..." and "./docs/..."
  # - "[/\\\\]" supports both slash styles in CI environments
  "^(\\./)?docs[/\\\\]_build[/\\\\]",
]

exclude = [
  # Local-only endpoints referenced in docs/examples.
  # These are expected to be unreachable in GitHub-hosted CI.
  "^https?://localhost(:[0-9]+)?(/|$)",
  "^http://127\\.0\\.0\\.1(:[0-9]+)?(/|$)",
  # Vendor pages that frequently block/deny CI user-agents (transient 403/anti-bot).
  "^https://www\\.intel\\.com/content/www/us/en/ark/products/series/240391/intel-arc-b-series-graphics\\.html$",
  "^https://www\\.intel\\.com/content/www/us/en/ark/products/series/242616/intel-arc-pro-b-series-graphics\\.html$",
  "^https://www\\.intel\\.com/content/www/us/en/products/sku/241598/intel-arc-b580-graphics/specifications\\.html$",

  # Non-routable bind address used in examples, never externally reachable.
  "^http://0\\.0\\.0\\.0(/|$)",

  # Large doc portals with anti-bot/rate-limit behavior in CI.
  # We keep API docs references in content but do not fail CI on access policy.
  "^https://platform\\.openai\\.com/docs/",
  "^https://gamma\\.app/docs/Optimizing-RL-with-SGLang-y0kqgj877k34779$",
  "^https://aflah02\\.substack\\.com/p/multi-node-llm-inference-with-sglang/?$",

  # Known noisy image URLs used in notebook-rendered examples.
  "^https://github\\.com/sgl-project/sglang/blob/main/examples/assets/example_image\\.png\\?raw=true$",
  "^https://raw\\.githubusercontent\\.com/sgl-project/sglang/main/examples/assets/example_image\\.png/?$",
  "^https://raw\\.githubusercontent\\.com/sgl-project/sglang/main/assets/logo\\.png/?$",
]
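The `exclude` entries in `lychee-ci.toml` are regular expressions written as TOML basic strings, so each `\\.` reaches the regex engine as `\.`. The following is a minimal sketch of how three of those patterns classify URLs. It uses Python's `re` module as a stand-in for the regex engine that lychee itself uses, and the helper `is_excluded` and the sample URLs are hypothetical, chosen only for illustration.

```python
import re

# Patterns copied from the `exclude` list, after TOML unescaping ("\\." -> "\.").
EXCLUDE_PATTERNS = [
    r"^https?://localhost(:[0-9]+)?(/|$)",
    r"^http://127\.0\.0\.1(:[0-9]+)?(/|$)",
    r"^http://0\.0\.0\.0(/|$)",
]


def is_excluded(url: str) -> bool:
    """Return True if any exclude pattern matches the URL (hypothetical helper)."""
    return any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    # Sample URLs are made up for this sketch.
    for url in [
        "http://localhost:30000/v1",
        "https://127.0.0.1:8000/",
        "http://0.0.0.0:30000",
        "http://0.0.0.0/",
    ]:
        print(f"{url} excluded={is_excluded(url)}")
```

Under these assumptions, the `127.0.0.1` pattern covers only the `http` scheme, and the `0.0.0.0` pattern does not cover URLs that carry a port, which may or may not be intended.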
18
third_party/sglang/.github/linters/lychee.toml
vendored
Normal file
@@ -0,0 +1,18 @@
# .github/linters/lychee.toml
no_progress = true
verbose = "warn"
timeout = 20
max_concurrency = 8

offline = true

# Ignore generated docs output; check source docs only.
exclude_path = [
  "^(\\./)?docs[/\\\\]_build[/\\\\]",
]

exclude = [
  "^https?://localhost(:[0-9]+)?(/|$)",
  "^http://127\\.0\\.0\\.1(:[0-9]+)?(/|$)",
  "^http://0\\.0\\.0\\.0(/|$)",
]
33
third_party/sglang/.github/pull_request_template.md
vendored
Normal file
@@ -0,0 +1,33 @@
<!-- Thank you for your contribution! Please follow these guidelines to enhance your pull request. If anything is unclear, submit your PR and reach out to maintainers for assistance. Join our Slack community at https://slack.sglang.io to discuss further. -->

## Motivation

<!-- Describe the purpose and goals of this pull request. -->

## Modifications

<!-- Detail the changes made in this pull request. -->

## Accuracy Tests

<!-- If this pull request affects model outputs (e.g., changes to the kernel or model forward code), provide accuracy test results. -->

## Speed Tests and Profiling

<!-- If this pull request impacts inference speed, provide benchmarking and profiling results. -->

## Checklist

- [ ] Format your code according to the [Format code with pre-commit](https://docs.sglang.io/developer_guide/contribution_guide.html#format-code-with-pre-commit).
- [ ] Add unit tests according to the [Run and add unit tests](https://docs.sglang.io/developer_guide/contribution_guide.html#run-and-add-unit-tests).
- [ ] Update documentation according to [Write documentations](https://docs.sglang.io/developer_guide/contribution_guide.html#write-documentations).
- [ ] Provide accuracy and speed benchmark results according to [Test the accuracy](https://docs.sglang.io/developer_guide/contribution_guide.html#test-the-accuracy) and [Benchmark the speed](https://docs.sglang.io/developer_guide/contribution_guide.html#benchmark-the-speed).
- [ ] Follow the SGLang code style [guidance](https://docs.sglang.io/developer_guide/contribution_guide.html#code-style-guidance).

## Review and Merge Process

1. Ping Merge Oncalls to start the process. See the [PR Merge Process](https://github.com/sgl-project/sglang/blob/main/.github/MAINTAINER.md#pull-request-merge-process).
2. Get approvals from [CODEOWNERS](https://github.com/sgl-project/sglang/blob/main/.github/CODEOWNERS) and other reviewers.
3. Trigger CI tests with [comments](https://docs.sglang.io/developer_guide/contribution_guide.html#how-to-trigger-ci-tests) or contact authorized users to do so.
   - Common commands include `/tag-and-rerun-ci`, `/tag-run-ci-label`, `/rerun-failed-ci`
4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.
244
third_party/sglang/.github/update_ci_permission.py
vendored
Normal file
@@ -0,0 +1,244 @@
"""
Update the CI permissions configuration file.

This script updates the `CI_PERMISSIONS.json` file, which defines the CI permissions granted to each user.

The format of `CI_PERMISSIONS.json` is as follows:

    {
        "username1": {
            "can_tag_run_ci_label": true,
            "can_rerun_failed_ci": true,
            "cooldown_interval_minutes": 0,
            "reason": "top contributor"
        },
        "username2": {
            "can_tag_run_ci_label": true,
            "can_rerun_failed_ci": true,
            "cooldown_interval_minutes": 60,
            "reason": "custom override"
        }
    }

Permissions are assigned according to the following rules:

1. Add the top 50 contributors from the last 120 days with full permissions, no cooldown, and the reason "top contributor".
2. Load all users from the existing `CI_PERMISSIONS.json` file and update their entries as follows:
    - If a user is already covered by rule 1, skip that user.
    - If the old reason of a user is "top contributor" but they are not in the current top contributors list, change their configuration to:
        {
            "can_tag_run_ci_label": true,
            "can_rerun_failed_ci": true,
            "cooldown_interval_minutes": 60,
            "reason": "custom override"
        }
    - For all other cases, preserve the original configuration unchanged.
3. All other users receive no permissions and a 120-minute cooldown (they are omitted from the file).

Usage:
    export GH_TOKEN="your_github_token"
    python3 update_ci_permission.py

    # Sort-only mode (no network calls, no GH_TOKEN required)
    python3 update_ci_permission.py --sort-only
"""

import argparse
import json
import os
from collections import Counter
from datetime import datetime, timedelta, timezone

try:
    import requests
except ImportError:
    requests = None  # Only needed for non-sort-only runs

# Configuration
REPO_OWNER = "sgl-project"
REPO_NAME = "sglang"
FILE_NAME = os.path.join(os.path.dirname(__file__), "CI_PERMISSIONS.json")
HEADERS = {}


def github_api_get(endpoint, params=None):
    """Helper to make paginated GitHub API requests."""
    if requests is None:
        raise RuntimeError(
            "The requests package is required. Install it or use --sort-only."
        )
    if not HEADERS:
        raise RuntimeError(
            "GitHub headers not initialized. Set GH_TOKEN or use --sort-only."
        )

    results = []
    url = f"https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/{endpoint}"

    while url:
        response = requests.get(url, headers=HEADERS, params=params)
        if response.status_code != 200:
            print(f"Error fetching {url}: {response.status_code} {response.text}")
            # On a failed request, stop paginating and return whatever has been
            # collected so far (possibly an empty list) instead of raising.
            break

        data = response.json()
        if isinstance(data, list):
            results.extend(data)
        else:
            return data  # Non-list response (not paginated usually)

        # Handle pagination
        url = None
        if "link" in response.headers:
            links = response.headers["link"].split(", ")
            for link in links:
                if 'rel="next"' in link:
                    url = link[link.find("<") + 1 : link.find(">")]
                    params = None  # Params are included in the next link
                    break
    return results


def get_write_access_users():
    """Fetches users with push (write) or admin access."""
    print("Fetching collaborators with write access...")
    # Note: This endpoint usually requires admin rights on the token.
    collaborators = github_api_get("collaborators", params={"per_page": 100})

    writers = set()
    for col in collaborators:
        perms = col.get("permissions", {})
        # Check for admin, maintain, or push rights
        if perms.get("admin") or perms.get("maintain") or perms.get("push"):
            writers.add(col["login"])

    print(f"Found {len(writers)} users with write access.")
    return writers


def get_top_contributors(days, limit):
    """Fetches top contributors based on commit count in the last N days."""
    print(f"Fetching commits from the last {days} days...")
    since_date = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()

    # Fetch commits
    commits = github_api_get("commits", params={"since": since_date, "per_page": 100})

    author_counts = Counter()
    for commit in commits:
        # commit['author'] contains the GitHub user object (can be None if not linked)
        if commit.get("author") and "login" in commit["author"]:
            author_counts[commit["author"]["login"]] += 1

    top_users = [user for user, _ in author_counts.most_common(limit)]
    print(f"Found {len(top_users)} top contributors in the last {days} days.")
    return set(top_users)


def load_existing_permissions():
    if os.path.exists(FILE_NAME):
        try:
            with open(FILE_NAME, "r") as f:
                return json.load(f)
        except json.JSONDecodeError:
            print(f"Warning: {FILE_NAME} is invalid JSON. Starting fresh.")
    return {}


def sort_permissions_file():
    """Sort the existing CI permissions file alphabetically and exit."""
    if not os.path.exists(FILE_NAME):
        print(f"{FILE_NAME} not found. Nothing to sort.")
        return

    old_permissions = load_existing_permissions()
    sorted_permissions = dict(sorted(old_permissions.items()))

    with open(FILE_NAME, "w") as f:
        json.dump(sorted_permissions, f, indent=4)
        f.write("\n")

    print(f"Sorted {FILE_NAME}. Total users: {len(sorted_permissions)}")


def main():
    parser = argparse.ArgumentParser(description="Update or sort CI permissions.")
    parser.add_argument(
        "--sort-only",
        action="store_true",
        help="Only sort CI_PERMISSIONS.json alphabetically without fetching data.",
    )
    args = parser.parse_args()

    if args.sort_only:
        sort_permissions_file()
        return

    gh_token = os.getenv("GH_TOKEN")
    if not gh_token:
        raise ValueError("Error: GH_TOKEN environment variable is not set.")

    global HEADERS
    HEADERS = {
        "Authorization": f"Bearer {gh_token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }

    # Gather Data
    try:
        write_access_users = get_write_access_users()
    except Exception as e:
        print(f"Warning: Could not fetch collaborators (check token scope). Error: {e}")
        write_access_users = set()

    top_contributors = get_top_contributors(days=120, limit=50)
    old_permissions = load_existing_permissions()

    new_permissions = {}

    # Rule 1: Add Top 50 Contributors
    for user in top_contributors:
        new_permissions[user] = {
            "can_tag_run_ci_label": True,
            "can_rerun_failed_ci": True,
            "can_rerun_stage": True,
            "cooldown_interval_minutes": 0,
            "reason": "top contributor",
        }

    # Rule 2: Process Existing Users (Merge Logic)
    for user, config in old_permissions.items():
        if user in new_permissions:
            # Already handled by Rule 1
            continue

        old_reason = config.get("reason", "")

        # If they fell off the top contributor list
        if old_reason in ["top contributor"]:
            new_permissions[user] = {
                "can_tag_run_ci_label": True,
                "can_rerun_failed_ci": True,
                "can_rerun_stage": True,
                "cooldown_interval_minutes": 60,
                "reason": "custom override",
            }
        else:
            # Preserve custom overrides
            new_permissions[user] = config

    # Save and Sort
    # Sorting keys for cleaner diffs
    sorted_permissions = dict(sorted(new_permissions.items()))

    with open(FILE_NAME, "w") as f:
        json.dump(sorted_permissions, f, indent=4)
        f.write("\n")  # Add trailing newline

    print(f"Successfully updated {FILE_NAME}. Total users: {len(sorted_permissions)}")


if __name__ == "__main__":
    main()
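The merge rules described in the docstring of `update_ci_permission.py` can be exercised without any GitHub access. The following is a minimal sketch, assuming the same field names that `main()` writes; the function `merge_permissions`, the constant `FULL_ACCESS`, and the user names are hypothetical and are not part of the vendored file.

```python
import json

# Flags granted to current and former top contributors (mirrors main()).
FULL_ACCESS = {
    "can_tag_run_ci_label": True,
    "can_rerun_failed_ci": True,
    "can_rerun_stage": True,
}


def merge_permissions(top_contributors, old_permissions):
    """Apply rules 1 and 2 to in-memory data (hypothetical helper)."""
    # Rule 1: current top contributors get full access and no cooldown.
    merged = {
        user: {**FULL_ACCESS, "cooldown_interval_minutes": 0, "reason": "top contributor"}
        for user in top_contributors
    }
    # Rule 2: walk the entries of the previous file.
    for user, config in old_permissions.items():
        if user in merged:
            continue  # already covered by rule 1
        if config.get("reason", "") == "top contributor":
            # Former top contributor: keep access, add a 60-minute cooldown.
            merged[user] = {
                **FULL_ACCESS,
                "cooldown_interval_minutes": 60,
                "reason": "custom override",
            }
        else:
            merged[user] = config  # any other entry is preserved unchanged
    # Sort keys so the written file produces stable diffs.
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    # Made-up users for illustration only.
    old = {
        "alice": {"reason": "top contributor", "cooldown_interval_minutes": 0},
        "bob": {"reason": "release manager", "cooldown_interval_minutes": 30},
    }
    print(json.dumps(merge_permissions({"carol"}, old), indent=4))
```

Keeping the merge as a pure function of two inputs makes the demotion path (a former top contributor moving to a 60-minute cooldown) easy to check in isolation.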
161
third_party/sglang/.github/workflows/amd-aiter-scout.yml
vendored
Normal file
@@ -0,0 +1,161 @@
name: AMD AITER Scout

on:
  schedule:
    - cron: '0 20 * * 1' # Monday 20:00 UTC
    - cron: '0 20 * * 4' # Thursday 20:00 UTC
  workflow_dispatch:
    inputs:
      aiter_ref:
        description: 'AITER git ref (branch, tag, or SHA). Default: main (latest commit)'
        required: false
        type: string
        default: 'main'
      job_filter:
        description: 'Comma-separated workflows to run: nightly-amd, nightly-amd-rocm720, pr-test-amd, pr-test-amd-rocm720. Default: all'
        required: false
        type: string
        default: 'all'
      continue_on_error:
        description: 'Continue running other workflows even if one fails'
        required: false
        type: boolean
        default: true

concurrency:
  group: amd-aiter-scout-${{ github.run_id }}
  cancel-in-progress: true

jobs:
  resolve-aiter:
    runs-on: ubuntu-latest
    outputs:
      aiter_sha: ${{ steps.resolve.outputs.sha }}
      run_nightly_amd: ${{ steps.parse.outputs.run_nightly_amd }}
      run_nightly_amd_rocm720: ${{ steps.parse.outputs.run_nightly_amd_rocm720 }}
      run_pr_test_amd: ${{ steps.parse.outputs.run_pr_test_amd }}
      run_pr_test_amd_rocm720: ${{ steps.parse.outputs.run_pr_test_amd_rocm720 }}
    steps:
      - name: Resolve AITER commit
        id: resolve
        run: |
          REF="${{ inputs.aiter_ref || 'main' }}"
          echo "Resolving AITER ref: ${REF}"

          SHA=$(git ls-remote https://github.com/ROCm/aiter.git "refs/heads/${REF}" | head -1 | cut -f1)
          if [ -z "$SHA" ]; then
            SHA=$(git ls-remote https://github.com/ROCm/aiter.git "refs/tags/${REF}" | head -1 | cut -f1)
          fi
          if [ -z "$SHA" ]; then
            SHA=$(git ls-remote https://github.com/ROCm/aiter.git "${REF}" | head -1 | cut -f1)
          fi
          if [ -z "$SHA" ]; then
            SHA="${REF}"
          fi

          echo "sha=${SHA}" >> $GITHUB_OUTPUT
          echo "### AITER Ref Resolution" >> $GITHUB_STEP_SUMMARY
          echo "- **Requested ref:** \`${REF}\`" >> $GITHUB_STEP_SUMMARY
          echo "- **Resolved SHA:** \`${SHA}\`" >> $GITHUB_STEP_SUMMARY
          echo "- **AITER commit:** https://github.com/ROCm/aiter/commit/${SHA}" >> $GITHUB_STEP_SUMMARY

      - name: Parse job filter
        id: parse
        run: |
          FILTER="${{ inputs.job_filter || 'all' }}"
          echo "Job filter: ${FILTER}"

          if [[ "$FILTER" == "all" ]]; then
            echo "run_nightly_amd=true" >> $GITHUB_OUTPUT
            echo "run_nightly_amd_rocm720=true" >> $GITHUB_OUTPUT
            echo "run_pr_test_amd=true" >> $GITHUB_OUTPUT
            echo "run_pr_test_amd_rocm720=true" >> $GITHUB_OUTPUT
          else
            # Wrap with commas for exact substring matching (avoids "nightly-amd" matching "nightly-amd-rocm720")
            PADDED=",${FILTER// /},"
            echo "run_nightly_amd=$(echo "$PADDED" | grep -q ',nightly-amd,' && echo true || echo false)" >> $GITHUB_OUTPUT
            echo "run_nightly_amd_rocm720=$(echo "$PADDED" | grep -q ',nightly-amd-rocm720,' && echo true || echo false)" >> $GITHUB_OUTPUT
            echo "run_pr_test_amd=$(echo "$PADDED" | grep -q ',pr-test-amd,' && echo true || echo false)" >> $GITHUB_OUTPUT
            echo "run_pr_test_amd_rocm720=$(echo "$PADDED" | grep -q ',pr-test-amd-rocm720,' && echo true || echo false)" >> $GITHUB_OUTPUT
          fi

          echo "### Job Filter" >> $GITHUB_STEP_SUMMARY
          echo "- **Filter:** \`${FILTER}\`" >> $GITHUB_STEP_SUMMARY

  call-nightly-amd:
    if: needs.resolve-aiter.outputs.run_nightly_amd == 'true'
    needs: resolve-aiter
    uses: ./.github/workflows/nightly-test-amd.yml
    secrets: inherit
    with:
      ref: ${{ github.sha }}
      aiter_ref: ${{ needs.resolve-aiter.outputs.aiter_sha }}
      job_filter: 'all'
      continue_on_error: ${{ inputs.continue_on_error == '' && true || inputs.continue_on_error }}

  call-nightly-amd-rocm720:
    if: needs.resolve-aiter.outputs.run_nightly_amd_rocm720 == 'true'
    needs: resolve-aiter
    uses: ./.github/workflows/nightly-test-amd-rocm720.yml
    secrets: inherit
    with:
      ref: ${{ github.sha }}
      aiter_ref: ${{ needs.resolve-aiter.outputs.aiter_sha }}
      job_filter: 'all'
      continue_on_error: ${{ inputs.continue_on_error == '' && true || inputs.continue_on_error }}

  call-pr-test-amd:
    if: needs.resolve-aiter.outputs.run_pr_test_amd == 'true'
    needs: resolve-aiter
    uses: ./.github/workflows/pr-test-amd.yml
    secrets: inherit
    with:
      run_all_tests: true
      aiter_ref: ${{ needs.resolve-aiter.outputs.aiter_sha }}
      continue_on_error: ${{ inputs.continue_on_error == '' && true || inputs.continue_on_error }}

  call-pr-test-amd-rocm720:
    if: needs.resolve-aiter.outputs.run_pr_test_amd_rocm720 == 'true'
    needs: resolve-aiter
    uses: ./.github/workflows/pr-test-amd-rocm720.yml
    secrets: inherit
    with:
      run_all_tests: true
      aiter_ref: ${{ needs.resolve-aiter.outputs.aiter_sha }}
      continue_on_error: ${{ inputs.continue_on_error == '' && true || inputs.continue_on_error }}

  check-all-jobs:
    if: always()
    needs:
      - resolve-aiter
      - call-nightly-amd
      - call-nightly-amd-rocm720
      - call-pr-test-amd
      - call-pr-test-amd-rocm720
    runs-on: ubuntu-latest
    steps:
      - name: Summary
        run: |
          echo "## AMD AITER Scout Results" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "- **AITER SHA:** \`${{ needs.resolve-aiter.outputs.aiter_sha }}\`" >> $GITHUB_STEP_SUMMARY
          echo "- **AITER commit:** https://github.com/ROCm/aiter/commit/${{ needs.resolve-aiter.outputs.aiter_sha }}" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "| Workflow | Result |" >> $GITHUB_STEP_SUMMARY
          echo "|----------|--------|" >> $GITHUB_STEP_SUMMARY
          echo "| Nightly AMD (AITER Latest) | \`${{ needs.call-nightly-amd.result }}\` |" >> $GITHUB_STEP_SUMMARY
          echo "| Nightly AMD ROCm 7.2 | \`${{ needs.call-nightly-amd-rocm720.result }}\` |" >> $GITHUB_STEP_SUMMARY
          echo "| PR Test AMD (AITER Latest) | \`${{ needs.call-pr-test-amd.result }}\` |" >> $GITHUB_STEP_SUMMARY
          echo "| PR Test AMD ROCm 7.2 | \`${{ needs.call-pr-test-amd-rocm720.result }}\` |" >> $GITHUB_STEP_SUMMARY

      - name: Check if any job failed
        run: |
          if [[ "${{ contains(needs.*.result, 'failure') }}" == "true" ]]; then
            echo "One or more workflows failed"
            exit 1
          fi
          if [[ "${{ contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
            echo "One or more workflows were cancelled"
            exit 1
          fi
          echo "All workflows passed"
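The "Parse job filter" step in `amd-aiter-scout.yml` pads the filter with commas so that `nightly-amd` does not also match `nightly-amd-rocm720`. The following is a minimal Python sketch of that same matching idea, useful for checking a filter string before dispatching the workflow. The function `parse_job_filter` and the constant `WORKFLOW_NAMES` are hypothetical names introduced here; the workflow itself implements this in shell.

```python
# Workflow names accepted by the job_filter input (taken from its description).
WORKFLOW_NAMES = (
    "nightly-amd",
    "nightly-amd-rocm720",
    "pr-test-amd",
    "pr-test-amd-rocm720",
)


def parse_job_filter(job_filter: str) -> dict:
    """Map each workflow name to whether the filter selects it (hypothetical helper)."""
    if job_filter == "all":
        return {name: True for name in WORKFLOW_NAMES}
    # Strip spaces and wrap in commas, mirroring PADDED=",${FILTER// /},".
    padded = "," + job_filter.replace(" ", "") + ","
    # A name is selected only when it appears as a whole comma-delimited token.
    return {name: f",{name}," in padded for name in WORKFLOW_NAMES}


if __name__ == "__main__":
    for example in ("all", "nightly-amd-rocm720", "pr-test-amd, nightly-amd"):
        print(example, "->", parse_job_filter(example))
```

Without the comma padding, a plain substring test for `nightly-amd` would also fire on `nightly-amd-rocm720`, which is the case the comment in the workflow calls out.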
338
third_party/sglang/.github/workflows/amd-ci-job-monitor.yml
vendored
Normal file
@@ -0,0 +1,338 @@
name: AMD CI Job Monitor

on:
  schedule:
    - cron: '0 0 * * *' # Daily at midnight UTC
  pull_request:
    paths:
      - '.github/workflows/amd-ci-job-monitor.yml'
      - 'scripts/ci/utils/query_job_status.py'
  workflow_dispatch:
    inputs:
      hours:
        description: 'Time window in hours'
        required: false
        default: '24'
        type: string
      job_filter:
        description: 'Job name filter (leave empty for all AMD jobs)'
        required: false
        type: string

jobs:
  fetch-actions-data:
    name: Fetch Actions Snapshot
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install tabulate

      - name: Select workflows for snapshot
        id: select-workflows
        run: |
          if [[ -n "${{ inputs.job_filter }}" ]]; then
            echo "workflows=pr-test-amd.yml" >> "$GITHUB_OUTPUT"
          else
            echo "workflows=pr-test-amd.yml,nightly-test-amd.yml,pr-test-amd-rocm720.yml,nightly-test-amd-rocm720.yml" >> "$GITHUB_OUTPUT"
          fi

      - name: Fetch Actions data snapshot
        timeout-minutes: 30
        run: |
          python scripts/ci/utils/query_job_status.py \
            --repo ${{ github.repository }} \
            --workflow "${{ steps.select-workflows.outputs.workflows }}" \
            --hours ${{ inputs.hours || '24' }} \
            --dump-data-file actions-job-snapshot.json

      - name: Upload Actions data snapshot
        uses: actions/upload-artifact@v4
        with:
          name: actions-job-snapshot
          path: actions-job-snapshot.json
          if-no-files-found: error

  # Single job filter mode
  custom-report:
    name: Custom Job Report
    if: ${{ inputs.job_filter }}
    needs: fetch-actions-data
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install tabulate

      - name: Download Actions data snapshot
        uses: actions/download-artifact@v4
        with:
          name: actions-job-snapshot
          path: ci-data

      - name: Generate Custom Job Report
        timeout-minutes: 30
        run: |
          python scripts/ci/utils/query_job_status.py \
            --repo ${{ github.repository }} \
            --job "${{ inputs.job_filter }}" \
            --workflow "pr-test-amd.yml" \
            --hours ${{ inputs.hours || '24' }} \
            --input-data-file ci-data/actions-job-snapshot.json \
            --summary

  # Parse workflow files to get job names dynamically
  parse-workflows:
    name: Parse Workflow Jobs
    if: ${{ !inputs.job_filter }}
    runs-on: ubuntu-latest
    outputs:
      pr_jobs: ${{ steps.parse.outputs.pr_jobs }}
      nightly_jobs: ${{ steps.parse.outputs.nightly_jobs }}
      pr_rocm720_jobs: ${{ steps.parse.outputs.pr_rocm720_jobs }}
      nightly_rocm720_jobs: ${{ steps.parse.outputs.nightly_rocm720_jobs }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Parse workflow files
        id: parse
        run: |
          # Parse pr-test-amd.yml and extract job names (exclude utility jobs)
          # Excluded: call-gate, check-changes, pr-test-amd-finish, cancel, check-all-jobs
          pr_jobs=$(yq -r '.jobs | keys | .[]' .github/workflows/pr-test-amd.yml | \
            grep -v -E '^(call-gate|check-changes|pr-test-amd-finish|cancel|check-all-jobs)$' | \
            jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "pr_jobs=$pr_jobs" >> $GITHUB_OUTPUT
          echo "PR jobs: $pr_jobs"

          # Parse nightly-test-amd.yml and extract job names (exclude utility jobs)
          # Excluded: check-all-jobs
          nightly_jobs=$(yq -r '.jobs | keys | .[]' .github/workflows/nightly-test-amd.yml | \
            grep -v -E '^(check-all-jobs)$' | \
            jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "nightly_jobs=$nightly_jobs" >> $GITHUB_OUTPUT
          echo "Nightly jobs: $nightly_jobs"

          # Parse pr-test-amd-rocm720.yml (exclude utility jobs)
          # Excluded: call-gate, check-changes, pr-test-amd-finish, cancel, check-all-jobs
          pr_rocm720_jobs=$(yq -r '.jobs | keys | .[]' .github/workflows/pr-test-amd-rocm720.yml | \
            grep -v -E '^(call-gate|check-changes|pr-test-amd-finish|cancel|check-all-jobs)$' | \
            jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "pr_rocm720_jobs=$pr_rocm720_jobs" >> $GITHUB_OUTPUT
          echo "PR ROCm 7.2 jobs: $pr_rocm720_jobs"

          # Parse nightly-test-amd-rocm720.yml (exclude utility jobs)
          # Excluded: check-all-jobs
          nightly_rocm720_jobs=$(yq -r '.jobs | keys | .[]' .github/workflows/nightly-test-amd-rocm720.yml | \
            grep -v -E '^(check-all-jobs)$' | \
            jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "nightly_rocm720_jobs=$nightly_rocm720_jobs" >> $GITHUB_OUTPUT
          echo "Nightly ROCm 7.2 jobs: $nightly_rocm720_jobs"

  # PR CI reports using dynamic matrix
  pr-ci-reports:
    name: PR - ${{ matrix.job_name }}
    needs: [parse-workflows, fetch-actions-data]
    if: ${{ !inputs.job_filter }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        job_name: ${{ fromJson(needs.parse-workflows.outputs.pr_jobs) }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install tabulate

      - name: Download Actions data snapshot
        uses: actions/download-artifact@v4
        with:
          name: actions-job-snapshot
          path: ci-data

      - name: Generate Report
        timeout-minutes: 15
        run: |
          python scripts/ci/utils/query_job_status.py \
            --repo ${{ github.repository }} \
            --job "${{ matrix.job_name }}" \
            --workflow "pr-test-amd.yml" \
            --hours ${{ inputs.hours || '24' }} \
            --input-data-file ci-data/actions-job-snapshot.json \
            --summary

  # Nightly AMD test reports using dynamic matrix
  nightly-reports:
    name: Nightly - ${{ matrix.job_name }}
    needs: [parse-workflows, fetch-actions-data]
    if: ${{ !inputs.job_filter }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        job_name: ${{ fromJson(needs.parse-workflows.outputs.nightly_jobs) }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install tabulate

      - name: Download Actions data snapshot
        uses: actions/download-artifact@v4
        with:
          name: actions-job-snapshot
          path: ci-data

      - name: Generate Nightly Report
        timeout-minutes: 15
        run: |
          python scripts/ci/utils/query_job_status.py \
            --repo ${{ github.repository }} \
            --job "${{ matrix.job_name }}" \
            --workflow "nightly-test-amd.yml" \
            --hours ${{ inputs.hours || '24' }} \
|
||||||
|
--input-data-file ci-data/actions-job-snapshot.json \
|
||||||
|
--summary
|
||||||
|
|
||||||
|
# PR ROCm 7.2 CI reports using dynamic matrix
|
||||||
|
pr-rocm720-ci-reports:
|
||||||
|
name: PR ROCm720 - ${{ matrix.job_name }}
|
||||||
|
needs: [parse-workflows, fetch-actions-data]
|
||||||
|
if: ${{ !inputs.job_filter }}
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
job_name: ${{ fromJson(needs.parse-workflows.outputs.pr_rocm720_jobs) }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: pip install tabulate
|
||||||
|
|
||||||
|
- name: Download Actions data snapshot
|
||||||
|
uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
name: actions-job-snapshot
|
||||||
|
path: ci-data
|
||||||
|
|
||||||
|
- name: Generate PR ROCm 7.2 Report
|
||||||
|
timeout-minutes: 15
|
||||||
|
run: |
|
||||||
|
python scripts/ci/utils/query_job_status.py \
|
||||||
|
--repo ${{ github.repository }} \
|
||||||
|
--job "${{ matrix.job_name }}" \
|
||||||
|
--workflow "pr-test-amd-rocm720.yml" \
|
||||||
|
--hours ${{ inputs.hours || '24' }} \
|
||||||
|
--input-data-file ci-data/actions-job-snapshot.json \
|
||||||
|
--summary
|
||||||
|
|
||||||
|
# Nightly ROCm 7.2 reports using dynamic matrix
|
||||||
|
nightly-rocm720-reports:
|
||||||
|
name: Nightly ROCm720 - ${{ matrix.job_name }}
|
||||||
|
needs: [parse-workflows, fetch-actions-data]
|
||||||
|
if: ${{ !inputs.job_filter }}
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
job_name: ${{ fromJson(needs.parse-workflows.outputs.nightly_rocm720_jobs) }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: pip install tabulate
|
||||||
|
|
||||||
|
- name: Download Actions data snapshot
|
||||||
|
uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
name: actions-job-snapshot
|
||||||
|
path: ci-data
|
||||||
|
|
||||||
|
- name: Generate Nightly ROCm 7.2 Report
|
||||||
|
timeout-minutes: 15
|
||||||
|
run: |
|
||||||
|
python scripts/ci/utils/query_job_status.py \
|
||||||
|
--repo ${{ github.repository }} \
|
||||||
|
--job "${{ matrix.job_name }}" \
|
||||||
|
--workflow "nightly-test-amd-rocm720.yml" \
|
||||||
|
--hours ${{ inputs.hours || '24' }} \
|
||||||
|
--input-data-file ci-data/actions-job-snapshot.json \
|
||||||
|
--summary
|
||||||
|
|
||||||
|
# Runner fleet report - cross-workflow runner analytics in a single pass
|
||||||
|
runner-fleet-report:
|
||||||
|
name: Runner Fleet Report
|
||||||
|
if: ${{ !inputs.job_filter }}
|
||||||
|
needs: fetch-actions-data
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: pip install tabulate
|
||||||
|
|
||||||
|
- name: Download Actions data snapshot
|
||||||
|
uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
name: actions-job-snapshot
|
||||||
|
path: ci-data
|
||||||
|
|
||||||
|
- name: Generate Runner Fleet Report
|
||||||
|
timeout-minutes: 30
|
||||||
|
run: |
|
||||||
|
python scripts/ci/utils/query_job_status.py \
|
||||||
|
--repo ${{ github.repository }} \
|
||||||
|
--runner-report \
|
||||||
|
--workflow "pr-test-amd.yml,nightly-test-amd.yml,pr-test-amd-rocm720.yml,nightly-test-amd-rocm720.yml" \
|
||||||
|
--hours ${{ inputs.hours || '24' }} \
|
||||||
|
--input-data-file ci-data/actions-job-snapshot.json \
|
||||||
|
--summary
|
||||||
10
third_party/sglang/.github/workflows/auto-tune.yml
vendored
Normal file
@@ -0,0 +1,10 @@
name: Auto tune

on:
  workflow_dispatch:

jobs:
  auto-tune-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
50
third_party/sglang/.github/workflows/bot-bump-flashinfer-version.yml
vendored
Normal file
@@ -0,0 +1,50 @@
name: Bot Bump Flashinfer Version

on:
  workflow_dispatch:
    inputs:
      new_version:
        description: 'New flashinfer version (e.g., 0.6.4)'
        required: true
        type: string

permissions:
  contents: write
  pull-requests: write

jobs:
  bump-flashinfer-version:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Python dependencies
        run: |
          pip install tomli

      - name: Configure Git and branch
        run: |
          git config user.name "sglang-bot"
          git config user.email "sglang-bot@users.noreply.github.com"
          RANDOM_SUFFIX=$(echo $RANDOM | md5sum | head -c 4)
          BRANCH_NAME="bot/bump-flashinfer-version-${{ github.event.inputs.new_version }}-${RANDOM_SUFFIX}"
          git checkout -b "$BRANCH_NAME"
          echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV

      - name: Run flashinfer version bump script
        run: |
          python scripts/release/bump_flashinfer_version.py "${{ github.event.inputs.new_version }}"

      - name: Commit and create PR
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
        run: |
          bash scripts/release/commit_and_pr.sh "flashinfer" "${{ github.event.inputs.new_version }}" "$BRANCH_NAME"
60
third_party/sglang/.github/workflows/bot-bump-kernel-version-to-sglang.yml
vendored
Normal file
@@ -0,0 +1,60 @@
name: Bot Bump Kernel Version to SGLang

on:
  workflow_dispatch:

permissions:
  contents: write
  pull-requests: write

jobs:
  bump-kernel-version-to-sglang:
    runs-on: ubuntu-latest
    outputs:
      branch_name: ${{ steps.set_output.outputs.branch_name }}
      needs_sync: ${{ steps.check_sync.outputs.needs_sync }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Python dependencies
        run: |
          pip install tomli

      - name: Check if sync is needed
        id: check_sync
        run: |
          python scripts/release/check_kernel_version_to_sglang.py

      - name: Configure Git and branch
        if: steps.check_sync.outputs.needs_sync == 'true'
        id: set_output
        run: |
          git config user.name "sglang-bot"
          git config user.email "sglang-bot@users.noreply.github.com"
          RANDOM_SUFFIX=$(echo $RANDOM | md5sum | head -c 4)
          KERNEL_VERSION="${{ steps.check_sync.outputs.kernel_version }}"
          BRANCH_NAME="bot/bump-kernel-version-to-sglang-${KERNEL_VERSION}-${RANDOM_SUFFIX}"
          git checkout -b "$BRANCH_NAME"
          echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV
          echo "KERNEL_VERSION=$KERNEL_VERSION" >> $GITHUB_ENV
          echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT

      - name: Run kernel version bump script
        if: steps.check_sync.outputs.needs_sync == 'true'
        run: |
          python scripts/release/bump_kernel_version_to_sglang.py

      - name: Commit and create PR
        if: steps.check_sync.outputs.needs_sync == 'true'
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
        run: |
          bash scripts/release/commit_and_pr_kernel_to_sglang.sh "$KERNEL_VERSION" "$BRANCH_NAME"
50
third_party/sglang/.github/workflows/bot-bump-kernel-version.yml
vendored
Normal file
@@ -0,0 +1,50 @@
name: Bot Bump Kernel Version

on:
  workflow_dispatch:
    inputs:
      new_version:
        description: 'New sgl-kernel version (e.g., 0.3.12)'
        required: true
        type: string

permissions:
  contents: write
  pull-requests: write

jobs:
  bump-kernel-version:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Python dependencies
        run: |
          pip install tomli

      - name: Configure Git and branch
        run: |
          git config user.name "sglang-bot"
          git config user.email "sglang-bot@users.noreply.github.com"
          RANDOM_SUFFIX=$(echo $RANDOM | md5sum | head -c 4)
          BRANCH_NAME="bot/bump-kernel-version-${{ github.event.inputs.new_version }}-${RANDOM_SUFFIX}"
          git checkout -b "$BRANCH_NAME"
          echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV

      - name: Run kernel version bump script
        run: |
          python scripts/release/bump_kernel_version.py "${{ github.event.inputs.new_version }}"

      - name: Commit and create PR
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
        run: |
          bash scripts/release/commit_and_pr.sh "sgl-kernel" "${{ github.event.inputs.new_version }}" "$BRANCH_NAME"
89
third_party/sglang/.github/workflows/bot-bump-sglang-version.yml
vendored
Normal file
@@ -0,0 +1,89 @@
name: Bot Bump SGLang Version

on:
  workflow_dispatch:
    inputs:
      new_version:
        description: 'New SGLang version (e.g., 0.5.3 or 0.5.3rc0)'
        required: true
        type: string

permissions:
  contents: write
  pull-requests: write

jobs:
  bump-sglang-version:
    runs-on: ubuntu-latest
    outputs:
      branch_name: ${{ steps.set_output.outputs.branch_name }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Python dependencies
        run: |
          pip install tomli

      - name: Configure Git and branch
        id: set_output
        run: |
          git config user.name "sglang-bot"
          git config user.email "sglang-bot@users.noreply.github.com"
          RANDOM_SUFFIX=$(echo $RANDOM | md5sum | head -c 4)
          BRANCH_NAME="bot/bump-sglang-version-${{ github.event.inputs.new_version }}-${RANDOM_SUFFIX}"
          git checkout -b "$BRANCH_NAME"
          echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV
          echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT

      - name: Run SGLang version bump script
        run: |
          python scripts/release/bump_sglang_version.py "${{ github.event.inputs.new_version }}"

      - name: Commit and create PR
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
        run: |
          bash scripts/release/commit_and_pr.sh "SGLang" "${{ github.event.inputs.new_version }}" "$BRANCH_NAME"

  run-nightly-tests-nvidia:
    needs: bump-sglang-version
    uses: ./.github/workflows/nightly-test-nvidia.yml
    with:
      ref: ${{ needs.bump-sglang-version.outputs.branch_name }}
    secrets: inherit

  run-nightly-tests-amd:
    needs: bump-sglang-version
    uses: ./.github/workflows/nightly-test-amd.yml
    with:
      ref: ${{ needs.bump-sglang-version.outputs.branch_name }}
    secrets: inherit

  run-nightly-tests-npu:
    needs: bump-sglang-version
    uses: ./.github/workflows/nightly-test-npu.yml
    with:
      ref: ${{ needs.bump-sglang-version.outputs.branch_name }}
    secrets: inherit

  run-pr-tests-xeon:
    needs: bump-sglang-version
    uses: ./.github/workflows/pr-test-xeon.yml
    with:
      ref: ${{ needs.bump-sglang-version.outputs.branch_name }}
    secrets: inherit

  run-pr-tests-xpu:
    needs: bump-sglang-version
    uses: ./.github/workflows/pr-test-xpu.yml
    with:
      ref: ${{ needs.bump-sglang-version.outputs.branch_name }}
    secrets: inherit
182
third_party/sglang/.github/workflows/bot-cherry-pick.yml
vendored
Normal file
@@ -0,0 +1,182 @@
name: Bot Cherry Pick to Release Branch

on:
  workflow_dispatch:
    inputs:
      commit_sha:
        description: 'Commit SHA to cherry-pick (full or short hash)'
        required: true
        type: string
      target_branch:
        description: 'Target release branch (e.g., release/v0.5.7)'
        required: true
        type: string
      create_pr:
        description: 'Create a PR instead of pushing directly'
        required: false
        type: boolean
        default: true

permissions:
  contents: write
  pull-requests: write

concurrency:
  group: cherry-pick-${{ github.event.inputs.target_branch }}
  cancel-in-progress: false

jobs:
  cherry-pick:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    environment: 'prod'
    steps:
      - name: Validate inputs
        env:
          TARGET_BRANCH: ${{ github.event.inputs.target_branch }}
        run: |
          if [[ ! "$TARGET_BRANCH" =~ ^release/v[0-9]+\.[0-9]+(\.[0-9]+)?$ ]]; then
            echo "::error::Target branch must match pattern 'release/vX.Y' or 'release/vX.Y.Z' (e.g., release/v0.5.7)"
            exit 1
          fi

      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}

      - name: Configure Git
        run: |
          git config user.name "sglang-bot"
          git config user.email "sglang-bot@users.noreply.github.com"

      - name: Validate target branch exists
        env:
          TARGET_BRANCH: ${{ github.event.inputs.target_branch }}
        run: |
          git fetch origin
          if ! git ls-remote --exit-code --heads origin "$TARGET_BRANCH" > /dev/null 2>&1; then
            echo "::error::Target branch '$TARGET_BRANCH' does not exist on remote"
            exit 1
          fi

      - name: Get commit info
        id: commit_info
        env:
          COMMIT_SHA_INPUT: ${{ github.event.inputs.commit_sha }}
        run: |
          # Verify commit exists
          if ! git cat-file -t "$COMMIT_SHA_INPUT" > /dev/null 2>&1; then
            echo "::error::Commit SHA '$COMMIT_SHA_INPUT' does not exist"
            exit 1
          fi

          # Get full SHA if short hash provided
          FULL_SHA=$(git rev-parse "$COMMIT_SHA_INPUT")
          COMMIT_TITLE=$(git log -1 --format="%s" "$FULL_SHA")
          SHORT_SHA=$(git rev-parse --short "$FULL_SHA")
          echo "full_sha=$FULL_SHA" >> $GITHUB_OUTPUT
          echo "short_sha=$SHORT_SHA" >> $GITHUB_OUTPUT
          # Use delimiter for multiline-safe output
          {
            echo "commit_title<<EOF"
            echo "$COMMIT_TITLE"
            echo "EOF"
          } >> $GITHUB_OUTPUT
          echo "Cherry-picking commit: $SHORT_SHA - $COMMIT_TITLE"

      - name: Cherry-pick commit
        id: cherry_pick
        env:
          TARGET_BRANCH: ${{ github.event.inputs.target_branch }}
          FULL_SHA: ${{ steps.commit_info.outputs.full_sha }}
          SHORT_SHA: ${{ steps.commit_info.outputs.short_sha }}
          CREATE_PR: ${{ github.event.inputs.create_pr }}
        run: |
          if [[ "$CREATE_PR" == "true" ]]; then
            # Create a new branch for the PR
            RANDOM_SUFFIX=$(head -c 4 /dev/urandom | xxd -p)
            NEW_BRANCH="cherry-pick/${SHORT_SHA}-to-${TARGET_BRANCH#release/}-${RANDOM_SUFFIX}"
            git checkout -b "$NEW_BRANCH" "origin/$TARGET_BRANCH"
            echo "new_branch=$NEW_BRANCH" >> $GITHUB_OUTPUT
          else
            # Checkout target branch directly
            git checkout "$TARGET_BRANCH"
          fi

          # Attempt cherry-pick
          if git cherry-pick "$FULL_SHA"; then
            echo "cherry_pick_success=true" >> $GITHUB_OUTPUT
          else
            echo "::error::Cherry-pick failed due to conflicts. Please resolve manually."
            git cherry-pick --abort || true
            echo "cherry_pick_success=false" >> $GITHUB_OUTPUT
            exit 1
          fi

      - name: Push changes
        if: steps.cherry_pick.outputs.cherry_pick_success == 'true'
        env:
          CREATE_PR: ${{ github.event.inputs.create_pr }}
          TARGET_BRANCH: ${{ github.event.inputs.target_branch }}
          NEW_BRANCH: ${{ steps.cherry_pick.outputs.new_branch }}
        run: |
          if [[ "$CREATE_PR" == "true" ]]; then
            git push origin "$NEW_BRANCH"
          else
            git push origin "$TARGET_BRANCH"
          fi

      - name: Create Pull Request
        if: steps.cherry_pick.outputs.cherry_pick_success == 'true' && github.event.inputs.create_pr == 'true'
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
          TARGET_BRANCH: ${{ github.event.inputs.target_branch }}
          SHORT_SHA: ${{ steps.commit_info.outputs.short_sha }}
          COMMIT_TITLE: ${{ steps.commit_info.outputs.commit_title }}
          FULL_SHA: ${{ steps.commit_info.outputs.full_sha }}
          NEW_BRANCH: ${{ steps.cherry_pick.outputs.new_branch }}
        run: |
          PR_TITLE="[Cherry-pick] ${COMMIT_TITLE} to ${TARGET_BRANCH}"

          gh pr create \
            --title "$PR_TITLE" \
            --base "$TARGET_BRANCH" \
            --head "$NEW_BRANCH" \
            --label "cherry-pick" \
            --body-file - <<EOF
          Cherry-pick of commit ${FULL_SHA} to \`${TARGET_BRANCH}\`

          **Original commit:** ${FULL_SHA}
          **Original title:** ${COMMIT_TITLE}

          ---
          *This PR was automatically created by the cherry-pick workflow.*
          EOF

      - name: Summary
        if: always()
        env:
          FULL_SHA: ${{ steps.commit_info.outputs.full_sha }}
          COMMIT_TITLE: ${{ steps.commit_info.outputs.commit_title }}
          TARGET_BRANCH: ${{ github.event.inputs.target_branch }}
          CHERRY_PICK_SUCCESS: ${{ steps.cherry_pick.outputs.cherry_pick_success }}
          CREATE_PR: ${{ github.event.inputs.create_pr }}
          NEW_BRANCH: ${{ steps.cherry_pick.outputs.new_branch }}
          ACTOR: ${{ github.actor }}
        run: |
          echo "## Cherry-Pick Summary" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "- **Triggered by:** @${ACTOR}" >> $GITHUB_STEP_SUMMARY
          echo "- **Commit:** ${FULL_SHA}" >> $GITHUB_STEP_SUMMARY
          echo "- **Title:** ${COMMIT_TITLE}" >> $GITHUB_STEP_SUMMARY
          echo "- **Target Branch:** ${TARGET_BRANCH}" >> $GITHUB_STEP_SUMMARY
          if [[ "$CHERRY_PICK_SUCCESS" == "true" ]]; then
            echo "- **Status:** ✅ Success" >> $GITHUB_STEP_SUMMARY
          else
            echo "- **Status:** ❌ Failed" >> $GITHUB_STEP_SUMMARY
          fi
          if [[ "$CREATE_PR" == "true" && "$CHERRY_PICK_SUCCESS" == "true" ]]; then
            echo "- **PR Branch:** ${NEW_BRANCH}" >> $GITHUB_STEP_SUMMARY
          fi
22
third_party/sglang/.github/workflows/cancel-pr-workflow-on-merge.yml
vendored
Normal file
@@ -0,0 +1,22 @@
name: Cancel PR Workflows on Merge

on:
  pull_request_target:
    types:
      - closed

permissions:
  actions: write

jobs:
  cancel:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - name: Cancel Previous Runs
        uses: styfle/cancel-workflow-action@0.12.1
        with:
          workflow_id: all
          access_token: ${{ secrets.GITHUB_TOKEN }}
          ignore_sha: true
          pr_number: ${{ github.event.pull_request.number }}
155
third_party/sglang/.github/workflows/cancel-unfinished-pr-tests.yml
vendored
Normal file
@@ -0,0 +1,155 @@
name: Cancel Unfinished PR Runs

on:
  workflow_dispatch:
    inputs:
      workflows:
        description: 'Space-separated list of workflow filenames to cancel'
        required: true
        type: string
        default: 'pr-test.yml'
      include_high_priority:
        description: 'Also cancel runs from high-priority PRs'
        required: false
        type: boolean
        default: false

permissions:
  actions: write # Needed to cancel runs
  contents: read # Needed to read repo info
  pull-requests: read # needed for gh pr view (labels)

jobs:
  cancel-unfinished-pr-runs:
    runs-on: ubuntu-latest
    steps:
      - name: Install GitHub CLI
        run: sudo apt-get install -y gh jq

      - name: Cancel unfinished PR-associated runs (skip high-priority PRs)
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          WORKFLOWS: ${{ github.event.inputs.workflows || 'pr-test.yml' }}
          INCLUDE_HIGH_PRIORITY: ${{ github.event.inputs.include_high_priority || 'false' }}
        shell: bash
        run: |
          set -euo pipefail

          # Read the space-separated string from the input into a bash array
          read -r -a WORKFLOW_FILES <<< "${WORKFLOWS}"

          echo "Targeting ${#WORKFLOW_FILES[@]} workflow(s): ${WORKFLOWS}"
          echo ""

          for workflow_file in "${WORKFLOW_FILES[@]}"; do
            echo "========================================="
            echo "Workflow: $workflow_file"
            echo "========================================="

            # Get all unfinished runs
            all_runs=$(gh run list \
              --repo "$REPO" \
              --workflow "$workflow_file" \
              --json databaseId,status,event,url,createdAt \
              --limit 1000 \
              | jq -c '.[] | select(.status=="queued" or .status=="waiting" or .status=="in_progress")')

            if [ -z "$all_runs" ]; then
              echo "✅ No unfinished runs found"
              echo ""
              continue
            fi

            # Count runs by event type
            total_runs=$(echo "$all_runs" | wc -l)
            pr_runs=$(echo "$all_runs" | jq -s '[.[] | select(.event=="pull_request")] | length')
            other_runs=$(echo "$all_runs" | jq -s '[.[] | select(.event!="pull_request")] | length')

            echo "📊 Summary: $total_runs unfinished runs ($pr_runs PR-related, $other_runs other)"
            echo ""

            # Process non-PR runs first
            if [ "$other_runs" -gt 0 ]; then
              echo "--- Non-PR Runs ---"
              echo "$all_runs" | jq -c 'select(.event!="pull_request")' | while read -r run; do
                run_url=$(echo "$run" | jq -r '.url')
                run_event=$(echo "$run" | jq -r '.event')
                run_status=$(echo "$run" | jq -r '.status')
                echo " • $run_event ($run_status): $run_url"
              done
              echo ""
            fi

            # Process PR runs
            if [ "$pr_runs" -gt 0 ]; then
              echo "--- PR Runs (checking for cancellation) ---"
              echo "$all_runs" | jq -c 'select(.event=="pull_request")' | while read -r run; do
                run_id=$(echo "$run" | jq -r '.databaseId')
                run_url=$(echo "$run" | jq -r '.url')
                run_status=$(echo "$run" | jq -r '.status')

                echo ""
                echo "Run ($run_status): $run_url"

                # Fetch full run details to get head repository and branch info
                run_details=$(gh api -H "Accept: application/vnd.github+json" \
                  "repos/$REPO/actions/runs/$run_id" 2>/dev/null || true)

                if [ -z "$run_details" ]; then
                  echo " ⚠️ Could not fetch run details, skipping"
                  continue
                fi

                # Get head owner and branch (works for both fork and non-fork PRs)
                head_owner=$(echo "$run_details" | jq -r '.head_repository.owner.login // empty')
                head_branch=$(echo "$run_details" | jq -r '.head_branch // empty')

                if [ -z "$head_owner" ] || [ -z "$head_branch" ]; then
                  echo " ⚠️ Missing head info, skipping"
                  continue
                fi

                echo " Branch: ${head_owner}:${head_branch}"

                # Find PR by searching with head=owner:branch
                pr_number=$(gh api -H "Accept: application/vnd.github+json" \
                  "repos/$REPO/pulls?state=open&head=${head_owner}:${head_branch}" \
                  --jq '.[0].number // empty' 2>/dev/null || true)

                if [ -z "$pr_number" ]; then
                  echo " ⚠️ No open PR found, skipping"
                  continue
                fi

                pr_url="https://github.com/$REPO/pull/$pr_number"
                echo " PR: $pr_url"

                # Check for high priority label
                labels=$(gh pr view "$pr_number" --repo "$REPO" --json labels \
                  | jq -r '.labels[].name' 2>/dev/null || true)

                if echo "$labels" | grep -Fxq "bypass-maintenance"; then
                  echo " 🛑 Skipping (bypass-maintenance label, never cancelled)"
                  continue
                fi

                if echo "$labels" | grep -Fxq "high priority"; then
                  if [ "$INCLUDE_HIGH_PRIORITY" != "true" ]; then
                    echo " 🛑 Skipping (high priority label)"
                    continue
                  fi
                  echo " ⚠️ High priority PR, but include_high_priority is enabled"
                fi

                echo " 🚫 Cancelling..."
                gh run cancel "$run_id" --repo "$REPO" || echo " ⚠️ Cancellation failed"
              done
            fi

            echo ""
          done

          echo "========================================="
          echo "✅ Processing complete"
          echo "========================================="
154
third_party/sglang/.github/workflows/ci-coverage-overview.yml
vendored
Normal file
@@ -0,0 +1,154 @@
name: CI Coverage Overview

on:
  schedule:
    - cron: '0 6 * * *' # Daily at 6 AM UTC
  pull_request:
    paths:
      - '.github/workflows/ci-coverage-overview.yml'
      - 'scripts/ci/utils/ci_coverage_report.py'
      - 'test/registered/**'
  workflow_dispatch:
    inputs:
      output_format:
        description: 'Output format'
        required: false
        default: 'markdown'
        type: choice
        options:
          - markdown
          - json

jobs:
  summary:
    name: Summary
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Generate Summary Report
        run: |
          python scripts/ci/utils/ci_coverage_report.py --section summary

  by-folder:
    name: Tests by Folder
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Generate Tests by Folder Report
        run: |
          python scripts/ci/utils/ci_coverage_report.py --section by-folder

  by-suite:
    name: Tests by Suite
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Generate Tests by Suite Report
        run: |
          python scripts/ci/utils/ci_coverage_report.py --section by-suite

  unit-test-coverage:
    name: Unit Test Code Coverage
    if: github.event_name != 'pull_request'
    runs-on: 1-gpu-h100
    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install dependencies
        timeout-minutes: 10
        run: |
          pip install -e "python/[test]"

      - name: Run unit tests with coverage
        timeout-minutes: 10
        run: |
          pytest test/registered/unit/ \
            --cov --cov-config=.coveragerc \
            --cov-report=term-missing:skip-covered \
            --continue-on-collection-errors \
|
-v | tee coverage_output.txt
|
||||||
|
|
||||||
|
- name: Write coverage to summary
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
echo "## Unit Test Code Coverage" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "**Commit:** \`${GITHUB_SHA::8}\` | **Branch:** \`${GITHUB_REF_NAME}\`" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
|
||||||
|
# Test result line (e.g., "== 42 passed, 1 failed in 23.5s ==")
|
||||||
|
echo '```' >> $GITHUB_STEP_SUMMARY
|
||||||
|
grep -E '^=+.*passed' coverage_output.txt >> $GITHUB_STEP_SUMMARY || true
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
# Coverage total
|
||||||
|
grep -E '^TOTAL ' coverage_output.txt >> $GITHUB_STEP_SUMMARY || true
|
||||||
|
echo '```' >> $GITHUB_STEP_SUMMARY
|
||||||
|
|
||||||
|
# Partially covered core modules (1-49%) — most actionable for contributors
|
||||||
|
# Only show modules with testable logic; skip configs, models, layers, etc.
|
||||||
|
LOW_COV=$(awk '/^python\/.*%/ {
|
||||||
|
for (i=1; i<=NF; i++) {
|
||||||
|
if ($i ~ /^[0-9]+%$/) {
|
||||||
|
pct = $i + 0
|
||||||
|
if (pct >= 1 && pct < 50) printf "%-80s %5s %s\n", $1, $(i-2), $i
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}' coverage_output.txt \
|
||||||
|
| grep -E '/(mem_cache|managers|sampling|parser|observability|function_call|entrypoints|speculative|multimodal|utils)/' \
|
||||||
|
| head -40 || true)
|
||||||
|
if [ -n "$LOW_COV" ]; then
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "<details><summary>Core modules with coverage below 50% — good candidates for more unit tests</summary>" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo '```' >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "$LOW_COV" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo '```' >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "</details>" >> $GITHUB_STEP_SUMMARY
|
||||||
|
fi
|
||||||
|
|
||||||
|
json-export:
|
||||||
|
name: JSON Export
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
if: inputs.output_format == 'json'
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Generate JSON Report
|
||||||
|
run: |
|
||||||
|
python scripts/ci/utils/ci_coverage_report.py --output-format json > ci_coverage.json
|
||||||
|
|
||||||
|
- name: Upload JSON artifact
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: ci-coverage-report
|
||||||
|
path: ci_coverage.json
|
||||||
72
third_party/sglang/.github/workflows/ci-failure-monitor.yml
vendored
Normal file
@@ -0,0 +1,72 @@
|
|||||||
|
name: CI Failure Monitor
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 */12 * * *' # Every 12 hours
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: ci-failure-monitor-${{ github.ref }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
contents: read
|
||||||
|
actions: read
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
failure-analysis:
|
||||||
|
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.14'
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
python -m pip install --upgrade pip
|
||||||
|
pip install requests slack_sdk
|
||||||
|
|
||||||
|
- name: Run Failure Analysis
|
||||||
|
env:
|
||||||
|
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
|
||||||
|
GH_PAT_FOR_RUNNER_ADMIN: ${{ secrets.GH_PAT_FOR_RUNNER_ADMIN }}
|
||||||
|
PYTHONUNBUFFERED: 1
|
||||||
|
PYTHONIOENCODING: utf-8
|
||||||
|
run: |
|
||||||
|
cd scripts/ci_monitor
|
||||||
|
python ci_failures_analysis.py \
|
||||||
|
--token $GITHUB_TOKEN \
|
||||||
|
--limit 100 \
|
||||||
|
--output ci_failure_analysis_$(date +%Y%m%d_%H%M%S).json
|
||||||
|
|
||||||
|
- name: Upload Analysis Results
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: ci-failure-analysis-${{ github.run_number }}
|
||||||
|
path: |
|
||||||
|
scripts/ci_monitor/ci_failure_analysis_*.json
|
||||||
|
retention-days: 7
|
||||||
|
|
||||||
|
- name: Send Slack Notification
|
||||||
|
if: always()
|
||||||
|
env:
|
||||||
|
SGLANG_DIFFUSION_SLACK_TOKEN: ${{ secrets.SGLANG_DIFFUSION_SLACK_TOKEN }}
|
||||||
|
run: |
|
||||||
|
cd scripts/ci_monitor
|
||||||
|
LATEST_REPORT=$(ls -t ci_failure_analysis_*.json | head -1)
|
||||||
|
|
||||||
|
if [ ! -f "$LATEST_REPORT" ]; then
|
||||||
|
echo "No report found, so skipping Slack notification"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -n "$SGLANG_DIFFUSION_SLACK_TOKEN" ]; then
|
||||||
|
python3 post_ci_failures_to_slack.py --report-file "$LATEST_REPORT"
|
||||||
|
else
|
||||||
|
echo "SGLANG_DIFFUSION_SLACK_TOKEN not configured, skipping notification"
|
||||||
|
fi
|
||||||
96
third_party/sglang/.github/workflows/close-inactive-issues.yml
vendored
Normal file
@@ -0,0 +1,96 @@
|
|||||||
|
name: Close Inactive Issues
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 0 * * *'
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
issues: write
|
||||||
|
contents: read
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
close-inactive-issues:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Check and close inactive issues
|
||||||
|
uses: actions/github-script@v6
|
||||||
|
with:
|
||||||
|
github-token: ${{secrets.GITHUB_TOKEN}}
|
||||||
|
script: |
|
||||||
|
const sixtyDaysAgo = new Date(Date.now() - 60 * 24 * 60 * 60 * 1000);
|
||||||
|
|
||||||
|
const [owner, repo] = process.env.GITHUB_REPOSITORY.split('/');
|
||||||
|
console.log(`Owner: ${owner}, Repo: ${repo}`);
|
||||||
|
|
||||||
|
async function fetchIssues(page = 1) {
|
||||||
|
console.log(`Fetching issues for ${owner}/${repo}, page ${page}`);
|
||||||
|
return await github.rest.issues.listForRepo({
|
||||||
|
owner,
|
||||||
|
repo,
|
||||||
|
state: 'open',
|
||||||
|
sort: 'updated',
|
||||||
|
direction: 'asc',
|
||||||
|
per_page: 100,
|
||||||
|
page: page
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
async function processIssues() {
|
||||||
|
console.log('Starting to process issues');
|
||||||
|
console.log(`Repository: ${owner}/${repo}`);
|
||||||
|
|
||||||
|
let page = 1;
|
||||||
|
let hasMoreIssues = true;
|
||||||
|
while (hasMoreIssues) {
|
||||||
|
try {
|
||||||
|
const issues = await fetchIssues(page);
|
||||||
|
console.log(`Fetched ${issues.data.length} issues on page ${page}`);
|
||||||
|
|
||||||
|
if (issues.data.length === 0) {
|
||||||
|
hasMoreIssues = false;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const issue of issues.data) {
|
||||||
|
// Skip if the issue has 'good first issue' label
|
||||||
|
if (issue.labels.some(label => label.name === 'good first issue')) {
|
||||||
|
console.log(`Skipping issue #${issue.number} as it's marked as 'good first issue'`);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
if (new Date(issue.updated_at) < sixtyDaysAgo) {
|
||||||
|
try {
|
||||||
|
await github.rest.issues.update({
|
||||||
|
owner,
|
||||||
|
repo,
|
||||||
|
issue_number: issue.number,
|
||||||
|
state: 'closed',
|
||||||
|
labels: [...issue.labels.map(l => l.name), 'inactive']
|
||||||
|
});
|
||||||
|
await github.rest.issues.createComment({
|
||||||
|
owner,
|
||||||
|
repo,
|
||||||
|
issue_number: issue.number,
|
||||||
|
body: 'This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.'
|
||||||
|
});
|
||||||
|
console.log(`Closed issue #${issue.number} due to inactivity.`);
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`Failed to close issue #${issue.number}: ${error.message}`);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
console.log(`Issue #${issue.number} is still active. Stopping processing.`);
|
||||||
|
hasMoreIssues = false;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
page += 1;
|
||||||
|
} catch (error) {
|
||||||
|
console.error(`Error fetching issues on page ${page}: ${error.message}`);
|
||||||
|
hasMoreIssues = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
console.log('Finished processing issues');
|
||||||
|
}
|
||||||
|
|
||||||
|
await processIssues();
|
||||||
115
third_party/sglang/.github/workflows/diffusion-ci-gt-gen.yml
vendored
Normal file
@@ -0,0 +1,115 @@
|
|||||||
|
name: Diffusion CI Ground Truth Generation
|
||||||
|
|
||||||
|
on:
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref to checkout'
|
||||||
|
required: false
|
||||||
|
default: ''
|
||||||
|
type: string
|
||||||
|
case_ids:
|
||||||
|
description: 'Specific case IDs to run (space-separated, optional)'
|
||||||
|
required: false
|
||||||
|
default: ''
|
||||||
|
type: string
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: diffusion-ci-gt-gen-${{ github.ref }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
contents: write
|
||||||
|
actions: read
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
multimodal-diffusion-gen-1gpu:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: 1-gpu-h100
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
part: [0, 1]
|
||||||
|
timeout-minutes: 150
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: bash scripts/ci/cuda/ci_install_dependency.sh diffusion
|
||||||
|
|
||||||
|
- name: Generate outputs
|
||||||
|
run: |
|
||||||
|
cd python
|
||||||
|
python -m sglang.multimodal_gen.test.scripts.gen_diffusion_ci_outputs \
|
||||||
|
--suite 1-gpu \
|
||||||
|
--partition-id ${{ matrix.part }} \
|
||||||
|
--total-partitions 2 \
|
||||||
|
--out-dir ./diffusion-ci-outputs \
|
||||||
|
${{ inputs.case_ids != '' && format('--case-ids {0}', inputs.case_ids) || '' }}
|
||||||
|
|
||||||
|
- name: Upload artifact
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: diffusion-gen-1gpu-part${{ matrix.part }}
|
||||||
|
path: python/diffusion-ci-outputs
|
||||||
|
retention-days: 7
|
||||||
|
|
||||||
|
multimodal-diffusion-gen-2gpu:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: 2-gpu-h100
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
part: [0, 1]
|
||||||
|
timeout-minutes: 150
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: bash scripts/ci/cuda/ci_install_dependency.sh diffusion
|
||||||
|
|
||||||
|
- name: Generate outputs
|
||||||
|
run: |
|
||||||
|
cd python
|
||||||
|
python -m sglang.multimodal_gen.test.scripts.gen_diffusion_ci_outputs \
|
||||||
|
--suite 2-gpu \
|
||||||
|
--partition-id ${{ matrix.part }} \
|
||||||
|
--total-partitions 2 \
|
||||||
|
--out-dir ./diffusion-ci-outputs \
|
||||||
|
${{ inputs.case_ids != '' && format('--case-ids {0}', inputs.case_ids) || '' }}
|
||||||
|
|
||||||
|
- name: Upload artifact
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: diffusion-gen-2gpu-part${{ matrix.part }}
|
||||||
|
path: python/diffusion-ci-outputs
|
||||||
|
retention-days: 7
|
||||||
|
|
||||||
|
diffusion-ci-push:
|
||||||
|
needs: [multimodal-diffusion-gen-1gpu, multimodal-diffusion-gen-2gpu]
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Download artifacts
|
||||||
|
uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
pattern: diffusion-gen-*
|
||||||
|
path: combined
|
||||||
|
merge-multiple: true
|
||||||
|
|
||||||
|
- name: Collect image files
|
||||||
|
run: |
|
||||||
|
mkdir -p gt_images
|
||||||
|
find combined \( -name "*.png" -o -name "*.jpg" -o -name "*.jpeg" -o -name "*.webp" \) -type f -exec cp -f {} gt_images/ \;
|
||||||
|
|
||||||
|
- name: Publish GT images to sglang-bot/sglang-ci-data
|
||||||
|
env:
|
||||||
|
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
|
||||||
|
run: python scripts/ci/utils/diffusion/publish_diffusion_gt.py --source-dir gt_images
|
||||||
74
third_party/sglang/.github/workflows/execute-notebook.yml
vendored
Normal file
@@ -0,0 +1,74 @@
|
|||||||
|
name: Execute Notebooks
|
||||||
|
|
||||||
|
on:
|
||||||
|
pull_request:
|
||||||
|
branches: [ main ]
|
||||||
|
types: [opened, synchronize, reopened, labeled]
|
||||||
|
paths:
|
||||||
|
- "python/sglang/**"
|
||||||
|
- "docs/**"
|
||||||
|
- "!python/sglang/**/*.md"
|
||||||
|
- "!docs/**/*.md"
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: execute-notebook-${{ github.ref }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
env:
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
call-gate:
|
||||||
|
# Align with PR Test: fail fast if PR doesn't have run-ci label.
|
||||||
|
# This makes /tag-and-rerun-ci work by rerunning this failed workflow.
|
||||||
|
uses: ./.github/workflows/pr-gate.yml
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-all-notebooks:
|
||||||
|
needs: [call-gate]
|
||||||
|
runs-on: 1-gpu-h100
|
||||||
|
if: github.event_name != 'pull_request' || needs.call-gate.result == 'success'
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
pip install -r docs/requirements.txt
|
||||||
|
apt-get update && apt-get install -y pandoc parallel retry
|
||||||
|
ln -sf "$(which python3)" /usr/bin/python
|
||||||
|
|
||||||
|
- name: Setup Jupyter Kernel
|
||||||
|
run: |
|
||||||
|
python -m ipykernel install --user --name python3 --display-name "Python 3"
|
||||||
|
|
||||||
|
- name: Execute notebooks
|
||||||
|
timeout-minutes: 40
|
||||||
|
run: |
|
||||||
|
cd docs
|
||||||
|
make clean
|
||||||
|
make compile
|
||||||
|
|
||||||
|
|
||||||
|
notebook-finish:
|
||||||
|
needs: [
|
||||||
|
call-gate,
|
||||||
|
run-all-notebooks
|
||||||
|
]
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
if: always() && needs.run-all-notebooks.result != 'skipped'
|
||||||
|
steps:
|
||||||
|
- name: Check all dependent job statuses
|
||||||
|
run: |
|
||||||
|
results=(${{ join(needs.*.result, ' ') }})
|
||||||
|
for result in "${results[@]}"; do
|
||||||
|
if [ "$result" = "failure" ] || [ "$result" = "cancelled" ]; then
|
||||||
|
echo "Job failed with result: $result"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
echo "All jobs completed successfully"
|
||||||
|
exit 0
|
||||||
355
third_party/sglang/.github/workflows/full-test-npu.yml
vendored
Normal file
@@ -0,0 +1,355 @@
|
|||||||
|
name: Full Test (NPU)
|
||||||
|
|
||||||
|
on:
|
||||||
|
# pull_request:
|
||||||
|
# branches:
|
||||||
|
# - main
|
||||||
|
# paths:
|
||||||
|
# - ".github/workflows/full-test-npu.yml"
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref (branch, tag, or SHA) to test. If not provided, uses the default branch.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
job_filter:
|
||||||
|
description: 'Select which job to run (leave empty or "all" to run all jobs)'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'all'
|
||||||
|
image_a3:
|
||||||
|
description: 'Docker image used to run the test task on a3.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11'
|
||||||
|
skip_install_flag:
|
||||||
|
description: 'Indicates whether to skip the installation of sglang, defaulting to false.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'false'
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: full-test-npu-${{ inputs.ref || github.ref }}
|
||||||
|
cancel-in-progress: ${{ github.event_name != 'workflow_call' }}
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
set-image-config:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
outputs:
|
||||||
|
ref: ${{ steps.set-vars.outputs.ref }}
|
||||||
|
job_filter: ${{ steps.set-vars.outputs.job_filter }}
|
||||||
|
image_a3: ${{ steps.set-vars.outputs.image_a3 }}
|
||||||
|
skip_install_flag: ${{ steps.set-vars.outputs.skip_install_flag }}
|
||||||
|
steps:
|
||||||
|
# When triggered by a PR, no input parameters are used; the latest community code is tested by default.
|
||||||
|
- name: Set image config
|
||||||
|
id: set-vars
|
||||||
|
run: |
|
||||||
|
if [ -z "${{ inputs.ref }}" ]; then
|
||||||
|
echo "ref=" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "ref=${{ inputs.ref }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "${{ inputs.job_filter }}" ]; then
|
||||||
|
echo "job_filter=all" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "job_filter=${{ inputs.job_filter }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "${{ inputs.image_a3 }}" ]; then
|
||||||
|
echo "image_a3=swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "image_a3=${{ inputs.image_a3 }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "${{ inputs.skip_install_flag }}" ]; then
|
||||||
|
echo "skip_install_flag=false" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "skip_install_flag=${{ inputs.skip_install_flag }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
nighly-test-npu:
|
||||||
|
needs: [set-image-config]
|
||||||
|
name: nightly-test-npu
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
uses: ./.github/workflows/nightly-test-npu.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref }}
|
||||||
|
job_filter: ${{ needs.set-image-config.outputs.job_filter }}
|
||||||
|
image_a3: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
skip_install_flag: ${{ needs.set-image-config.outputs.skip_install_flag }}
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
full-1-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-2
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ "${{ needs.set-image-config.outputs.skip_install_flag }}" != "true" ]; then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.25 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite full-1-npu-a3 --nightly --continue-on-error --timeout-per-file 3600
|
||||||
|
|
||||||
|
full-2-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-2
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ "${{ needs.set-image-config.outputs.skip_install_flag }}" != "true" ]; then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.25 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite full-2-npu-a3 --nightly --continue-on-error --timeout-per-file 3600
|
||||||
|
|
||||||
|
full-4-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-4
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ "${{ needs.set-image-config.outputs.skip_install_flag }}" != "true" ]; then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.25 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite full-4-npu-a3 --nightly --continue-on-error --timeout-per-file 3600
|
||||||
|
|
||||||
|
full-16-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-16
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ "${{ needs.set-image-config.outputs.skip_install_flag }}" != "true" ]; then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit wandb>=0.16.0 tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 sacrebleu>=1.5.0 pytablewriter black==24.1.0 isort==5.13.2 peft>=0.2.0 accelerate>=0.29.1
|
||||||
|
pip install jsonlines httpx==0.25.0 evaluate>=0.4.0 datasets==2.16.1 numexpr xgrammar==0.1.25 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite full-16-npu-a3 --nightly --continue-on-error --timeout-per-file 3600
|
||||||
|
|
||||||
|
check-all-jobs:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && always()
|
||||||
|
needs:
|
||||||
|
- nighly-test-npu
|
||||||
|
- full-1-npu-a3
|
||||||
|
- full-2-npu-a3
|
||||||
|
- full-4-npu-a3
|
||||||
|
- full-16-npu-a3
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
container:
|
||||||
|
image: docker.m.daocloud.io/ubuntu:22.04
|
||||||
|
steps:
|
||||||
|
- name: Check if any job failed
|
||||||
|
run: |
|
||||||
|
if [[ "${{ contains(needs.*.result, 'failure') }}" == "true" ]]; then
|
||||||
|
echo "One or more nightly test jobs failed"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if [[ "${{ contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
|
||||||
|
echo "One or more nightly test jobs were cancelled"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "All nightly test jobs passed"
|
||||||
20
third_party/sglang/.github/workflows/labeler.yml
vendored
Normal file
@@ -0,0 +1,20 @@
name: Auto Label PRs

on:
  pull_request_target:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  label:
    runs-on: ubuntu-latest
    steps:
      - name: Auto-label by file changes
        uses: actions/labeler@v5
        with:
          repo-token: "${{ secrets.GITHUB_TOKEN }}"
          configuration-path: .github/labeler.yml
          sync-labels: false
39
third_party/sglang/.github/workflows/lint.yml
vendored
Normal file
@@ -0,0 +1,39 @@
name: Lint

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

      - name: Install pre-commit hook
        run: |
          python -m pip install pre-commit
          pre-commit install

      - name: Run pre-commit checks
        run: SKIP=no-commit-to-branch pre-commit run --all-files --show-diff-on-failure

      - name: Run lychee docs checks (offline references)
        uses: lycheeverse/lychee-action@8646ba30535128ac92d33dfc9133794bfdd9b411 # v2
        with:
          args: --config .github/linters/lychee.toml README.md "docs/**/*.md" "docs/**/*.rst" "docs/**/*.ipynb"

      - name: Run sgl-kernel clang-format checks
        uses: DoozyX/clang-format-lint-action@v0.20
        with:
          source: sgl-kernel
          extensions: h,c,cpp,hpp,cu,cuh,cc
          clangFormatVersion: 20
          style: file
317
third_party/sglang/.github/workflows/list-active-pr-runs.yml
vendored
Normal file
@@ -0,0 +1,317 @@
name: List Active Runs

on:
  workflow_dispatch:
    inputs:
      workflows:
        description: 'Space-separated list of workflow filenames to check'
        required: false
        type: string
        default: 'pr-test.yml'

permissions:
  actions: read
  contents: read
  pull-requests: read

jobs:
  list-active-runs:
    runs-on: ubuntu-latest
    steps:
      - name: Install GitHub CLI
        run: sudo apt-get install -y gh jq

      - name: List active runs grouped by PR
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          WORKFLOWS: ${{ github.event.inputs.workflows || 'pr-test.yml' }}
        shell: bash
        run: |
          set -euo pipefail

          echo "========================================="
          echo "🔍 Active Workflow Runs Report"
          echo "========================================="
          echo ""

          # Split the space-separated workflow list into an array
          read -r -a workflow_files <<< "${WORKFLOWS}"
          echo "📋 Checking specified workflows: ${WORKFLOWS}"

          echo ""

          # Create a temporary file to store PR data
          pr_data_file=$(mktemp)

          # Process each workflow
          for workflow_file in "${workflow_files[@]}"; do
            echo "Scanning workflow: $workflow_file"

            # Get all active runs (queued, waiting, in_progress)
            active_runs=$(gh run list \
              --repo "$REPO" \
              --workflow "$workflow_file" \
              --json databaseId,status,event,headBranch,createdAt,updatedAt,headSha,number,attempt \
              --limit 500 \
              | jq -c '.[] | select(.status=="queued" or .status=="waiting" or .status=="in_progress")')

            if [ -z "$active_runs" ]; then
              continue
            fi

            # Process each run
            echo "$active_runs" | while read -r run; do
              run_id=$(echo "$run" | jq -r '.databaseId')
              run_status=$(echo "$run" | jq -r '.status')
              run_event=$(echo "$run" | jq -r '.event')
              created_at=$(echo "$run" | jq -r '.createdAt')
              head_sha=$(echo "$run" | jq -r '.headSha')
              run_number=$(echo "$run" | jq -r '.number')
              run_attempt=$(echo "$run" | jq -r '.attempt // 1')

              # Get detailed run information including jobs
              run_details=$(gh api "repos/$REPO/actions/runs/$run_id" 2>/dev/null || true)

              if [ -z "$run_details" ]; then
                continue
              fi

              head_owner=$(echo "$run_details" | jq -r '.head_repository.owner.login // empty')
              head_branch=$(echo "$run_details" | jq -r '.head_branch // empty')

              if [ -z "$head_owner" ] || [ -z "$head_branch" ]; then
                continue
              fi

              # Find PR number (may be empty for non-PR runs)
              pr_number=$(gh api "repos/$REPO/pulls?state=open&head=${head_owner}:${head_branch}" \
                --jq '.[0].number // empty' 2>/dev/null || true)

              if [ -z "$pr_number" ]; then
                pr_number="NO_PR"
              fi

              # Get jobs for this run (with pagination to avoid missing jobs)
              jobs=$(gh api "repos/$REPO/actions/runs/$run_id/jobs" --paginate --jq '.jobs[]' | jq -s '.')

              running_jobs=$(echo "$jobs" | jq '[.[] | select(.status=="in_progress")] | length')
              queued_jobs=$(echo "$jobs" | jq '[.[] | select(.status=="queued" or .status=="waiting")] | length')

              # Get runner info for running jobs
              runners=$(echo "$jobs" | jq -r '.[] | select(.status=="in_progress") | .runner_name // "N/A"' | paste -sd "," -)

              # Calculate queue time
              current_time=$(date -u +%s)
              created_time=$(date -u -d "$created_at" +%s 2>/dev/null || echo "$current_time")
              queue_time=$((current_time - created_time))
              queue_minutes=$((queue_time / 60))

              # Store data in temporary file (unified format with event and branch)
              echo "$pr_number|$workflow_file|$run_id|$run_status|$running_jobs|$queued_jobs|$runners|$queue_minutes|$created_at|$head_sha|$run_attempt|$run_event|$head_branch" >> "$pr_data_file"
            done
          done

          echo ""
          echo "========================================="
          echo "📊 Active Runs Summary"
          echo "========================================="
          echo ""

          if [ ! -s "$pr_data_file" ]; then
            echo "✅ No active runs found"
            rm -f "$pr_data_file"
            exit 0
          fi

          # Get unique PR numbers (exclude NO_PR entries)
          pr_numbers=$(cut -d'|' -f1 < "$pr_data_file" | grep -v '^NO_PR$' | sort -u || true)

          # Separate high priority and normal PRs
          high_priority_prs=()
          normal_prs=()

          for pr_num in $pr_numbers; do
            labels=$(gh pr view "$pr_num" --repo "$REPO" --json labels \
              | jq -r '.labels[].name' 2>/dev/null || true)

            if echo "$labels" | grep -Fxq "high priority"; then
              high_priority_prs+=($pr_num)
            else
              normal_prs+=($pr_num)
            fi
          done

          # Combine: high priority first, then normal
          sorted_pr_numbers=("${high_priority_prs[@]}" "${normal_prs[@]}")

          pr_count=0
          total_running=0
          total_queued=0

          for pr_num in "${sorted_pr_numbers[@]}"; do
            pr_count=$((pr_count + 1))

            # Get PR details
            pr_info=$(gh pr view "$pr_num" --repo "$REPO" --json title,author,labels,url 2>/dev/null || true)

            if [ -z "$pr_info" ]; then
              continue
            fi

            pr_title=$(echo "$pr_info" | jq -r '.title')
            pr_author=$(echo "$pr_info" | jq -r '.author.login')
            pr_url=$(echo "$pr_info" | jq -r '.url')
            pr_labels=$(echo "$pr_info" | jq -r '.labels[].name' | paste -sd ", " -)

            if [ -z "$pr_labels" ]; then
              pr_labels="(no labels)"
            fi

            # Add priority indicator
            priority_indicator=""
            if echo "$pr_labels" | grep -q "high priority"; then
              priority_indicator="🔴 [HIGH PRIORITY] "
            fi

            echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
            echo "🔗 ${priority_indicator}PR #$pr_num: $pr_title"
            echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
            echo "👤 Author: $pr_author"
            echo "🏷️ Labels: $pr_labels"
            echo "🔗 URL: $pr_url"
            echo ""

            # Get all runs for this PR
            pr_runs=$(grep "^$pr_num|" "$pr_data_file")

            pr_running_total=0
            pr_queued_total=0

            echo "$pr_runs" | while read -r line; do
              workflow=$(echo "$line" | cut -d'|' -f2)
              run_id=$(echo "$line" | cut -d'|' -f3)
              status=$(echo "$line" | cut -d'|' -f4)
              running=$(echo "$line" | cut -d'|' -f5)
              queued=$(echo "$line" | cut -d'|' -f6)
              runners=$(echo "$line" | cut -d'|' -f7)
              queue_min=$(echo "$line" | cut -d'|' -f8)
              created=$(echo "$line" | cut -d'|' -f9)
              attempt=$(echo "$line" | cut -d'|' -f11)

              pr_running_total=$((pr_running_total + running))
              pr_queued_total=$((pr_queued_total + queued))

              run_url="https://github.com/$REPO/actions/runs/$run_id"

              # Calculate retry count for this specific run
              retry_count=$((attempt - 1))

              # Show retry indicator
              retry_indicator=""
              if [ "$retry_count" -gt 0 ]; then
                retry_indicator=" 🔄 Retry #$retry_count"
              fi

              echo " 📦 Workflow: $workflow (Run #$run_id)$retry_indicator"
              echo " Status: $status"
              echo " 🟢 Running jobs: $running"
              echo " 🟡 Queued jobs: $queued"

              if [ "$running" -gt 0 ] && [ "$runners" != "" ]; then
                echo " 🖥️ Runners: $runners"
              fi

              if [ "$queue_min" -gt 0 ]; then
                echo " ⏱️ Queue time: ${queue_min} minutes"
              fi

              echo " 🔗 Run URL: $run_url"
              echo ""
            done

            # Summary for this PR
            pr_running_total=$(grep "^$pr_num|" "$pr_data_file" | cut -d'|' -f5 | awk '{sum+=$1} END {print sum+0}')
            pr_queued_total=$(grep "^$pr_num|" "$pr_data_file" | cut -d'|' -f6 | awk '{sum+=$1} END {print sum+0}')

            total_running=$((total_running + pr_running_total))
            total_queued=$((total_queued + pr_queued_total))

            echo " 📊 PR Total: $pr_running_total running, $pr_queued_total queued"
            echo ""
          done

          # --- Non-PR Runs Section ---
          non_pr_runs=$(grep '^NO_PR|' "$pr_data_file" 2>/dev/null || true)
          non_pr_running=0
          non_pr_queued=0

          if [ -n "$non_pr_runs" ]; then
            echo "========================================="
            echo "📦 Non-PR Runs (manual / scheduled / other)"
            echo "========================================="
            echo ""

            echo "$non_pr_runs" | while read -r line; do
              workflow=$(echo "$line" | cut -d'|' -f2)
              run_id=$(echo "$line" | cut -d'|' -f3)
              status=$(echo "$line" | cut -d'|' -f4)
              running=$(echo "$line" | cut -d'|' -f5)
              queued=$(echo "$line" | cut -d'|' -f6)
              runners=$(echo "$line" | cut -d'|' -f7)
              queue_min=$(echo "$line" | cut -d'|' -f8)
              created=$(echo "$line" | cut -d'|' -f9)
              attempt=$(echo "$line" | cut -d'|' -f11)
              event=$(echo "$line" | cut -d'|' -f12)
              branch=$(echo "$line" | cut -d'|' -f13)

              run_url="https://github.com/$REPO/actions/runs/$run_id"

              retry_count=$((attempt - 1))
              retry_indicator=""
              if [ "$retry_count" -gt 0 ]; then
                retry_indicator=" 🔄 Retry #$retry_count"
              fi

              echo " 📦 Workflow: $workflow (Run #$run_id)$retry_indicator"
              echo " Event: $event"
              echo " Branch: $branch"
              echo " Status: $status"
              echo " 🟢 Running jobs: $running"
              echo " 🟡 Queued jobs: $queued"

              if [ "$running" -gt 0 ] && [ "$runners" != "" ]; then
                echo " 🖥️ Runners: $runners"
              fi

              if [ "$queue_min" -gt 0 ]; then
                echo " ⏱️ Queue time: ${queue_min} minutes"
              fi

              echo " 🔗 Run URL: $run_url"
              echo ""
            done

            non_pr_running=$(echo "$non_pr_runs" | cut -d'|' -f5 | awk '{sum+=$1} END {print sum+0}')
            non_pr_queued=$(echo "$non_pr_runs" | cut -d'|' -f6 | awk '{sum+=$1} END {print sum+0}')
            non_pr_count=$(echo "$non_pr_runs" | wc -l | tr -d ' ')

            total_running=$((total_running + non_pr_running))
            total_queued=$((total_queued + non_pr_queued))

            echo " 📊 Non-PR Total: $non_pr_running running, $non_pr_queued queued"
            echo ""
          fi

          # Overall summary
          echo "========================================="
          echo "📈 Overall Summary"
          echo "========================================="
          echo "Total PRs with active runs: $pr_count"
          echo "Total non-PR active runs: ${non_pr_count:-0}"
          echo "Total running jobs: $total_running"
          echo "Total queued jobs: $total_queued"
          echo "========================================="

          # Cleanup
          rm -f "$pr_data_file"
32
third_party/sglang/.github/workflows/nightly-link-check.yml
vendored
Normal file
@@ -0,0 +1,32 @@
name: Nightly Link Check

on:
  schedule:
    - cron: "0 2 * * *"
  workflow_dispatch:

concurrency:
  group: nightly-link-check-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lychee-online:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run lychee online link checks
        uses: lycheeverse/lychee-action@8646ba30535128ac92d33dfc9133794bfdd9b411 # v2
        with:
          fail: true
          args: >-
            --config .github/linters/lychee-ci.toml
            README.md
            docs/**/*.md
            docs/**/*.rst
            docs/**/*.ipynb
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
196
third_party/sglang/.github/workflows/nightly-release-gateway.yml
vendored
Normal file
@@ -0,0 +1,196 @@
# Nightly release workflow for SGLang Model Gateway

name: Nightly Release SGLang Model Gateway to PyPI

on:
  schedule:
    # Run at 2 AM UTC every day
    - cron: '0 2 * * *'
  workflow_dispatch: # Allow manual trigger

jobs:
  build:
    name: build on ${{ matrix.platform || matrix.os }} (${{ matrix.target }} - ${{ matrix.manylinux || 'auto' }})
    runs-on: ${{ matrix.os }}-latest
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu, macos, windows]
        target: [x86_64, aarch64]
        manylinux: [auto]
        include:
          - os: ubuntu
            platform: linux
          - os: windows
            ls: dir
            target: x86_64
            python-architecture: x64
            interpreter: 3.9 3.10 3.11 3.12 3.13
          - os: macos
            target: aarch64
            interpreter: 3.9 3.10 3.11 3.12 3.13
          - os: ubuntu
            platform: linux
            target: aarch64
          # musllinux
          - os: ubuntu
            platform: linux
            target: x86_64
            manylinux: musllinux_1_1
          - os: ubuntu
            platform: linux
            target: aarch64
            manylinux: musllinux_1_1
        exclude:
          - os: windows
            target: aarch64

    steps:
      - uses: actions/checkout@v4
        with:
          path: sglang-repo

      - name: Move sgl-model-gateway folder to root and delete sglang-repo
        run: |
          mv sglang-repo/sgl-model-gateway/* .
          rm -rf sglang-repo
          ls -alt
        shell: bash

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.13"
          architecture: ${{ matrix.python-architecture || 'x64' }}

      - name: Modify version for nightly release
        run: |
          # Get current version from pyproject.toml
          CURRENT_VERSION=$(python -c "import tomllib; print(tomllib.load(open('bindings/python/pyproject.toml', 'rb'))['project']['version'])" 2>/dev/null || python -c "import tomli; print(tomli.load(open('bindings/python/pyproject.toml', 'rb'))['project']['version'])")
          # Create nightly version with date: e.g., 0.2.1.dev20250128
          NIGHTLY_VERSION="${CURRENT_VERSION}.dev$(date +%Y%m%d)"
          echo "Nightly version: $NIGHTLY_VERSION"

          # Update pyproject.toml with nightly version (temporary, not committed)
          sed -i.bak "s/version = \"${CURRENT_VERSION}\"/version = \"${NIGHTLY_VERSION}\"/" bindings/python/pyproject.toml

          # Verify the change
          cat bindings/python/pyproject.toml | grep "^version"
        shell: bash

      - name: Install twine and tomli
        run: pip install -U twine tomli

      - name: Install protoc (macOS)
        if: matrix.os == 'macos'
        run: brew install protobuf

      - name: Install protoc (Windows)
        if: matrix.os == 'windows'
        run: choco install protoc -y

      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          working-directory: bindings/python
          target: ${{ matrix.target }}
          manylinux: ${{ matrix.manylinux || 'auto' }}
          args: --release --out dist --features vendored-openssl --interpreter ${{ matrix.interpreter || '3.9 3.10 3.11 3.12 3.13 3.14' }}
          rust-toolchain: stable
          docker-options: -e CI -e CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc -e CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++
          before-script-linux: |
            # Install build dependencies (perl/make for vendored OpenSSL, protoc for gRPC)
            if command -v yum &> /dev/null; then
              yum update -y && yum install -y wget unzip gcc gcc-c++ perl-core make
              # Install cross-compilation toolchain for aarch64 if needed
              if [ "${{ matrix.target }}" = "aarch64" ]; then
                yum install -y gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu || true
              fi
            elif command -v apt-get &> /dev/null; then
              apt-get update && apt-get install -y wget unzip gcc g++ perl make
              # Install cross-compilation toolchain for aarch64 if needed
              if [ "${{ matrix.target }}" = "aarch64" ]; then
                apt-get install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu || true
              fi
            fi
            (cd /tmp && \
            wget https://github.com/protocolbuffers/protobuf/releases/download/v32.0/protoc-32.0-linux-x86_64.zip && \
            unzip protoc-32.0-linux-x86_64.zip -d /usr/local && \
            rm protoc-32.0-linux-x86_64.zip)
            protoc --version

      - name: List built packages
        run: ${{ matrix.ls || 'ls -lh' }} bindings/python/dist/

      - name: Check packages
        run: twine check --strict bindings/python/dist/*

      - uses: actions/upload-artifact@v4
        with:
          name: packages-${{ matrix.os }}-${{ matrix.target }}-${{ matrix.manylinux || 'auto' }}
          path: bindings/python/dist/

  build-sdist:
    name: Build SDist
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          path: sglang-repo

      - name: Move sgl-model-gateway folder to root and delete sglang-repo
        run: |
          mv sglang-repo/sgl-model-gateway/* .
          rm -rf sglang-repo
          ls -alt

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.13"

      - name: Modify version for nightly release
        run: |
          # Get current version from pyproject.toml
          CURRENT_VERSION=$(python -c "import tomllib; print(tomllib.load(open('bindings/python/pyproject.toml', 'rb'))['project']['version'])" 2>/dev/null || python -c "import tomli; print(tomli.load(open('bindings/python/pyproject.toml', 'rb'))['project']['version'])")
          # Create nightly version with date: e.g., 0.2.1.dev20250128
          NIGHTLY_VERSION="${CURRENT_VERSION}.dev$(date +%Y%m%d)"
          echo "Nightly version: $NIGHTLY_VERSION"

          # Update pyproject.toml with nightly version (temporary, not committed)
          sed -i "s/version = \"${CURRENT_VERSION}\"/version = \"${NIGHTLY_VERSION}\"/" bindings/python/pyproject.toml

          # Verify the change
          cat bindings/python/pyproject.toml | grep "^version"

      - name: Build SDist
        uses: PyO3/maturin-action@v1
        with:
          working-directory: bindings/python
          command: sdist
          args: --out dist
          rust-toolchain: stable

      - uses: actions/upload-artifact@v4
        with:
          name: sdist
          path: bindings/python/dist/*.tar.gz

  upload:
    name: Upload to TestPyPI
    if: github.repository == 'sgl-project/sglang' # Ensure this job only runs for the sgl-project/sglang repository
    needs: [build, build-sdist]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          path: dist
          merge-multiple: true

      - name: Upload to TestPyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_TOKEN_ROUTER }}
        run: |
          pip install twine
          twine upload --repository testpypi dist/* --verbose
1457
third_party/sglang/.github/workflows/nightly-test-amd-rocm720.yml
vendored
Normal file
File diff suppressed because it is too large
1429
third_party/sglang/.github/workflows/nightly-test-amd.yml
vendored
Normal file
File diff suppressed because it is too large
33
third_party/sglang/.github/workflows/nightly-test-intel.yml
vendored
Normal file
@@ -0,0 +1,33 @@
name: Nightly Test (Intel)

on:
  schedule:
    - cron: '0 0 * * *'
  push:
    branches:
      - main
    paths:
      - "python/sglang/version.py"
  workflow_dispatch:
  workflow_call:
    inputs:
      ref:
        description: "Branch, tag or SHA to checkout"
        required: false
        type: string
        default: ""

concurrency:
  group: nightly-test-intel-${{ inputs.ref || github.ref }}
  cancel-in-progress: ${{ github.event_name != 'workflow_call' }}

jobs:
  # Placeholder for Intel GPU tests
  # Add Intel-specific nightly test workflows here when available

  placeholder:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder
        run: echo "Intel nightly tests will be added here"
428
third_party/sglang/.github/workflows/nightly-test-npu.yml
vendored
Normal file
@@ -0,0 +1,428 @@
|
|||||||
|
name: Nightly Test (NPU)
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 18 * * *' # Execute at 2:00 a.m. Beijing Time every day
|
||||||
|
pull_request:
|
||||||
|
branches:
|
||||||
|
- main
|
||||||
|
paths:
|
||||||
|
- ".github/workflows/nightly-test-npu.yml"
|
||||||
|
workflow_dispatch:
|
||||||
|
workflow_call:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref (branch, tag, or SHA) to test. If not provided, uses the default branch.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
job_filter:
|
||||||
|
description: 'Select which job to run (leave empty or "all" to run all jobs)'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'all'
|
||||||
|
image_a3:
|
||||||
|
description: 'The a3 running docker image of the test task.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11'
|
||||||
|
skip_install_flag:
|
||||||
|
description: 'Indicates whether to skip the installation of sglang, defaulting to false.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'false'
|
||||||
|
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: nightly-test-npu-${{ inputs.ref || github.ref }}
|
||||||
|
cancel-in-progress: ${{ github.event_name != 'workflow_call' }}
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
set-image-config:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
outputs:
|
||||||
|
ref: ${{ steps.set-vars.outputs.ref }}
|
||||||
|
job_filter: ${{ steps.set-vars.outputs.job_filter }}
|
||||||
|
image_a3: ${{ steps.set-vars.outputs.image_a3 }}
|
||||||
|
skip_install_flag: ${{ steps.set-vars.outputs.skip_install_flag }}
|
||||||
|
steps:
|
||||||
|
# When triggered by PR, no inputs parameters are used. The latest community code is tested by default.
|
||||||
|
- name: Set image config
|
||||||
|
id: set-vars
|
||||||
|
run: |
|
||||||
|
if [ -z "${{ inputs.ref }}" ]; then
|
||||||
|
echo "ref=" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "ref=${{ inputs.ref }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "${{ inputs.job_filter }}" ]; then
|
||||||
|
echo "job_filter=all" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "job_filter=${{ inputs.job_filter }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "${{ inputs.image_a3 }}" ]; then
|
||||||
|
echo "image_a3=swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "image_a3=${{ inputs.image_a3 }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "${{ inputs.skip_install_flag }}" ]; then
|
||||||
|
echo "skip_install_flag=false" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "skip_install_flag=${{ inputs.skip_install_flag }}" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
|
||||||
|
nightly-1-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-2
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
part: [0, 1]
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ ${{ needs.set-image-config.outputs.skip_install_flag }} != "true" ];then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.32 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite nightly-1-npu-a3 --nightly --continue-on-error --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2
|
||||||
|
|
||||||
|
nightly-2-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-2
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
part: [0]
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ ${{ needs.set-image-config.outputs.skip_install_flag }} != "true" ];then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.32 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite nightly-2-npu-a3 --nightly --continue-on-error --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 1
|
||||||
|
|
||||||
|
nightly-4-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-4
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
part: [0]
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ ${{ needs.set-image-config.outputs.skip_install_flag }} != "true" ];then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.32 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite nightly-4-npu-a3 --nightly --continue-on-error --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 1
|
||||||
|
|
||||||
|
nightly-8-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-8
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
part: [0]
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ ${{ needs.set-image-config.outputs.skip_install_flag }} != "true" ];then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.32 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite nightly-8-npu-a3 --nightly --continue-on-error --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 1
|
||||||
|
|
||||||
|
nightly-16-npu-a3:
|
||||||
|
needs: [set-image-config]
|
||||||
|
if: ${{ (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request') }}
|
||||||
|
runs-on: linux-aarch64-a3-16
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
part: [0, 1]
|
||||||
|
container:
|
||||||
|
image: ${{ needs.set-image-config.outputs.image_a3 }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.set-image-config.outputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
if [ ${{ needs.set-image-config.outputs.skip_install_flag }} != "true" ];then
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh a3
|
||||||
|
fi
|
||||||
|
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Print Log Information
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/npu/npu_log_print.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
pip install sglang_router
|
||||||
|
hf download lmms-lab/MMMU --repo-type dataset
|
||||||
|
pip install sentence_transformers torchaudio==2.8.0
|
||||||
|
pip install protobuf==6.31.1 zss pre-commit "wandb>=0.16.0" tenacity==8.3.0 loguru openpyxl latex2sympy2 zstandard transformers-stream-generator tqdm-multiprocess pycocoevalcap
|
||||||
|
pip install yt-dlp sentencepiece==0.1.99 nltk av ftfy sqlitedict==2.1.0 "sacrebleu>=1.5.0" pytablewriter black==24.1.0 isort==5.13.2 "peft>=0.2.0" "accelerate>=0.29.1"
|
||||||
|
pip install jsonlines httpx==0.25.0 "evaluate>=0.4.0" datasets==2.16.1 numexpr xgrammar==0.1.32 numpy==1.26.4 dotenv
|
||||||
|
git clone --branch v0.3.3 --depth 1 https://github.com/EvolvingLMMs-Lab/lmms-eval.git
|
||||||
|
cd ./lmms-eval
|
||||||
|
nohup pip install . > lmmslog.txt 2>&1 &
|
||||||
|
sleep 120
|
||||||
|
export PYTHONPATH=$PYTHONPATH:$(pwd)
|
||||||
|
cd ../
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite nightly-16-npu-a3 --nightly --continue-on-error --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2
|
||||||
|
|
||||||
|
check-all-jobs:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && always()
|
||||||
|
needs:
|
||||||
|
- nightly-1-npu-a3
|
||||||
|
- nightly-2-npu-a3
|
||||||
|
- nightly-4-npu-a3
|
||||||
|
- nightly-8-npu-a3
|
||||||
|
- nightly-16-npu-a3
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
container:
|
||||||
|
image: docker.m.daocloud.io/ubuntu:22.04
|
||||||
|
steps:
|
||||||
|
- name: Check if any job failed
|
||||||
|
run: |
|
||||||
|
if [[ "${{ contains(needs.*.result, 'failure') }}" == "true" ]]; then
|
||||||
|
echo "One or more nightly test jobs failed"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if [[ "${{ contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
|
||||||
|
echo "One or more nightly test jobs were cancelled"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "All nightly test jobs passed"
|
||||||
796
third_party/sglang/.github/workflows/nightly-test-nvidia.yml
vendored
Normal file
@@ -0,0 +1,796 @@
|
|||||||
|
name: Nightly Test (Nvidia)
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 0 * * *'
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
job_filter:
|
||||||
|
description: 'Select which job to run (leave empty or "all" to run all jobs)'
|
||||||
|
required: false
|
||||||
|
type: choice
|
||||||
|
default: 'all'
|
||||||
|
options:
|
||||||
|
- 'all'
|
||||||
|
- 'nightly-test-general-1-gpu-h100'
|
||||||
|
- 'nightly-test-general-4-gpu-h100'
|
||||||
|
- 'nightly-test-general-8-gpu-h200'
|
||||||
|
- 'nightly-test-general-8-gpu-h20'
|
||||||
|
- 'nightly-test-general-8-gpu-b200'
|
||||||
|
- 'nightly-test-text-accuracy-2-gpu-h100'
|
||||||
|
- 'nightly-test-text-perf-2-gpu-h100'
|
||||||
|
- 'nightly-test-vlm-accuracy-2-gpu-h100'
|
||||||
|
- 'nightly-test-vlm-perf-2-gpu-h100'
|
||||||
|
- 'nightly-test-multimodal-server-1-gpu'
|
||||||
|
- 'nightly-test-multimodal-server-2-gpu'
|
||||||
|
- 'nightly-test-perf-4-gpu-b200'
|
||||||
|
- 'nightly-test-perf-8-gpu-b200'
|
||||||
|
- 'nightly-test-specialized-8-gpu-b200'
|
||||||
|
- 'nightly-test-kernel-1-gpu-h100'
|
||||||
|
- 'nightly-test-diffusion-comparison'
|
||||||
|
- 'nightly-test-kernel-8-gpu-h200'
|
||||||
|
workflow_call:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref (branch, tag, or SHA) to test. If not provided, uses the default branch.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
job_filter:
|
||||||
|
description: 'Select which job to run (leave empty or "all" to run all jobs)'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: 'all'
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: nightly-test-nvidia-${{ inputs.ref || github.ref }}
|
||||||
|
cancel-in-progress: ${{ github.event_name != 'workflow_call' }}
|
||||||
|
|
||||||
|
env:
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
SGLANG_CUDA_COREDUMP: "1"
|
||||||
|
HF_HUB_DOWNLOAD_TIMEOUT: 300
|
||||||
|
HF_HUB_ETAG_TIMEOUT: 300
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
# General tests - 1 GPU
|
||||||
|
nightly-test-general-1-gpu-h100:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-general-1-gpu-h100')
|
||||||
|
runs-on: 1-gpu-h100
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 60
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-1-gpu --nightly --continue-on-error
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
|
||||||
|
# JIT kernel full unit tests (expanded parameter ranges via SGLANG_JIT_KERNEL_RUN_FULL_TESTS)
|
||||||
|
nightly-test-kernel-1-gpu-h100:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-kernel-1-gpu-h100')
|
||||||
|
runs-on: 1-gpu-h100
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
# Full jit_kernel test grids (see sglang.jit_kernel.utils.should_run_full_tests)
|
||||||
|
SGLANG_JIT_KERNEL_RUN_FULL_TESTS: "1"
|
||||||
|
# Match pr-test-jit-kernel workflow for consistent JIT warmup behavior
|
||||||
|
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: true
|
||||||
|
# Allow maintenance bypass on default branch (same semantics as PR JIT workflow)
|
||||||
|
SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN: ${{ github.ref == 'refs/heads/main' && 'true' || 'false' }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
timeout-minutes: 20
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run jit kernel nightly suite
|
||||||
|
timeout-minutes: 60
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-kernel-1-gpu --nightly --continue-on-error
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
|
||||||
|
nightly-test-kernel-8-gpu-h200:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-kernel-8-gpu-h200')
|
||||||
|
runs-on: 8-gpu-h200
|
||||||
|
timeout-minutes: 240
|
||||||
|
env:
|
||||||
|
SGLANG_JIT_KERNEL_RUN_FULL_TESTS: "1"
|
||||||
|
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: true
|
||||||
|
SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN: ${{ github.ref == 'refs/heads/main' && 'true' || 'false' }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
timeout-minutes: 20
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run multi-GPU jit kernel nightly suite
|
||||||
|
timeout-minutes: 90
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-kernel-8-gpu-h200 --nightly --continue-on-error
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
|
||||||
|
# General tests - 4 GPU H100
|
||||||
|
nightly-test-general-4-gpu-h100:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-general-4-gpu-h100')
|
||||||
|
runs-on: 4-gpu-h100
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 30
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-4-gpu --nightly --continue-on-error
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
|
||||||
|
# General tests - 8 GPU H200
|
||||||
|
nightly-test-general-8-gpu-h200:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-general-8-gpu-h200')
|
||||||
|
runs-on: 8-gpu-h200
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
partition: [0, 1, 2, 3]
|
||||||
|
env:
|
||||||
|
RUNNER_LABELS: 8-gpu-h200
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run common 8-GPU model tests
|
||||||
|
if: always()
|
||||||
|
timeout-minutes: 300
|
||||||
|
env:
|
||||||
|
TRACE_BASE_URL: https://raw.githubusercontent.com/sglang-bot/sglang-ci-data/main/traces/${{ github.run_id }}
|
||||||
|
PERFETTO_RELAY_URL: ${{ vars.PERFETTO_RELAY_URL }}
|
||||||
|
GPU_CONFIG: "8-gpu-h200"
|
||||||
|
IS_H200: "1"
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-8-gpu-common --nightly --timeout-per-file=18000 --continue-on-error --auto-partition-id=${{ matrix.partition }} --auto-partition-size=4
|
||||||
|
|
||||||
|
- name: Publish traces to storage repo
|
||||||
|
if: always()
|
||||||
|
continue-on-error: true
|
||||||
|
env:
|
||||||
|
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
|
||||||
|
GITHUB_RUN_ID: ${{ github.run_id }}
|
||||||
|
GITHUB_RUN_NUMBER: ${{ github.run_number }}
|
||||||
|
run: |
|
||||||
|
TRACE_ARGS=""
|
||||||
|
for dir in test/performance_profiles_*/; do
|
||||||
|
[ -d "$dir" ] && TRACE_ARGS="$TRACE_ARGS --traces-dir $dir"
|
||||||
|
done
|
||||||
|
if [ -n "$TRACE_ARGS" ]; then
|
||||||
|
python3 scripts/ci/utils/publish_traces.py $TRACE_ARGS
|
||||||
|
find test/performance_profiles_*/ -name '*.json.gz' -delete
|
||||||
|
else
|
||||||
|
echo "No trace directories found, skipping publish"
|
||||||
|
fi
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 30
|
||||||
|
env:
|
||||||
|
GPU_CONFIG: "8-gpu-h200"
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-8-gpu-h200 --nightly --continue-on-error
|
||||||
|
|
||||||
|
- name: Collect performance metrics
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
python3 scripts/ci/utils/save_metrics.py \
|
||||||
|
--gpu-config 8-gpu-h200 \
|
||||||
|
--partition ${{ matrix.partition }} \
|
||||||
|
--run-id ${{ github.run_id }} \
|
||||||
|
--output test/metrics-8gpu-h200-partition-${{ matrix.partition }}.json \
|
||||||
|
--search-dir test/performance_profiles_8_gpu \
|
||||||
|
--search-dir test
|
||||||
|
|
||||||
|
- name: Upload partition metrics
|
||||||
|
if: always()
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: metrics-8gpu-h200-partition-${{ matrix.partition }}
|
||||||
|
path: test/metrics-8gpu-h200-partition-${{ matrix.partition }}.json
|
||||||
|
retention-days: 5
|
||||||
|
if-no-files-found: ignore
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
with:
|
||||||
|
artifact-suffix: ${{ matrix.partition }}
|
||||||
|
|
||||||
|
# General tests - 8 GPU H20
|
||||||
|
nightly-test-general-8-gpu-h20:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-general-8-gpu-h20')
|
||||||
|
runs-on: 8-gpu-h20
|
||||||
|
env:
|
||||||
|
SGLANG_CI_RDMA_ALL_DEVICES: "mlx5_1,mlx5_2,mlx5_3,mlx5_4"
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 30
|
||||||
|
env:
|
||||||
|
GPU_CONFIG: "8-gpu-h20"
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-8-gpu-h20 --nightly --continue-on-error
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
|
||||||
|
# General tests - 8 GPU B200
|
||||||
|
nightly-test-general-8-gpu-b200:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-general-8-gpu-b200')
|
||||||
|
runs-on: 8-gpu-b200
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
partition: [0, 1, 2, 3]
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run common 8-GPU model tests
|
||||||
|
if: always()
|
||||||
|
timeout-minutes: 300
|
||||||
|
env:
|
||||||
|
TRACE_BASE_URL: https://raw.githubusercontent.com/sglang-bot/sglang-ci-data/main/traces/${{ github.run_id }}
|
||||||
|
PERFETTO_RELAY_URL: ${{ vars.PERFETTO_RELAY_URL }}
|
||||||
|
GPU_CONFIG: "8-gpu-b200"
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite nightly-8-gpu-common --nightly --timeout-per-file=12000 --continue-on-error --auto-partition-id=${{ matrix.partition }} --auto-partition-size=4
|
||||||
|
|
||||||
|
- name: Publish traces to storage repo
|
||||||
|
if: always()
|
||||||
|
continue-on-error: true
|
||||||
|
env:
|
||||||
|
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
|
||||||
|
GITHUB_RUN_ID: ${{ github.run_id }}
|
||||||
|
GITHUB_RUN_NUMBER: ${{ github.run_number }}
|
||||||
|
run: |
|
||||||
|
TRACE_ARGS=""
|
||||||
|
for dir in test/performance_profiles_*/; do
|
||||||
|
[ -d "$dir" ] && TRACE_ARGS="$TRACE_ARGS --traces-dir $dir"
|
||||||
|
done
|
||||||
|
if [ -n "$TRACE_ARGS" ]; then
|
||||||
|
python3 scripts/ci/utils/publish_traces.py $TRACE_ARGS
|
||||||
|
find test/performance_profiles_*/ -name '*.json.gz' -delete
|
||||||
|
else
|
||||||
|
echo "No trace directories found, skipping publish"
|
||||||
|
fi
|
||||||
|
|
||||||
|
- name: Collect performance metrics
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
python3 scripts/ci/utils/save_metrics.py \
|
||||||
|
--gpu-config 8-gpu-b200 \
|
||||||
|
--partition ${{ matrix.partition }} \
|
||||||
|
--run-id ${{ github.run_id }} \
|
||||||
|
--output test/metrics-8gpu-b200-partition-${{ matrix.partition }}.json \
|
||||||
|
--search-dir test/performance_profiles_8_gpu \
|
||||||
|
--search-dir test
|
||||||
|
|
||||||
|
- name: Upload partition metrics
|
||||||
|
if: always()
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: metrics-8gpu-b200-partition-${{ matrix.partition }}
|
||||||
|
path: test/metrics-8gpu-b200-partition-${{ matrix.partition }}.json
|
||||||
|
retention-days: 5
|
||||||
|
if-no-files-found: ignore
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
with:
|
||||||
|
artifact-suffix: ${{ matrix.partition }}
|
||||||
|
|
||||||
|
  # Text model accuracy tests
  nightly-test-text-accuracy-2-gpu-h100:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-text-accuracy-2-gpu-h100')
    runs-on: 2-gpu-h100
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run eval test for text models
        timeout-minutes: 120
        run: |
          cd test
          python3 run_suite.py --hw cuda --suite nightly-eval-text-2-gpu --nightly --continue-on-error --timeout-per-file 4500

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # Text model performance tests
  nightly-test-text-perf-2-gpu-h100:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-text-perf-2-gpu-h100')
    runs-on: 2-gpu-h100
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run performance test for text models
        timeout-minutes: 180
        env:
          TRACE_BASE_URL: https://raw.githubusercontent.com/sglang-bot/sglang-ci-data/main/traces/${{ github.run_id }}
          PERFETTO_RELAY_URL: ${{ vars.PERFETTO_RELAY_URL }}
          GPU_CONFIG: "2-gpu-h100"
        run: |
          cd test
          rm -rf performance_profiles_text_models/
          python3 run_suite.py --hw cuda --suite nightly-perf-text-2-gpu --nightly --continue-on-error --timeout-per-file 3600

      - name: Publish traces to storage repo
        env:
          GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
          GITHUB_RUN_ID: ${{ github.run_id }}
          GITHUB_RUN_NUMBER: ${{ github.run_number }}
        run: |
          python3 scripts/ci/utils/publish_traces.py --traces-dir test/performance_profiles_text_models

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # VLM accuracy tests
  nightly-test-vlm-accuracy-2-gpu-h100:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-vlm-accuracy-2-gpu-h100')
    runs-on: 2-gpu-h100
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run eval test for VLM models (fixed MMMU-100)
        timeout-minutes: 240
        run: |
          cd test
          python3 run_suite.py --hw cuda --suite nightly-eval-vlm-2-gpu --nightly --continue-on-error --timeout-per-file 9000

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # VLM performance tests
  nightly-test-vlm-perf-2-gpu-h100:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-vlm-perf-2-gpu-h100')
    runs-on: 2-gpu-h100
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run perf test for VLM models (MMMU)
        timeout-minutes: 240
        env:
          TRACE_BASE_URL: https://raw.githubusercontent.com/sglang-bot/sglang-ci-data/main/traces/${{ github.run_id }}
          PERFETTO_RELAY_URL: ${{ vars.PERFETTO_RELAY_URL }}
          GPU_CONFIG: "2-gpu-h100"
        run: |
          cd test
          rm -rf performance_profiles_vlms/
          python3 run_suite.py --hw cuda --suite nightly-perf-vlm-2-gpu --nightly --continue-on-error --timeout-per-file 3600

      - name: Publish traces to storage repo
        env:
          GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
          GITHUB_RUN_ID: ${{ github.run_id }}
          GITHUB_RUN_NUMBER: ${{ github.run_number }}
        run: |
          python3 scripts/ci/utils/publish_traces.py --traces-dir test/performance_profiles_vlms

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # diffusion performance tests
  nightly-test-multimodal-server-1-gpu:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-multimodal-server-1-gpu')
    runs-on: 1-gpu-h100
    strategy:
      fail-fast: false
      max-parallel: 5
      matrix:
        part: [0, 1]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh diffusion
          pip install slack_sdk

      - name: Run diffusion server tests
        env:
          SGLANG_DIFFUSION_SLACK_TOKEN: ${{ secrets.SGLANG_DIFFUSION_SLACK_TOKEN }}
          GITHUB_RUN_ID: ${{ github.run_id }}
          GPU_CONFIG: "1-gpu-h100"

        timeout-minutes: 90
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py \
            --suite 1-gpu \
            --partition-id ${{ matrix.part }} \
            --total-partitions 2

      - name: Collect diffusion performance metrics
        if: always()
        run: |
          python3 scripts/ci/utils/diffusion/save_diffusion_metrics.py \
            --gpu-config 1-gpu-h100 \
            --run-id ${{ github.run_id }} \
            --output python/diffusion-metrics-1gpu-partition-${{ matrix.part }}.json \
            --results-json python/diffusion-results.json

      - name: Upload diffusion metrics
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: diffusion-metrics-1gpu-partition-${{ matrix.part }}
          path: python/diffusion-metrics-1gpu-partition-${{ matrix.part }}.json
          retention-days: 90
          if-no-files-found: ignore

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()
        with:
          artifact-suffix: ${{ matrix.part }}

  nightly-test-multimodal-server-2-gpu:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-multimodal-server-2-gpu')
    runs-on: 2-gpu-h100
    strategy:
      fail-fast: false
      max-parallel: 5
      matrix:
        part: [0, 1]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh diffusion
          pip install slack_sdk

      - name: Run diffusion server tests
        env:
          SGLANG_DIFFUSION_SLACK_TOKEN: ${{ secrets.SGLANG_DIFFUSION_SLACK_TOKEN }}
          GITHUB_RUN_ID: ${{ github.run_id }}
          GPU_CONFIG: "2-gpu-h100"

        timeout-minutes: 90
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py \
            --suite 2-gpu \
            --partition-id ${{ matrix.part }} \
            --total-partitions 2

      - name: Collect diffusion performance metrics
        if: always()
        run: |
          python3 scripts/ci/utils/diffusion/save_diffusion_metrics.py \
            --gpu-config 2-gpu-h100 \
            --run-id ${{ github.run_id }} \
            --output python/diffusion-metrics-2gpu-partition-${{ matrix.part }}.json \
            --results-json python/diffusion-results.json

      - name: Upload diffusion metrics
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: diffusion-metrics-2gpu-partition-${{ matrix.part }}
          path: python/diffusion-metrics-2gpu-partition-${{ matrix.part }}.json
          retention-days: 90
          if-no-files-found: ignore

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()
        with:
          artifact-suffix: ${{ matrix.part }}

  # B200 Performance tests - 4 GPU
  nightly-test-perf-4-gpu-b200:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-perf-4-gpu-b200')
    runs-on: 4-gpu-b200
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run test
        timeout-minutes: 300
        run: |
          cd test
          python3 run_suite.py --hw cuda --suite nightly-4-gpu-b200 --nightly --continue-on-error --timeout-per-file 12000

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # Specialized B200 tests - 8 GPU, for specific backends and configs
  nightly-test-specialized-8-gpu-b200:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-perf-8-gpu-b200' || inputs.job_filter == 'nightly-test-specialized-8-gpu-b200')
    runs-on: 8-gpu-b200
    env:
      RUNNER_LABELS: 8-gpu-b200
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run test
        timeout-minutes: 120
        env:
          GPU_CONFIG: "8-gpu-b200"
        run: |
          cd test
          python3 run_suite.py --hw cuda --suite nightly-8-gpu-b200 --nightly --continue-on-error --timeout-per-file 2400

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # Diffusion cross-framework comparison
  nightly-test-diffusion-comparison:
    if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'nightly-test-diffusion-comparison')
    runs-on: 4-gpu-h100
    timeout-minutes: 240
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run cross-framework comparison
        env:
          GITHUB_SHA: ${{ github.sha }}
          GITHUB_RUN_ID: ${{ github.run_id }}
          PYTHONUNBUFFERED: "1"
        timeout-minutes: 210
        run: |
          python3 -u scripts/ci/utils/diffusion/run_comparison.py \
            --output comparison-results.json

      - name: Generate dashboard
        if: always()
        env:
          GH_PAT_FOR_NIGHTLY_CI_DATA: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
          GH_TOKEN: ${{ github.token }}
        run: |
          python3 scripts/ci/utils/diffusion/generate_diffusion_dashboard.py \
            --results comparison-results.json \
            --output dashboard.md \
            --charts-dir comparison-charts \
            --fetch-history \
            --step-summary

      - name: Publish to sglang-ci-data
        if: always()
        env:
          GH_PAT_FOR_NIGHTLY_CI_DATA: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
        run: |
          python3 scripts/ci/utils/diffusion/publish_comparison_results.py \
            --results comparison-results.json \
            --dashboard dashboard.md \
            --charts-dir comparison-charts

      - name: Upload comparison artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: diffusion-comparison-${{ github.run_id }}
          path: |
            comparison-results.json
            dashboard.md
            comparison-charts/
            comparison-logs/
          retention-days: 90
          if-no-files-found: ignore

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  # Consolidate performance metrics from all jobs
  consolidate-metrics:
    if: github.repository == 'sgl-project/sglang' && always()
    needs:
      - nightly-test-general-8-gpu-h200
      - nightly-test-general-8-gpu-b200
      - nightly-test-multimodal-server-1-gpu
      - nightly-test-multimodal-server-2-gpu
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - name: Download all partition metrics
        uses: actions/download-artifact@v4
        with:
          pattern: "*metrics-*"
          path: metrics/
          merge-multiple: true

      - name: List downloaded metrics
        run: |
          echo "Downloaded metrics files:"
          find metrics/ -name "*.json" -type f 2>/dev/null || echo "No metrics files found"

      - name: Merge metrics
        run: |
          python3 scripts/ci/utils/merge_metrics.py \
            --input-dir metrics/ \
            --output consolidated-metrics-${{ github.run_id }}.json \
            --run-id ${{ github.run_id }} \
            --commit-sha ${{ github.sha }} \
            --branch ${{ github.ref_name }}

      - name: Upload consolidated metrics
        uses: actions/upload-artifact@v4
        with:
          name: consolidated-metrics-${{ github.run_id }}
          path: consolidated-metrics-${{ github.run_id }}.json
          retention-days: 90
          if-no-files-found: warn

  # Final check job
  check-all-jobs:
    if: github.repository == 'sgl-project/sglang' && always()
    needs:
      - nightly-test-general-1-gpu-h100
      - nightly-test-general-4-gpu-h100
      - nightly-test-general-8-gpu-h200
      - nightly-test-general-8-gpu-h20
      - nightly-test-general-8-gpu-b200
      - nightly-test-text-accuracy-2-gpu-h100
      - nightly-test-text-perf-2-gpu-h100
      - nightly-test-vlm-accuracy-2-gpu-h100
      - nightly-test-vlm-perf-2-gpu-h100
      - nightly-test-multimodal-server-1-gpu
      - nightly-test-multimodal-server-2-gpu
      - nightly-test-perf-4-gpu-b200
      - nightly-test-specialized-8-gpu-b200
      - nightly-test-diffusion-comparison
      - consolidate-metrics
    runs-on: ubuntu-latest
    steps:
      - name: Check if any job failed
        run: |
          if [[ "${{ contains(needs.*.result, 'failure') }}" == "true" ]]; then
            echo "One or more nightly test jobs failed"
            exit 1
          fi
          if [[ "${{ contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
            echo "One or more nightly test jobs were cancelled"
            exit 1
          fi
          echo "All nightly test jobs passed"
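The final `check-all-jobs` gate above can be read as a small decision function over the results of the jobs listed under `needs`. The following is a minimal Python sketch of that logic, not part of the vendored workflow; the function name `nightly_gate` and the dictionary input are hypothetical, and the result strings are the ones the workflow itself tests for plus GitHub's `success` and `skipped`.

```python
from typing import Mapping, Tuple


def nightly_gate(results: Mapping[str, str]) -> Tuple[int, str]:
    """Return (exit_code, message) for a mapping of job name -> job result.

    Mirrors the two `contains(needs.*.result, ...)` checks in the step:
    any 'failure' fails the gate first, then any 'cancelled'.
    Other results (for example 'success' or 'skipped') do not fail it.
    """
    outcomes = set(results.values())
    if "failure" in outcomes:
        return 1, "One or more nightly test jobs failed"
    if "cancelled" in outcomes:
        return 1, "One or more nightly test jobs were cancelled"
    return 0, "All nightly test jobs passed"


if __name__ == "__main__":
    code, message = nightly_gate(
        {"nightly-test-perf-4-gpu-b200": "success", "consolidate-metrics": "skipped"}
    )
    print(code, message)
```

Note that under this reading a skipped job, such as one excluded by `job_filter`, does not fail the gate.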
28
third_party/sglang/.github/workflows/open-pr-copy-from-oss.yml
vendored
Normal file
@@ -0,0 +1,28 @@
name: Open A PR to Copy Code From OSS

on:
  workflow_dispatch:
  # schedule:
  # - cron: '0 10 * * *'

permissions:
  contents: write

jobs:
  copy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: 'main'

      - name: Install GitHub CLI (if not present)
        run: |
          bash scripts/code_sync/install_github_cli.sh

      - name: Copy from OSS code
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_OPEN_PR_TO_PRIVATE }}
        run: |
          python3 scripts/code_sync/copy_from_oss.py
31
third_party/sglang/.github/workflows/open-pr-copy-to-oss.yml
vendored
Normal file
@@ -0,0 +1,31 @@
name: Open A PR to Copy Diff To OSS

on:
  workflow_dispatch:
    inputs:
      commit_sha:
        description: 'The commit SHA to copy. Defaults to LAST to copy the latest commit.'
        required: false
        default: 'LAST'

permissions:
  contents: write

jobs:
  copy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install GitHub CLI (if not present)
        run: |
          bash scripts/code_sync/install_github_cli.sh

      - name: Copy to OSS code
        env:
          GH_TOKEN: ${{ secrets.GH_PAT_FOR_OPEN_PR_TO_OSS }}
        run: |
          python3 scripts/code_sync/copy_to_oss.py --commit ${{ github.event.inputs.commit_sha }}
115
third_party/sglang/.github/workflows/patch-docker-dev.yml
vendored
Normal file
@@ -0,0 +1,115 @@
name: Patch Docker Image

on:
  workflow_dispatch:
    inputs:
      pr_numbers:
        description: "Comma-separated PR numbers to apply (e.g. 18962,19010)"
        required: false
        default: ""
      image_tag:
        description: "Base image tag to patch (e.g. dev-x86, dev-x86-cu13)"
        required: true

concurrency:
  group: patch-docker-${{ inputs.image_tag }}
  cancel-in-progress: true

jobs:
  patch:
    if: github.repository == 'sgl-project/sglang'
    runs-on: x64-docker-build-node
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Pull base image and extract commit
        run: |
          IMAGE="lmsysorg/sglang:${{ inputs.image_tag }}"
          docker pull "${IMAGE}"
          if BASE_SHA=$(docker run --rm "${IMAGE}" git -C /sgl-workspace/sglang rev-parse HEAD 2>/dev/null); then
            echo "Image built from commit: ${BASE_SHA}"
          else
            BASE_SHA=""
            echo "::warning::Image has no .git directory — cannot extract base commit"
          fi
          echo "BASE_SHA=${BASE_SHA}" >> "$GITHUB_ENV"

      - name: Generate patches
        run: |
          git config --global --add safe.directory "$GITHUB_WORKSPACE"
          git fetch origin main
          mkdir -p /tmp/patch-ctx

          if [ -n "${{ inputs.pr_numbers }}" ]; then
            IFS=',' read -ra PRS <<< "${{ inputs.pr_numbers }}"
            for pr in "${PRS[@]}"; do
              pr=$(echo "${pr}" | xargs)
              echo "Fetching PR #${pr}"
              git fetch origin "pull/${pr}/head:pr-${pr}"
              MERGE_BASE=$(git merge-base origin/main "pr-${pr}")
              echo " PR #${pr}: merge-base=${MERGE_BASE}"
              git diff "${MERGE_BASE}..pr-${pr}" > "/tmp/patch-ctx/${pr}.patch"
              echo " PR #${pr}: $(wc -l < /tmp/patch-ctx/${pr}.patch) lines"
            done
          elif [ -n "${BASE_SHA}" ]; then
            echo "Generating diff: image ${BASE_SHA} → latest main"
            git fetch origin "${BASE_SHA}"
            git diff "${BASE_SHA}..origin/main" > /tmp/patch-ctx/main.patch
            echo " main: $(wc -l < /tmp/patch-ctx/main.patch) lines"
          else
            echo "::error::No PR numbers specified and image has no .git — cannot generate diff against main"
            exit 1
          fi

          TOTAL=$(cat /tmp/patch-ctx/*.patch | wc -l)
          if [ "${TOTAL}" -eq 0 ]; then
            echo "::warning::All patches are empty — image is already up to date"
            echo "SKIP_BUILD=true" >> "$GITHUB_ENV"
          fi

      - name: Build patched image
        if: env.SKIP_BUILD != 'true'
        run: |
          IMAGE="lmsysorg/sglang:${{ inputs.image_tag }}"

          cat <<'DOCKERFILE' > /tmp/patch-ctx/Dockerfile
          ARG BASE_IMAGE
          FROM ${BASE_IMAGE}
          COPY *.patch /tmp/patches/
          RUN cd /sgl-workspace/sglang \
              && for p in /tmp/patches/*.patch; do \
                  if [ ! -s "${p}" ]; then \
                      echo "Skipping ${p} (empty)"; \
                  else \
                      echo "Applying ${p}..." \
                      && patch -p1 --fuzz=2 --no-backup-if-mismatch -f < "${p}" \
                      || { echo "ERROR: Failed to apply ${p}"; exit 1; }; \
                  fi; \
              done \
              && rm -rf /tmp/patches
          DOCKERFILE

          docker build \
            --no-cache \
            --build-arg BASE_IMAGE="${IMAGE}" \
            -t "${IMAGE}" \
            /tmp/patch-ctx/

      - name: Push patched image
        if: env.SKIP_BUILD != 'true'
        run: |
          IMAGE="lmsysorg/sglang:${{ inputs.image_tag }}"
          docker push "${IMAGE}"

          echo "### Patched \`${IMAGE}\`" >> "$GITHUB_STEP_SUMMARY"
          echo "- **Base commit:** \`${BASE_SHA:-unknown (no .git)}\`" >> "$GITHUB_STEP_SUMMARY"
          echo "- **Source:** ${{ inputs.pr_numbers && format('PRs: {0}', inputs.pr_numbers) || 'latest main' }}" >> "$GITHUB_STEP_SUMMARY"
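The "Generate patches" step above chooses between three cases: one patch per listed PR, a single diff from the image's commit to latest `main`, or an error. A rough Python sketch of that selection follows, for illustration only; `plan_patch_files` is a hypothetical helper and is not part of the workflow. One labeled difference from the shell: this sketch drops empty entries (such as a trailing comma), which the loop in the step does not do.

```python
from typing import List


def plan_patch_files(pr_numbers: str, base_sha: str) -> List[str]:
    """Return the patch file names the step would write under /tmp/patch-ctx."""
    # Split on commas and trim whitespace, like `IFS=','` plus `xargs` in the step.
    # Assumption: empty entries are dropped here; the shell loop keeps them.
    prs = [p.strip() for p in pr_numbers.split(",") if p.strip()]
    if prs:
        # One patch per PR, each diffed from its merge-base with origin/main.
        return [f"{pr}.patch" for pr in prs]
    if base_sha:
        # No PRs given: diff the image's commit against latest main.
        return ["main.patch"]
    # Neither PRs nor a base commit: the step exits with an error.
    raise ValueError("No PR numbers specified and image has no .git")


if __name__ == "__main__":
    print(plan_patch_files("18962, 19010", ""))
    print(plan_patch_files("", "abc123"))
```

The PR numbers in the usage lines are the example values from the workflow's own input description; `abc123` is a placeholder commit.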
198
third_party/sglang/.github/workflows/pr-benchmark-rust.yml
vendored
Normal file
@@ -0,0 +1,198 @@
name: PR Benchmark (SMG Components)

on:
  push:
    branches: [ main ]
    paths:
      - "sgl-model-gateway/**"
  pull_request:
    branches: [ main ]
    paths:
      - "sgl-model-gateway/**"
  workflow_dispatch:

concurrency:
  group: pr-benchmark-rust-${{ github.ref }}
  cancel-in-progress: true

env:
  RUSTC_WRAPPER: sccache
  SCCACHE_GHA_ENABLED: "true"

permissions:
  contents: read
  pull-requests: write
  issues: write

jobs:
  benchmark-compile-check:
    name: Benchmark Compilation Check
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_gateway_dependencies.sh

      - name: Configure sccache
        uses: mozilla-actions/sccache-action@v0.0.9
        with:
          version: "v0.12.0"
          disable_annotations: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: sgl-model-gateway
          shared-key: "rust-cache"
          save-if: true
          cache-all-crates: true
          cache-on-failure: true

      - name: Check benchmarks compile
        run: |
          source "$HOME/.cargo/env"
          cd sgl-model-gateway/
          cargo check --benches

      - name: Show sccache stats
        if: always()
        run: sccache --show-stats

  benchmark:
    name: Benchmark - ${{ matrix.name }}
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' ||
       github.event_name == 'workflow_dispatch' ||
       (contains(github.event.pull_request.labels.*.name, 'router-benchmark') &&
        contains(github.event.pull_request.labels.*.name, 'run-ci')))
    strategy:
      fail-fast: false
      matrix:
        include:
          - name: Request Processing
            bench_name: request_processing
            bench_args: "benchmark_summary --exact"
            runner: ubuntu-latest
            sccache_version: "v0.12.0"
            artifact_name: request-processing-results
            artifact_path: criterion/benchmark_summary/
          - name: Manual Policy
            bench_name: manual_policy_benchmark
            bench_args: ""
            runner: ubuntu-latest
            sccache_version: "v0.12.0"
            artifact_name: manual-policy-results
            artifact_path: criterion/manual_policy*/
    runs-on: ${{ matrix.runner }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 100

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_gateway_dependencies.sh

      - name: Configure sccache
        uses: mozilla-actions/sccache-action@v0.0.9
        with:
          version: ${{ matrix.sccache_version }}
          disable_annotations: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: sgl-model-gateway
          shared-key: "rust-cache"
          cache-all-crates: true
          cache-on-failure: true
          save-if: true

      - name: Run benchmark
        timeout-minutes: 30
        run: |
          source "$HOME/.cargo/env"
          cd sgl-model-gateway/
          if command -v sccache &> /dev/null; then
            echo "Testing sccache availability..."
            export RUSTC_WRAPPER=sccache
            export SCCACHE_GHA_ENABLED="true"
            if sccache --start-server 2>/dev/null && sccache --show-stats 2>/dev/null; then
              echo "sccache is working, using it for compilation"
            else
              echo "sccache failed to start, falling back to regular cargo"
              unset RUSTC_WRAPPER
              unset SCCACHE_GHA_ENABLED
            fi
          else
            echo "sccache not available, using regular cargo"
          fi
          cargo bench --bench ${{ matrix.bench_name }} -- ${{ matrix.bench_args }} 2>&1 | tee benchmark_output.txt

      - name: Upload benchmark results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.artifact_name }}-${{ github.sha }}
          path: |
            sgl-model-gateway/target/${{ matrix.artifact_path }}
            sgl-model-gateway/benchmark_output.txt
          retention-days: 30

      - name: Show sccache stats
        if: always()
        run: sccache --show-stats

  benchmark-summary:
    name: Benchmark Summary
    needs: [benchmark]
    if: always() && (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request')
    runs-on: ubuntu-latest
    steps:
      - name: Download all benchmark results
        uses: actions/download-artifact@v4
        with:
          pattern: '*-results-${{ github.sha }}'
          path: benchmark-results

      - name: Generate summary
        run: |
          generate_section() {
            local title="$1" dir_name="$2" lines="${3:-100}"
            local dir="benchmark-results/${dir_name}-${{ github.sha }}"
            echo "### $title" >> summary.md
            if [ -d "$dir" ]; then
              echo "✅ **Completed**" >> summary.md
              if [ -f "$dir/benchmark_output.txt" ]; then
                echo -e "\n<details>\n<summary>View Results</summary>\n\n\`\`\`" >> summary.md
                tail -"$lines" "$dir/benchmark_output.txt" >> summary.md
                echo -e "\`\`\`\n</details>" >> summary.md
              fi
            else
              echo "❌ Failed or skipped" >> summary.md
            fi
            echo "" >> summary.md
          }

          echo "## 🚀 Benchmark Results Summary" > summary.md
          echo "" >> summary.md

          generate_section "Request Processing" "request-processing-results" 60
          generate_section "Manual Policy (Sticky Sessions)" "manual-policy-results" 100

          echo -e "---\n_Generated at $(date -u '+%Y-%m-%d %H:%M:%S UTC')_" >> summary.md

          cat summary.md
          cat summary.md >> $GITHUB_STEP_SUMMARY

      - name: Upload summary
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-summary-${{ github.sha }}
          path: summary.md
          retention-days: 30
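The `generate_section` shell function in the summary job above renders one Markdown section per benchmark: a heading, a status line, and a collapsible block holding the last N lines of the benchmark output. The following Python sketch shows the same layout under stated assumptions; `render_section` and its parameters are hypothetical, and the artifact directory check and file read are replaced by plain arguments so the example is self-contained.

```python
from typing import Optional

FENCE = "`" * 3  # Markdown code fence, built here to keep this example's own fence intact


def render_section(title: str, artifact_found: bool,
                   output: Optional[str] = None, lines: int = 100) -> str:
    """Render one summary section; `lines` is assumed to be a positive count."""
    parts = [f"### {title}"]
    if artifact_found:
        parts.append("✅ **Completed**")
        if output is not None:
            # Keep only the last `lines` lines, like `tail -"$lines"` in the step.
            tail = output.splitlines()[-lines:]
            parts += ["", "<details>", "<summary>View Results</summary>", "",
                      FENCE, *tail, FENCE, "</details>"]
    else:
        parts.append("❌ Failed or skipped")
    parts.append("")  # blank separator line after every section
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_section("Request Processing", True, "warmup\nrun 1\nrun 2", lines=2))
    print(render_section("Manual Policy (Sticky Sessions)", False))
```

Passing the output text as an argument, rather than reading `benchmark_output.txt`, is a simplification made for this sketch.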
254
third_party/sglang/.github/workflows/pr-gate.yml
vendored
Normal file
@@ -0,0 +1,254 @@
on:
  workflow_call:
    inputs:
      require-run-ci:
        description: "Whether the PR must have the run-ci label"
        type: boolean
        default: true
      cool-down-minutes:
        description: "Cooldown period in minutes for low-permission users; 0 disables rate limiting"
        type: number
        default: 120

jobs:
  pr-gate:
    # 1. for commits on main: no gating needed
    # 2. for workflow_dispatch: this can only be triggered by users with write access
    runs-on: ubuntu-latest
    steps:
      - name: Fetch latest PR info
        if: github.event_name == 'pull_request'
        id: pr
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const pr = await github.rest.pulls.get({
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.issue.number
            });
            core.setOutput("labels", JSON.stringify(pr.data.labels.map(l => l.name)));
            core.setOutput("draft", pr.data.draft);
            core.setOutput("user", pr.data.user.login);

      - name: Log PR info
        if: github.event_name == 'pull_request'
        run: |
          echo "===== PR Info ====="
          echo "PR Event: ${{ github.event_name }}"
          echo "PR Labels: ${{ steps.pr.outputs.labels }}"
          echo "PR Draft: ${{ steps.pr.outputs.draft }}"
          echo "PR User: ${{ steps.pr.outputs.user }}"
          echo "Require run-ci: ${{ inputs.require-run-ci }}"
          echo "Cool down minutes: ${{ inputs.cool-down-minutes }}"
          echo "==================="

      - name: Block draft PR
        if: github.event_name == 'pull_request' && fromJson(steps.pr.outputs.draft)
        run: |
          echo "PR is draft. Blocking CI."
          exit 1

      - name: Require run-ci label (optional)
        if: github.event_name == 'pull_request' && inputs.require-run-ci == true
        run: |
          labels='${{ steps.pr.outputs.labels }}'
          if [[ "${{ contains(fromJson(steps.pr.outputs.labels), 'run-ci') }}" == "false" ]]; then
            echo "Missing required label 'run-ci'. See https://docs.sglang.io/developer_guide/contribution_guide.html#how-to-trigger-ci-tests for more details."
            exit 1
          fi

      - name: Enforce rate limit for low-permission actors (optional)
        if: github.event_name == 'pull_request' && inputs.cool-down-minutes > 0
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const DEFAULT_MINUTES = Number("${{ inputs.cool-down-minutes }}");
            const owner = context.repo.owner;
            const repo = context.repo.repo;
            const eventName = context.eventName;
            const curRun = await github.rest.actions.getWorkflowRun({
              owner, repo, run_id: context.runId
            });
            let triggeringActor = curRun.data.triggering_actor?.login || context.actor;
            if (triggeringActor === "github-actions[bot]") {
              triggeringActor = `${{ steps.pr.outputs.user }}`;
              core.info(
                `triggering_actor is github-actions[bot]; substituting PR author '${triggeringActor}'.`
              );
            }

            async function hasHighPermission(username) {
              try {
                const { data } = await github.rest.repos.getCollaboratorPermissionLevel({ owner, repo, username });
                const perm = data.permission || 'none';
                return perm === 'write' || perm === 'maintain' || perm === 'admin';
              } catch (e) {
                if (e.status === 404 || e.status === 403) return false;
                throw e;
              }
            }

            if (await hasHighPermission(triggeringActor)) {
              core.info(`Triggering user '${triggeringActor}' has high permission. No rate limit applied.`);
              return;
            }

            let effectiveCooldownMinutes = DEFAULT_MINUTES;
            let perUserCooldownMinutes = null;

            try {
              const contentResp = await github.rest.repos.getContent({
                owner,
                repo,
                path: ".github/CI_PERMISSIONS.json",
                ref: "main",
              });

              if (!Array.isArray(contentResp.data) && contentResp.data && "content" in contentResp.data) {
                const raw = Buffer.from(
                  contentResp.data.content,
                  contentResp.data.encoding || "base64"
                ).toString();
                const ciPermissions = JSON.parse(raw);

                const userPerm = ciPermissions[triggeringActor];
                if (userPerm && typeof userPerm.cooldown_interval_minutes === "number") {
                  perUserCooldownMinutes = userPerm.cooldown_interval_minutes;
                  core.info(
                    `Per-user cooldown for '${triggeringActor}' from CI_PERMISSIONS.json: ${perUserCooldownMinutes} minutes.`
                  );
                } else {
                  core.info(`No per-user cooldown found for '${triggeringActor}' in CI_PERMISSIONS.json.`);
                }
              } else {
                core.info("CI_PERMISSIONS.json content response is not a file; skipping per-user cooldown.");
              }
            } catch (e) {
              core.info(`CI_PERMISSIONS.json not found or unreadable: ${e.message}. Using default rate limit only.`);
            }

            if (perUserCooldownMinutes !== null) {
              effectiveCooldownMinutes = Math.min(effectiveCooldownMinutes, perUserCooldownMinutes);
            }

            if (effectiveCooldownMinutes <= 0) {
              core.info(
                `Effective cooldown for '${triggeringActor}' is 0 minutes; no rate limit enforced for this user.`
              );
              return;
            }

            const cutoff = new Date(Date.now() - effectiveCooldownMinutes * 60 * 1000);
            core.info(
              `Checking for workflow runs since ${cutoff.toISOString()} (last ${effectiveCooldownMinutes} minutes) for event '${eventName}'.`
            );

            const { data } = await github.rest.actions.listWorkflowRuns({
              owner,
              repo,
              workflow_id: 'pr-test.yml',
              event: eventName,
              per_page: 100,
            });

            const runs = data.workflow_runs || [];

            // Rate Limiting Logic:
            // We only count workflow runs that actually consumed CI resources (i.e., passed the gate).
            // A run "passes the gate" if any jobs beyond the gate jobs (check-changes, pr-gate, call-gate)
            // actually executed (not skipped/cancelled). This prevents scenarios where:
            // - User has PR A with missing 'run-ci' label (fails at gate)
            // - User opens PR B with 'run-ci' label
            // - PR B should be able to run even though PR A triggered a run recently

            // Helper function to check if a run passed the gate (i.e., actually consumed CI resources)
            async function didRunPassGate(run) {
              try {
                // Note: Fetching up to 100 jobs (API maximum). If a workflow has >100 jobs,
                // we may miss some, but this is unlikely in practice.
                const { data: jobsData } = await github.rest.actions.listJobsForWorkflowRun({
                  owner, repo, run_id: run.id, per_page: 100
                });
                const jobs = jobsData.jobs || [];

                // If no jobs exist yet, the run hasn't started consuming resources
                if (jobs.length === 0) {
                  core.info(`Run ${run.id} has no jobs yet; not counting against rate limit.`);
                  return false;
                }

                // Gate jobs that don't consume significant CI resources
                const gateJobs = ['check-changes', 'pr-gate', 'call-gate', 'pr-test-finish'];
                const jobsBeyondGate = jobs.filter(j => !gateJobs.some(g => j.name === g || j.name.startsWith(g + ' ')));

                // A job "ran" if it reached a terminal conclusion state that indicates actual execution
                const ranStates = ['success', 'failure', 'timed_out', 'action_required'];
                const hasJobsThatRan = jobsBeyondGate.some(j => j.conclusion && ranStates.includes(j.conclusion));
                return hasJobsThatRan;
              } catch (e) {
                core.warning(`Could not check jobs for run ${run.id}: ${e.message}`);

                // If it's a rate limit error, count it conservatively to prevent abuse
                if (e.status === 429) {
                  core.warning(`Hit rate limit checking run ${run.id}; counting it to be safe.`);
                  return true;
                }

                // For cancelled/skipped runs, they likely didn't consume resources
                if (run.conclusion === 'cancelled' || run.conclusion === 'skipped') {
                  return false;
                }

                // Default to counting it to prevent abuse
                return true;
              }
            }

            // Limit the number of runs we'll check in detail to avoid API rate limits
            const MAX_RUNS_TO_CHECK = 5;
            let runsChecked = 0;
            let runsSkippedAtGate = 0;
            let recentFound = null;

            for (const run of runs) {
              if (String(run.id) === String(context.runId)) continue;
              if (new Date(run.created_at) < cutoff) continue;
              const isUserRun = (run.actor?.login === triggeringActor) || (run.triggering_actor?.login === triggeringActor);
              if (!isUserRun) continue;

              runsChecked++;
              core.info(`Checking run ${run.id} (created: ${run.created_at}, conclusion: ${run.conclusion})`);

              // Safety limit: if we've checked too many runs, assume the next one passed to be conservative
              if (runsChecked > MAX_RUNS_TO_CHECK) {
                core.warning(`Checked ${MAX_RUNS_TO_CHECK} runs; assuming this one passed gate to avoid API limits.`);
                recentFound = run;
                break;
              }

              // Only count runs that actually passed the gate and consumed CI resources
              if (await didRunPassGate(run)) {
                recentFound = run;
                core.info(`Found recent run ${run.id} that passed gate.`);
                break;
              } else {
                runsSkippedAtGate++;
                core.info(`Run ${run.id} failed at gate; not counting against rate limit.`);
              }
            }

            core.info(`Rate limit check summary: checked ${runsChecked} runs, ${runsSkippedAtGate} failed at gate.`);

            if (recentFound) {
              core.setFailed(
                `User '${triggeringActor}' already triggered '${context.workflow}' via '${eventName}' at ${recentFound.created_at}. ` +
                `Please wait ${effectiveCooldownMinutes} minutes before triggering again.`
              );
            } else {
              core.info(
                `No recent runs detected for '${triggeringActor}' within the last ${effectiveCooldownMinutes} minutes; proceeding.`
              );
            }
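The rate-limit decision in `pr-gate.yml` above comes down to two pure checks: which cooldown applies, and whether an earlier run did real work beyond the gate jobs. The sketch below restates both in plain JavaScript, under stated assumptions. The helper names `effectiveCooldown`, `isWithinCooldown`, and `passedGate` are hypothetical (the workflow inlines this logic), the GitHub API calls are omitted, and the timestamps are made up for illustration.

```javascript
// Hypothetical helpers restating the checks in pr-gate.yml; not the workflow's code.

// Gate job names and "ran" conclusions, copied from the workflow script.
const GATE_JOBS = ["check-changes", "pr-gate", "call-gate", "pr-test-finish"];
const RAN_STATES = ["success", "failure", "timed_out", "action_required"];

// The effective cooldown is the smaller of the workflow default and the
// per-user value, when a numeric per-user value is configured.
function effectiveCooldown(defaultMinutes, perUserMinutes) {
  if (typeof perUserMinutes === "number") {
    return Math.min(defaultMinutes, perUserMinutes);
  }
  return defaultMinutes;
}

// A previous run is only considered if it was created at or after the cutoff
// (now minus the cooldown). A cooldown of 0 or less disables the check.
function isWithinCooldown(runCreatedAt, nowMs, cooldownMinutes) {
  if (cooldownMinutes <= 0) return false;
  const cutoff = new Date(nowMs - cooldownMinutes * 60 * 1000);
  return new Date(runCreatedAt) >= cutoff;
}

// A run "passed the gate" if any job other than a gate job reached a
// conclusion that indicates it actually executed.
function passedGate(jobs) {
  const beyondGate = jobs.filter(
    (j) => !GATE_JOBS.some((g) => j.name === g || j.name.startsWith(g + " "))
  );
  return beyondGate.some((j) => Boolean(j.conclusion) && RAN_STATES.includes(j.conclusion));
}

// Illustrative data only.
const now = Date.parse("2024-01-01T12:00:00Z");
console.log(effectiveCooldown(120, 30));
console.log(isWithinCooldown("2024-01-01T11:45:00Z", now, 30));
console.log(passedGate([{ name: "pr-gate / pr-gate", conclusion: "failure" }]));
```

Keeping these checks free of API calls makes the policy easy to test on its own. In the workflow, the same conditions are interleaved with `listWorkflowRuns` and `listJobsForWorkflowRun` requests.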
1085
third_party/sglang/.github/workflows/pr-test-amd-rocm720.yml
vendored
Normal file
File diff suppressed because it is too large
1090
third_party/sglang/.github/workflows/pr-test-amd.yml
vendored
Normal file
File diff suppressed because it is too large
117
third_party/sglang/.github/workflows/pr-test-jit-kernel.yml
vendored
Normal file
@@ -0,0 +1,117 @@
name: PR Test - JIT Kernel

on:
  workflow_call:
    inputs:
      jit_kernel:
        required: true
        type: string
      pr_head_sha:
        required: false
        type: string
        default: ''
      git_ref:
        required: false
        type: string
        default: ''
      target_stage:
        required: false
        type: string
        default: ''
      test_parallel_dispatch:
        required: false
        type: string
        default: 'false'
      skip_stage_health_check:
        required: false
        type: boolean
        default: false

# Workflow-level env is NOT inherited from the caller in reusable workflows (verified by CI test).
# The github context (including github.event_name) IS inherited from the caller.
env:
  SGLANG_IS_IN_CI: true
  SGLANG_CUDA_COREDUMP: "1"
  SGLANG_JIT_DEEPGEMM_FAST_WARMUP: true
  SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN: ${{ github.ref == 'refs/heads/main' && 'true' || 'false' }}
  SKIP_STAGE_HEALTH_CHECK: ${{ inputs.skip_stage_health_check == true && 'true' || 'false' }}

jobs:
  jit-kernel-unit-test:
    if: |
      github.event_name != 'schedule' &&
      inputs.test_parallel_dispatch != 'true' &&
      !inputs.target_stage
    runs-on: 1-gpu-h100
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run test
        timeout-minutes: 30
        run: |
          cd test/
          python3 run_suite.py --hw cuda --suite stage-b-kernel-unit-1-gpu-large

  jit-kernel-multigpu-unit-test:
    if: |
      github.event_name != 'schedule' &&
      inputs.test_parallel_dispatch != 'true' &&
      !inputs.target_stage
    runs-on: 8-gpu-h200
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run multi-GPU test
        timeout-minutes: 45
        run: |
          cd test/
          python3 run_suite.py --hw cuda --suite stage-b-kernel-unit-8-gpu-h200

  jit-kernel-benchmark-test:
    if: |
      github.event_name != 'schedule' &&
      inputs.test_parallel_dispatch != 'true' &&
      !inputs.target_stage
    runs-on: 1-gpu-h100
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run benchmark tests
        timeout-minutes: 45
        run: |
          cd test/
          python3 run_suite.py --hw cuda --suite stage-b-kernel-benchmark-1-gpu-large
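The JIT kernel workflow above uses two expression idioms: `cond && 'true' || 'false'` as a stand-in for a ternary, and `inputs.pr_head_sha || inputs.git_ref || github.sha` as a fallback chain for the checkout ref. The sketch below mimics both with JavaScript's `&&` and `||`. This is an analogy only: the GitHub Actions expression evaluator is a separate implementation, and the helper names `exprTernary` and `checkoutRef` are hypothetical.

```javascript
// Analogy only: JavaScript operators standing in for GitHub Actions expressions.
// Helper names are hypothetical and do not exist in the workflow.

// Mirrors `${{ cond && 'true' || 'false' }}`.
function exprTernary(cond, whenTrue, whenFalse) {
  return (cond && whenTrue) || whenFalse;
}

// Mirrors `inputs.pr_head_sha || inputs.git_ref || github.sha`:
// the first non-empty value wins.
function checkoutRef(inputs, github) {
  return inputs.pr_head_sha || inputs.git_ref || github.sha;
}

console.log(exprTernary(true, "true", "false"));
console.log(exprTernary(false, "true", "false"));
// Pitfall of the idiom: an empty "true" branch falls through to the fallback.
console.log(exprTernary(true, "", "fallback"));
console.log(checkoutRef({ pr_head_sha: "", git_ref: "" }, { sha: "abc123" }));
```

The idiom is safe in these workflows because the "true" branches are non-empty strings. It would misbehave if the "true" branch could ever be empty.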
245
third_party/sglang/.github/workflows/pr-test-multimodal-gen.yml
vendored
Normal file
@@ -0,0 +1,245 @@
name: PR Test - Multimodal Gen

on:
  workflow_call:
    inputs:
      multimodal_gen:
        required: true
        type: string
      sgl_kernel:
        required: true
        type: string
      b200_runner:
        required: true
        type: string
      continue_on_error:
        required: false
        type: string
        default: 'false'
      pr_head_sha:
        required: false
        type: string
        default: ''
      git_ref:
        required: false
        type: string
        default: ''
      target_stage:
        required: false
        type: string
        default: ''
      test_parallel_dispatch:
        required: false
        type: string
        default: 'false'
      caller_needs_failure:
        required: false
        type: string
        default: 'false'
      skip_stage_health_check:
        required: false
        type: string
        default: 'false'

# Workflow-level env is NOT inherited from the caller in reusable workflows.
# The github context (including github.event_name) IS inherited from the caller.
env:
  SGLANG_IS_IN_CI: true
  SGLANG_CUDA_COREDUMP: "1"
  SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN: ${{ github.ref == 'refs/heads/main' && 'true' || 'false' }}
  SKIP_STAGE_HEALTH_CHECK: ${{ inputs.skip_stage_health_check == 'true' }}

jobs:
  multimodal-gen-test-1-gpu:
    if: |
      (inputs.target_stage == 'multimodal-gen-test-1-gpu') ||
      (
        !inputs.target_stage &&
        ((github.event_name == 'schedule' || inputs.test_parallel_dispatch == 'true') || (inputs.caller_needs_failure != 'true' && !cancelled())) &&
        inputs.multimodal_gen == 'true'
      )
    runs-on: 1-gpu-h100
    timeout-minutes: 240
    strategy:
      fail-fast: false
      matrix:
        part: [0, 1]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Download artifacts
        if: inputs.sgl_kernel == 'true'
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh diffusion
      - name: Run diffusion server tests
        timeout-minutes: 240
        env:
          RUNAI_STREAMER_MEMORY_LIMIT: 0
          CONTINUE_ON_ERROR_FLAG: ${{ inputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py \
            --suite 1-gpu \
            --partition-id ${{ matrix.part }} \
            --total-partitions 2 \
            $CONTINUE_ON_ERROR_FLAG

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()
        with:
          artifact-suffix: ${{ matrix.part }}

  multimodal-gen-test-2-gpu:
    if: |
      (inputs.target_stage == 'multimodal-gen-test-2-gpu') ||
      (
        !inputs.target_stage &&
        ((github.event_name == 'schedule' || inputs.test_parallel_dispatch == 'true') || (inputs.caller_needs_failure != 'true' && !cancelled())) &&
        inputs.multimodal_gen == 'true'
      )
    runs-on: 2-gpu-h100
    timeout-minutes: 240
    strategy:
      fail-fast: false
      matrix:
        part: [0, 1]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Download artifacts
        if: inputs.sgl_kernel == 'true'
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run diffusion server tests
        timeout-minutes: 240
        env:
          RUNAI_STREAMER_MEMORY_LIMIT: 0
          CONTINUE_ON_ERROR_FLAG: ${{ inputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py \
            --suite 2-gpu \
            --partition-id ${{ matrix.part }} \
            --total-partitions 2 \
            $CONTINUE_ON_ERROR_FLAG

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()
        with:
          artifact-suffix: ${{ matrix.part }}

  multimodal-gen-test-1-b200:
    if: |
      (inputs.target_stage == 'multimodal-gen-test-1-b200') ||
      (
        !inputs.target_stage &&
        ((github.event_name == 'schedule' || inputs.test_parallel_dispatch == 'true') || (inputs.caller_needs_failure != 'true' && !cancelled())) &&
        inputs.multimodal_gen == 'true'
      )
    runs-on: ${{ inputs.b200_runner }}
    timeout-minutes: 240
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}


      - uses: ./.github/actions/check-maintenance

      - name: Download artifacts
        if: inputs.sgl_kernel == 'true'
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run diffusion server tests
        timeout-minutes: 240
        env:
          RUNAI_STREAMER_MEMORY_LIMIT: 0
          CONTINUE_ON_ERROR_FLAG: ${{ inputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py \
            --suite 1-gpu-b200 \
            $CONTINUE_ON_ERROR_FLAG

      - uses: ./.github/actions/upload-cuda-coredumps
        if: always()

  multimodal-gen-unit-test:
    if: |
      (inputs.target_stage == 'multimodal-gen-unit-test') ||
      (
        !inputs.target_stage &&
        ((github.event_name == 'schedule' || inputs.test_parallel_dispatch == 'true') || (inputs.caller_needs_failure != 'true' && !cancelled())) &&
        inputs.multimodal_gen == 'true'
      )
    runs-on: 1-gpu-h100
    timeout-minutes: 120
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Download artifacts
        if: inputs.sgl_kernel == 'true'
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run diffusion unit tests
        timeout-minutes: 60
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py --suite unit
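Every job in the multimodal-gen workflow above is guarded by the same `if:` expression, with only the stage name changing. The sketch below restates that expression as a plain JavaScript predicate so the cases are easier to follow. `shouldRunStage` and the shape of the `ctx` object are hypothetical names introduced here; the `cancelled` field stands in for the `cancelled()` status function, and the inputs are strings as declared in the workflow.

```javascript
// Hypothetical restatement of the job-level `if:` condition; not workflow code.
function shouldRunStage(stageName, ctx) {
  // An explicit target_stage selects exactly one job and bypasses the other checks.
  if (ctx.targetStage === stageName) return true;

  // Scheduled runs and parallel dispatch always proceed; otherwise the job runs
  // only if the caller reported no upstream failure and the run was not cancelled.
  const allowedToProceed =
    ctx.eventName === "schedule" ||
    ctx.testParallelDispatch === "true" ||
    (ctx.callerNeedsFailure !== "true" && !ctx.cancelled);

  return !ctx.targetStage && allowedToProceed && ctx.multimodalGen === "true";
}

// Illustrative context for an ordinary pull request run.
const base = {
  targetStage: "",
  eventName: "pull_request",
  testParallelDispatch: "false",
  callerNeedsFailure: "false",
  cancelled: false,
  multimodalGen: "true",
};

console.log(shouldRunStage("multimodal-gen-unit-test", base));
console.log(shouldRunStage("multimodal-gen-unit-test", { ...base, callerNeedsFailure: "true" }));
console.log(shouldRunStage("multimodal-gen-unit-test", { ...base, targetStage: "multimodal-gen-test-2-gpu" }));
```

Note that a matching `target_stage` runs the job even when `multimodal_gen` is not `'true'`. This follows from the top-level `||` in the original expression.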
453
third_party/sglang/.github/workflows/pr-test-npu.yml
vendored
Normal file
@@ -0,0 +1,453 @@
|
|||||||
|
name: PR Test (NPU)
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches: [ main ]
|
||||||
|
pull_request:
|
||||||
|
branches: [ main ]
|
||||||
|
workflow_dispatch:
|
||||||
|
workflow_call:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref (branch, tag, or SHA) to test. If not provided, uses the default branch.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
run_all_tests:
|
||||||
|
description: "Run all tests (for releasing or testing purpose)"
|
||||||
|
required: false
|
||||||
|
type: boolean
|
||||||
|
default: false
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: pr-test-npu-${{ inputs.ref || github.ref }}
|
||||||
|
cancel-in-progress: ${{ github.event_name != 'workflow_call' }}
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
# ==================== Check Changes ==================== #
|
||||||
|
check-changes:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
outputs:
|
||||||
|
changes_exist: ${{ steps.filter.outputs.main_package == 'true' || steps.filter.outputs.multimodal_gen == 'true' || steps.run-mode.outputs.run_all_tests == 'true'}}
|
||||||
|
main_package: ${{ steps.filter.outputs.main_package == 'true' || steps.run-mode.outputs.run_all_tests == 'true' }}
|
||||||
|
multimodal_gen: ${{ steps.filter.outputs.multimodal_gen == 'true' || steps.run-mode.outputs.run_all_tests == 'true' }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Determine run mode
|
||||||
|
id: run-mode
|
||||||
|
run: |
|
||||||
|
# Run all tests for workflow_call (when ref input is provided)
|
||||||
|
# Note: github.event_name is inherited from caller, so we detect workflow_call by checking inputs.ref
|
||||||
|
if [[ "${{ inputs.run_all_tests }}" == "true" ]]; then
|
||||||
|
echo "run_all_tests=true" >> $GITHUB_OUTPUT
|
||||||
|
echo "Run mode: ALL TESTS (run_all_tests=${{ inputs.run_all_tests }})"
|
||||||
|
else
|
||||||
|
echo "run_all_tests=false" >> $GITHUB_OUTPUT
|
||||||
|
echo "Run mode: FILTERED (triggered by ${{ github.event_name }})"
|
||||||
|
fi
|
||||||
|
|
||||||
|
- name: Detect file changes
|
||||||
|
id: filter
|
||||||
|
uses: dorny/paths-filter@v3
|
||||||
|
if: steps.run-mode.outputs.run_all_tests != 'true'
|
||||||
|
with:
|
||||||
|
filters: |
|
||||||
|
main_package:
|
||||||
|
- "python/sglang/!(multimodal_gen)/**/!(*.md)"
|
||||||
|
- "python/pyproject_npu.toml"
|
||||||
|
- "scripts/ci/npu/npu_ci_install_dependency.sh"
|
||||||
|
- "test/srt/ascend/**"
|
||||||
|
- ".github/workflows/pr-test-npu.yml"
|
||||||
|
multimodal_gen:
|
||||||
|
- "python/sglang/multimodal_gen/**/*.!(md|ipynb)"
|
||||||
|
- "python/sglang/srt/**"
|
||||||
|
- "python/pyproject_npu.toml"
|
||||||
|
- "scripts/ci/npu/npu_ci_install_dependency.sh"
|
||||||
|
- ".github/workflows/pr-test-npu.yml"
|
||||||
|
|
||||||
|
# ==================== PR Gate ==================== #
|
||||||
|
pr-gate:
|
||||||
|
needs: check-changes
|
||||||
|
if: needs.check-changes.outputs.changes_exist == 'true'
|
||||||
|
uses: ./.github/workflows/pr-gate.yml
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
stage-b-test-1-npu-a2:
|
||||||
|
needs: [check-changes, pr-gate]
|
||||||
|
if: needs.check-changes.outputs.main_package == 'true'
|
||||||
|
runs-on: linux-aarch64-a2-1
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
part: [ 0, 1 ]
|
||||||
|
container:
|
||||||
|
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-910b-ubuntu22.04-py3.11
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Mark repository safe
|
||||||
|
run: |
|
||||||
|
git config --system --add safe.directory ${GITHUB_WORKSPACE}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh 910b
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 60
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite stage-b-test-1-npu-a2 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2
|
||||||
|
|
||||||
|
stage-b-test-2-npu-a2:
|
||||||
|
needs: [check-changes, pr-gate]
|
||||||
|
if: needs.check-changes.outputs.main_package == 'true'
|
||||||
|
runs-on: linux-aarch64-a2-2
|
||||||
|
strategy:
|
||||||
|
fail-fast: true
|
||||||
|
matrix:
|
||||||
|
part: [0, 1]
|
||||||
|
container:
|
||||||
|
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-910b-ubuntu22.04-py3.11
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Mark repository safe
|
||||||
|
run: |
|
||||||
|
git config --system --add safe.directory ${GITHUB_WORKSPACE}
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
env:
|
||||||
|
TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
|
||||||
|
PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
|
||||||
|
GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
|
||||||
|
run: |
|
||||||
|
# speed up by using infra cache services
|
||||||
|
CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
|
||||||
|
sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
|
||||||
|
pip config set global.index-url http://${CACHING_URL}/pypi/simple
|
||||||
|
pip config set global.trusted-host "${CACHING_URL}"
|
||||||
|
|
||||||
|
bash scripts/ci/npu/npu_ci_install_dependency.sh 910b
|
||||||
|
# copy required file from our daily cache
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
|
||||||
|
# copy gsm8k dataset
|
||||||
|
cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 60
|
||||||
|
env:
|
||||||
|
SGLANG_USE_MODELSCOPE: true
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_ENDPOINT: https://hf-mirror.com
|
||||||
|
TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
|
||||||
|
PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
|
||||||
|
STREAMS_PER_DEVICE: 32
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw npu --suite stage-b-test-2-npu-a2 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2
|
||||||
|
|
||||||
|
  stage-b-test-4-npu-a3:
    needs: [check-changes, pr-gate]
    if: needs.check-changes.outputs.main_package == 'true'
    runs-on: linux-aarch64-a3-4
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - name: Mark repository safe
        run: |
          git config --system --add safe.directory ${GITHUB_WORKSPACE}

      - name: Install dependencies
        env:
          TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
          PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
          GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
        run: |
          # speed up by using infra cache services
          CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
          sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
          pip config set global.index-url http://${CACHING_URL}/pypi/simple
          pip config set global.trusted-host "${CACHING_URL}"

          bash scripts/ci/npu/npu_ci_install_dependency.sh a3
          # copy required file from our daily cache
          cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
          # copy gsm8k dataset
          cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp

      - name: Run test
        timeout-minutes: 60
        env:
          SGLANG_USE_MODELSCOPE: true
          SGLANG_IS_IN_CI: true
          HF_ENDPOINT: https://hf-mirror.com
          TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
          PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
          STREAMS_PER_DEVICE: 32
        run: |
          cd test
          python3 run_suite.py --hw npu --suite stage-b-test-4-npu-a3 --timeout-per-file 3600

  stage-b-test-16-npu-a3:
    needs: [check-changes, pr-gate]
    if: needs.check-changes.outputs.main_package == 'true'
    runs-on: linux-aarch64-a3-16
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - name: Mark repository safe
        run: |
          git config --system --add safe.directory ${GITHUB_WORKSPACE}

      - name: Install dependencies
        env:
          TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
          PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
          GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
        run: |
          # speed up by using infra cache services
          CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
          sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
          pip config set global.index-url http://${CACHING_URL}/pypi/simple
          pip config set global.trusted-host "${CACHING_URL}"

          bash scripts/ci/npu/npu_ci_install_dependency.sh a3
          # copy required file from our daily cache
          cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
          # copy gsm8k dataset
          cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp

      - name: Run test
        timeout-minutes: 60
        env:
          SGLANG_USE_MODELSCOPE: true
          SGLANG_IS_IN_CI: true
          HF_ENDPOINT: https://hf-mirror.com
          TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
          PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
          STREAMS_PER_DEVICE: 32
        run: |
          cd test
          python3 run_suite.py --hw npu --suite stage-b-test-16-npu-a3 --timeout-per-file 3600

  multimodal-gen-test-1-npu-a3:
    needs: [check-changes, pr-gate]
    if: needs.check-changes.outputs.multimodal_gen == 'true'
    runs-on: linux-aarch64-a3-2
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-a3-ubuntu22.04-py3.11
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Mark repository safe
        run: |
          git config --system --add safe.directory ${GITHUB_WORKSPACE}

      - name: Install dependencies
        env:
          TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
          PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
          GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
        run: |
          # speed up by using infra cache services
          CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
          sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
          pip config set global.index-url http://${CACHING_URL}/pypi/simple
          pip config set global.trusted-host "${CACHING_URL}"

          bash scripts/ci/npu/npu_ci_install_dependency.sh a3 diffusion
          # copy required file from our daily cache
          cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
          # copy gsm8k dataset
          cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp

      - name: Run test
        timeout-minutes: 60
        env:
          SGLANG_USE_MODELSCOPE: true
          SGLANG_IS_IN_CI: true
          HF_ENDPOINT: https://hf-mirror.com
          TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
          PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
          STREAMS_PER_DEVICE: 32
        run: |
          export PATH="/usr/local/Ascend/8.3.RC1/compiler/bishengir/bin:${PATH}"
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py --suite 1-npu

  multimodal-gen-test-2-npu-a3:
    needs: [check-changes, pr-gate]
    if: needs.check-changes.outputs.multimodal_gen == 'true'
    runs-on: linux-aarch64-a3-16
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-a3-ubuntu22.04-py3.11
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Mark repository safe
        run: |
          git config --system --add safe.directory ${GITHUB_WORKSPACE}

      - name: Install dependencies
        env:
          TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
          PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
          GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
        run: |
          # speed up by using infra cache services
          CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
          sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
          pip config set global.index-url http://${CACHING_URL}/pypi/simple
          pip config set global.trusted-host "${CACHING_URL}"

          bash scripts/ci/npu/npu_ci_install_dependency.sh a3 diffusion
          # copy required file from our daily cache
          cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
          # copy gsm8k dataset
          cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp

      - name: Run test
        timeout-minutes: 60
        env:
          SGLANG_USE_MODELSCOPE: true
          SGLANG_IS_IN_CI: true
          HF_ENDPOINT: https://hf-mirror.com
          TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
          PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
          STREAMS_PER_DEVICE: 32
        run: |
          export PATH="/usr/local/Ascend/8.3.RC1/compiler/bishengir/bin:${PATH}"
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py --suite 2-npu

  multimodal-gen-test-8-npu-a3:
    needs: [check-changes, pr-gate]
    if: needs.check-changes.outputs.multimodal_gen == 'true'
    runs-on: linux-aarch64-a3-8
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.0-a3-ubuntu22.04-py3.11
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Mark repository safe
        run: |
          git config --system --add safe.directory ${GITHUB_WORKSPACE}

      - name: Install dependencies
        env:
          TORCH_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/whl/cpu"
          PYPI_CACHE_URL: "http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple"
          GITHUB_PROXY_URL: "https://gh-proxy.test.osinfra.cn/"
        run: |
          # speed up by using infra cache services
          CACHING_URL="cache-service.nginx-pypi-cache.svc.cluster.local"
          sed -Ei "s@(ports|archive).ubuntu.com@${CACHING_URL}:8081@g" /etc/apt/sources.list
          pip config set global.index-url http://${CACHING_URL}/pypi/simple
          pip config set global.trusted-host "${CACHING_URL}"

          bash scripts/ci/npu/npu_ci_install_dependency.sh a3 diffusion
          # copy required file from our daily cache
          cp ~/.cache/modelscope/hub/datasets/otavia/ShareGPT_Vicuna_unfiltered/ShareGPT_V3_unfiltered_cleaned_split.json /tmp
          # copy gsm8k dataset
          cp ~/.cache/modelscope/hub/datasets/tmp/test.jsonl /tmp

      - name: Run test
        timeout-minutes: 60
        env:
          SGLANG_USE_MODELSCOPE: true
          SGLANG_IS_IN_CI: true
          HF_ENDPOINT: https://hf-mirror.com
          TORCH_EXTENSIONS_DIR: /tmp/torch_extensions
          PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
          STREAMS_PER_DEVICE: 32
        run: |
          cd python
          python3 sglang/multimodal_gen/test/run_suite.py --suite 8-npu

  pr-test-npu-finish:
    needs:
      [
        check-changes,

        stage-b-test-1-npu-a2,
        stage-b-test-2-npu-a2,
        stage-b-test-4-npu-a3,
        stage-b-test-16-npu-a3,

        multimodal-gen-test-1-npu-a3,
        multimodal-gen-test-2-npu-a3,
        multimodal-gen-test-8-npu-a3,
      ]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Check all dependent job statuses
        run: |
          # Convert the 'needs' context to a JSON string
          json_needs='${{ toJson(needs) }}'

          # Get a list of all job names from the JSON keys
          job_names=$(echo "$json_needs" | jq -r 'keys_unsorted[]')

          for job in $job_names; do
            # For each job, extract its result
            result=$(echo "$json_needs" | jq -r --arg j "$job" '.[$j].result')

            # Print the job name and its result
            echo "$job: $result"

            # Check for failure or cancellation and exit if found
            if [[ "$result" == "failure" || "$result" == "cancelled" ]]; then
              echo "The above jobs failed."
              exit 1
            fi
          done
          # If the loop completes, all jobs were successful
          echo "All jobs completed successfully"
          exit 0
359
third_party/sglang/.github/workflows/pr-test-rust.yml
vendored
Normal file
@@ -0,0 +1,359 @@
name: PR Test (SMG)

on:
  push:
    branches: [ main ]
    paths:
      - "sgl-model-gateway/**"
  pull_request:
    branches: [ main ]
    types: [opened, synchronize, reopened, labeled]
    paths:
      - "sgl-model-gateway/**"
  workflow_dispatch:

concurrency:
  group: gateway-tests-${{ github.ref }}
  cancel-in-progress: true

env:
  RUSTC_WRAPPER: sccache
  SCCACHE_GHA_ENABLED: "true"
  SGLANG_IS_IN_CI: true

jobs:
  build-wheel:
    if: |
      github.event_name != 'pull_request' ||
      (github.event.action != 'labeled' && contains(github.event.pull_request.labels.*.name, 'run-ci')) ||
      (github.event.action == 'labeled' && github.event.label.name == 'run-ci')
    runs-on: 4-gpu-a10
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install rust dependencies
        run: |
          bash scripts/ci/cuda/ci_install_gateway_dependencies.sh

      - name: Configure sccache
        uses: mozilla-actions/sccache-action@v0.0.9
        with:
          version: "v0.12.0"
          disable_annotations: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: sgl-model-gateway
          shared-key: "rust-cache"
          cache-all-crates: true
          cache-on-failure: true
          save-if: true

      - name: Build python binding
        run: |
          source "$HOME/.cargo/env"
          export RUSTC_WRAPPER=sccache
          cd sgl-model-gateway/bindings/python
          python3 -m pip install --upgrade pip maturin
          maturin build --profile ci --features vendored-openssl --out dist

      - name: List built wheel
        run: ls -lh sgl-model-gateway/bindings/python/dist/

      - name: Upload wheel artifact
        uses: actions/upload-artifact@v4
        with:
          name: smg-wheel
          path: sgl-model-gateway/bindings/python/dist/*.whl
          retention-days: 1

      - name: Test wheel install
        run: |
          pip install sgl-model-gateway/bindings/python/dist/*.whl
          python3 -c "import sglang_router; print('Python package: OK')"
          python3 -c "from sglang_router.sglang_router_rs import Router; print('Rust extension: OK')"
          python3 -m sglang_router.launch_router --help > /dev/null && echo "Entry point: OK"

  python-unit-tests:
    needs: build-wheel
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          path: sglang-repo

      - name: Move sgl-model-gateway folder to root
        run: |
          mv sglang-repo/sgl-model-gateway/* .
          rm -rf sglang-repo

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.13"

      - name: Download wheel artifact
        uses: actions/download-artifact@v4
        with:
          name: smg-wheel
          path: dist/

      - name: Install wheel
        run: pip install dist/*.whl

      - name: Run Python unit tests
        run: |
          cd bindings/python
          python3 -m pip install pytest pytest-cov pytest-xdist
          pytest -q tests --cov=sglang_router --cov-config=.coveragerc --cov-report=term-missing --cov-fail-under=80

  unit-tests:
    if: |
      github.event_name != 'pull_request' ||
      (github.event.action != 'labeled' && contains(github.event.pull_request.labels.*.name, 'run-ci')) ||
      (github.event.action == 'labeled' && github.event.label.name == 'run-ci')
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install dependencies
        run: |
          bash scripts/ci/cuda/ci_install_gateway_dependencies.sh

      - name: Configure sccache
        uses: mozilla-actions/sccache-action@v0.0.9
        with:
          version: "v0.12.0"
          disable_annotations: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: sgl-model-gateway
          shared-key: "rust-cache"
          cache-all-crates: true
          cache-on-failure: true
          save-if: true

      - name: Run lint
        run: |
          source "$HOME/.cargo/env"
          cd sgl-model-gateway/
          rustup component add clippy
          cargo clippy --all-targets --all-features -- -D warnings

      - name: Run fmt
        run: |
          source "$HOME/.cargo/env"
          cd sgl-model-gateway/
          rustup component add --toolchain nightly-x86_64-unknown-linux-gnu rustfmt
          rustup toolchain install nightly --profile minimal
          cargo +nightly fmt -- --check

      - name: Generate vision golden fixtures
        run: |
          pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

          pip install transformers pillow numpy scipy
          pip install transformers pillow numpy
          cd sgl-model-gateway/
          python scripts/generate_vision_golden.py

      - name: Run Rust tests
        timeout-minutes: 20
        run: |
          source "$HOME/.cargo/env"
          cd sgl-model-gateway/
          cargo test

      - name: Show sccache stats
        if: always()
        run: sccache --show-stats

  gateway-e2e:
    name: ${{ matrix.name }}
    needs: build-wheel
    if: |
      github.event_name != 'pull_request' ||
      (github.event.action != 'labeled' && contains(github.event.pull_request.labels.*.name, 'run-ci')) ||
      (github.event.action == 'labeled' && github.event.label.name == 'run-ci')
    strategy:
      fail-fast: false
      matrix:
        include:
          - name: benchmarks
            timeout: 32
            test_dirs: "e2e_test/benchmarks"
            extra_deps: "genai-bench==0.0.3"
            env_vars: ""
            reruns: ""
            upload_benchmarks: true
            parallel_opts: "" # No parallel for benchmarks (performance measurement)
          - name: responses
            timeout: 45
            test_dirs: "e2e_test/responses"
            extra_deps: ""
            env_vars: "SHOW_WORKER_LOGS=0 SHOW_ROUTER_LOGS=1"
            reruns: "--reruns 2 --reruns-delay 5"
            setup_oracle: true
            setup_brave: true
            parallel_opts: "" # Cloud backend tests not compatible with parallel execution
          - name: e2e
            timeout: 45
            test_dirs: "e2e_test/router e2e_test/embeddings"
            extra_deps: "pytest-parallel py" # py is required for pytest-parallel with newer pytest
            env_vars: "SHOW_WORKER_LOGS=0 SHOW_ROUTER_LOGS=1"
            reruns: "--reruns 2 --reruns-delay 5"
            parallel_opts: "--workers 1 --tests-per-worker 4" # Thread-based parallelism
          - name: chat-completions
            timeout: 45
            test_dirs: "e2e_test/chat_completions"
            extra_deps: ""
            env_vars: "SHOW_WORKER_LOGS=0 SHOW_ROUTER_LOGS=1"
            reruns: "--reruns 2 --reruns-delay 5"
            parallel_opts: ""
    runs-on: 4-gpu-a10
    timeout-minutes: ${{ matrix.timeout }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install SGLang dependencies
        run: |
          sudo --preserve-env=PATH bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Setup Oracle Instant Client
        if: matrix.setup_oracle
        run: |
          sudo apt-get install -y unzip
          INSTANT_CLIENT_DIR="/home/ubuntu/instant-client"
          INSTANT_CLIENT_ZIP="instantclient-basic-linux.x64-23.9.0.25.07.zip"

          if [ ! -d "$INSTANT_CLIENT_DIR/instantclient_23_9" ]; then
            echo "Downloading Oracle Instant Client..."
            mkdir -p "$INSTANT_CLIENT_DIR"
            cd "$INSTANT_CLIENT_DIR"
            wget https://download.oracle.com/otn_software/linux/instantclient/2390000/$INSTANT_CLIENT_ZIP
            unzip $INSTANT_CLIENT_ZIP
            rm $INSTANT_CLIENT_ZIP
          else
            echo "Oracle Instant Client already exists, skipping download"
          fi

          echo "LD_LIBRARY_PATH=/home/ubuntu/instant-client/instantclient_23_9:\$LD_LIBRARY_PATH" >> $GITHUB_ENV

      - name: Start Oracle Database
        if: matrix.setup_oracle
        run: |
          docker run -d -p 1521:1521 -e ORACLE_PASSWORD=oracle --name oracle-db gvenzl/oracle-xe:21-slim
          echo "Starting Oracle DB..."

          # Export Oracle connection environment variables
          echo "ATP_USER=system" >> $GITHUB_ENV
          echo "ATP_PASSWORD=oracle" >> $GITHUB_ENV
          echo "ATP_DSN=localhost:1521/XEPDB1" >> $GITHUB_ENV

      - name: Start Brave MCP Server
        if: matrix.setup_brave
        run: |
          docker run -d --rm \
            -p 8001:8080 \
            -e BRAVE_API_KEY \
            --name brave-search-server \
            shoofio/brave-search-mcp-sse:1.0.10
          echo "Starting Brave MCP Server..."
          sleep 2
          curl -f --max-time 1 http://localhost:8001/sse > /dev/null 2>&1 && echo "Brave MCP Server is healthy!" || echo "Brave MCP Server responded"

      - name: Download wheel artifact
        uses: actions/download-artifact@v4
        with:
          name: smg-wheel
          path: wheel/

      - name: Install wheel
        run: |
          pip uninstall -y sglang-router || true
          pip install wheel/*.whl

      - name: Install e2e test dependencies
        run: |
          python3 -m pip install pytest pytest-rerunfailures httpx openai grpcio grpcio-health-checking numpy
          if [ -n "${{ matrix.extra_deps }}" ]; then
            python3 -m pip --no-cache-dir install --upgrade ${{ matrix.extra_deps }}
          fi

      - name: Run E2E tests
        run: |
          python3 python/sglang/cli/killall.py
          cd sgl-model-gateway
          ${{ matrix.env_vars }} ROUTER_LOCAL_MODEL_PATH="/home/ubuntu/models" pytest ${{ matrix.reruns }} ${{ matrix.parallel_opts }} ${{ matrix.test_dirs }} -s -vv -o log_cli=true --log-cli-level=INFO

      - name: Upload benchmark results
        if: matrix.upload_benchmarks && success()
        uses: actions/upload-artifact@v4
        with:
          name: genai-bench-results-all-policies
          path: sgl-model-gateway/benchmark_**/

      - name: Cleanup Brave MCP Server
        if: always() && matrix.setup_brave
        run: |
          docker stop brave-search-server || true
          docker rm brave-search-server || true

      - name: Cleanup Oracle Database
        if: always() && matrix.setup_oracle
        run: |
          docker stop oracle-db || true
          docker rm oracle-db || true

  docker-build-test:
    if: |
      github.event_name != 'pull_request' ||
      (github.event.action != 'labeled' && contains(github.event.pull_request.labels.*.name, 'run-ci')) ||
      (github.event.action == 'labeled' && github.event.label.name == 'run-ci')
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build Docker image (no push)
        uses: docker/build-push-action@v5
        with:
          context: .
          file: docker/gateway.Dockerfile
          push: false
          tags: sgl-model-gateway:test
          cache-from: type=gha
          cache-to: type=gha,mode=max

  finish:
    needs: [build-wheel, python-unit-tests, unit-tests, gateway-e2e, docker-build-test]
    runs-on: ubuntu-latest
    steps:
      - name: Finish
        run: echo "This is an empty step to ensure that all jobs are completed."

  summarize-benchmarks:
    needs: gateway-e2e
    runs-on: ubuntu-latest
    if: success()

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Download benchmark results
        uses: actions/download-artifact@v4
        with:
          name: genai-bench-results-all-policies

      - name: Create benchmark summary
        run: python3 sgl-model-gateway/e2e_test/benchmarks/summarize.py .
214
third_party/sglang/.github/workflows/pr-test-sgl-kernel.yml
vendored
Normal file
@@ -0,0 +1,214 @@
name: PR Test - SGL Kernel

on:
  workflow_call:
    inputs:
      sgl_kernel:
        required: true
        type: string
      b200_runner:
        required: true
        type: string
      pr_head_sha:
        required: false
        type: string
        default: ''
      git_ref:
        required: false
        type: string
        default: ''
      skip_stage_health_check:
        required: false
        type: boolean
        default: false

# Workflow-level env is NOT inherited from the caller in reusable workflows.
# The github context (including github.event_name) IS inherited from the caller.
env:
  SGLANG_IS_IN_CI: true
  SGLANG_CUDA_COREDUMP: "1"
  SGLANG_PR_TEST_BYPASS_MAINTENANCE_ON_MAIN: ${{ github.ref == 'refs/heads/main' && 'true' || 'false' }}
  SKIP_STAGE_HEALTH_CHECK: ${{ inputs.skip_stage_health_check == true && 'true' || 'false' }}

jobs:
  sgl-kernel-unit-test:
    runs-on: 1-gpu-h100
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Cleanup
        run: |
          ls -alh sgl-kernel/dist || true
          rm -rf sgl-kernel/dist/* || true

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run test
        timeout-minutes: 30
        run: |
          cd sgl-kernel
          pytest tests/

  sgl-kernel-mla-test:
    runs-on: 1-gpu-h100
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Cleanup
        run: |
          ls -alh sgl-kernel/dist || true
          rm -rf sgl-kernel/dist/* || true

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run test
        timeout-minutes: 30
        run: |
          cd test/registered/mla
          python3 test_mla_deepseek_v3.py

  sgl-kernel-benchmark-test:
    runs-on: 1-gpu-h100
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Cleanup
        run: |
          ls -alh sgl-kernel/dist || true
          rm -rf sgl-kernel/dist/* || true

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh

      - name: Run benchmark tests
        timeout-minutes: 45
        run: |
          cd sgl-kernel/benchmark
          echo "Running sgl-kernel benchmark tests in CI mode..."

          echo "CI environment variable: $CI"
          echo "GITHUB_ACTIONS environment variable: $GITHUB_ACTIONS"

          for bench_file in bench_*.py; do
            echo "Testing $bench_file..."
            timeout 60 python3 "$bench_file" || echo "Warning: $bench_file timed out or failed, continuing..."
            echo "Completed $bench_file"
            echo "---"
          done

          echo "All benchmark tests completed!"

  sgl-kernel-b200-test:
    runs-on: ${{ inputs.b200_runner }}
    timeout-minutes: 240
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_head_sha || inputs.git_ref || github.sha }}

      - uses: ./.github/actions/check-stage-health

      - uses: ./.github/actions/check-maintenance

      - name: Cleanup
        run: |
          ls -alh sgl-kernel/dist || true
          rm -rf sgl-kernel/dist/* || true

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-python3.10-cuda12.9

      - name: Install dependencies
        timeout-minutes: 20
        run: |
          CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh diffusion

      - name: Run sgl-kernel unit tests on B200
        timeout-minutes: 30
        run: |
          cd sgl-kernel
          pytest tests/

  # Adding a single CUDA13 smoke test to verify that the kernel builds and runs
  # TODO: Add back this test when it can pass on CI
  # cuda13-kernel-smoke-test:
  #   if: inputs.sgl_kernel == 'true'
  #   runs-on: x64-cu13-kernel-tests
  #   steps:
  #     - uses: actions/checkout@v4

  #     - name: Cleanup
  #       run: |
  #         ls -alh sgl-kernel/dist || true
  #         rm -rf sgl-kernel/dist/* || true

  #     - name: Download CUDA 13.0 artifacts
  #       uses: actions/download-artifact@v4
  #       with:
  #         path: sgl-kernel/dist/
  #         merge-multiple: true
  #         pattern: wheel-python3.10-cuda13.0

  #     - name: Install dependencies
  #       run: |
  #         CUSTOM_BUILD_SGL_KERNEL=${{inputs.sgl_kernel}} bash scripts/ci/cuda/ci_install_dependency.sh

  #     - name: Run kernel unit tests
  #       timeout-minutes: 30
  #       run: |
  #         cd sgl-kernel
  #         pytest tests/
131
third_party/sglang/.github/workflows/pr-test-xeon.yml
vendored
Normal file
@@ -0,0 +1,131 @@
|
|||||||
|
name: PR Test (Xeon)
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches: [ main ]
|
||||||
|
pull_request:
|
||||||
|
branches: [ main ]
|
||||||
|
workflow_dispatch:
|
||||||
|
workflow_call:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref (branch, tag, or SHA) to test. If not provided, uses the default branch.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
run_all_tests:
|
||||||
|
description: "Run all tests (for releasing or testing purpose)"
|
||||||
|
required: false
|
||||||
|
type: boolean
|
||||||
|
default: false
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: pr-test-xeon-${{ inputs.ref || github.ref }}
|
||||||
|
cancel-in-progress: false
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
# ==================== Check Changes ==================== #
|
||||||
|
check-changes:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
outputs:
|
||||||
|
main_package: ${{ steps.filter.outputs.main_package || steps.run-mode.outputs.run_all_tests}}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Determine run mode
|
||||||
|
id: run-mode
|
||||||
|
run: |
|
||||||
|
# Run all tests when the run_all_tests input is true (callers such as the release workflow set it)
|
||||||
|
# Note: github.event_name is inherited from the caller, so workflow_call cannot be detected from it; the explicit input is checked instead
|
||||||
|
if [[ "${{ inputs.run_all_tests }}" == "true" ]]; then
|
||||||
|
echo "run_all_tests=true" >> $GITHUB_OUTPUT
|
||||||
|
echo "Run mode: ALL TESTS (run_all_tests=${{ inputs.run_all_tests }})"
|
||||||
|
else
|
||||||
|
echo "run_all_tests=false" >> $GITHUB_OUTPUT
|
||||||
|
echo "Run mode: FILTERED (triggered by ${{ github.event_name }})"
|
||||||
|
fi
|
||||||
|
|
||||||
|
- name: Detect file changes
|
||||||
|
id: filter
|
||||||
|
uses: dorny/paths-filter@v3
|
||||||
|
if: steps.run-mode.outputs.run_all_tests != 'true'
|
||||||
|
with:
|
||||||
|
filters: |
|
||||||
|
main_package:
|
||||||
|
- "python/sglang/!(multimodal_gen)/**/!(*.md)"
|
||||||
|
- "python/pyproject_cpu.toml"
|
||||||
|
- "test/**/!(*.md)"
|
||||||
|
- "sgl-kernel/**/*.!(md|txt)"
|
||||||
|
- ".github/workflows/pr-test-xeon.yml"
|
||||||
|
- "docker/xeon.Dockerfile"
|
||||||
|
|
||||||
|
# ==================== PR Gate ==================== #
|
||||||
|
pr-gate:
|
||||||
|
needs: check-changes
|
||||||
|
if: needs.check-changes.outputs.main_package == 'true'
|
||||||
|
uses: ./.github/workflows/pr-gate.yml
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
build-test:
|
||||||
|
needs: [check-changes, pr-gate]
|
||||||
|
if: needs.check-changes.outputs.main_package == 'true'
|
||||||
|
runs-on: xeon-gnr
|
||||||
|
env:
|
||||||
|
HF_HOME: /home/sdp/.cache/huggingface
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
build_type: ['all']
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Build and Push
|
||||||
|
run: |
|
||||||
|
version=$(cat python/sglang/version.py | cut -d'"' -f2)
|
||||||
|
tag=v${version}-xeon
|
||||||
|
PR_REPO=${{ github.event.pull_request.head.repo.clone_url }}
|
||||||
|
PR_HEAD_REF=${{ github.head_ref }}
|
||||||
|
|
||||||
|
docker build \
|
||||||
|
${PR_REPO:+--build-arg SGLANG_REPO=$PR_REPO} \
|
||||||
|
${PR_HEAD_REF:+--build-arg VER_SGLANG=$PR_HEAD_REF} \
|
||||||
|
. -f docker/xeon.Dockerfile -t sglang_xeon --no-cache
|
||||||
|
|
||||||
|
- name: Run container
|
||||||
|
run: |
|
||||||
|
docker run -dt \
|
||||||
|
-v ${{ github.workspace }}:/sglang-checkout/ --ipc=host \
|
||||||
|
-v ${HF_HOME}:/root/.cache/huggingface \
|
||||||
|
--name ci_sglang_xeon \
|
||||||
|
sglang_xeon
|
||||||
|
|
||||||
|
- name: Check AMX support
|
||||||
|
id: check_amx
|
||||||
|
timeout-minutes: 5
|
||||||
|
run: |
|
||||||
|
docker exec -w /sglang-checkout/ ci_sglang_xeon \
|
||||||
|
bash -c "source /opt/.venv/bin/activate && python3 -c 'import torch; import sgl_kernel; assert torch._C._cpu._is_amx_tile_supported(); assert hasattr(torch.ops.sgl_kernel, \"convert_weight_packed\"); '"
|
||||||
|
|
||||||
|
- name: Run unit tests
|
||||||
|
timeout-minutes: 36
|
||||||
|
run: |
|
||||||
|
docker exec -w /sglang-checkout/ ci_sglang_xeon \
|
||||||
|
bash -c "source /opt/.venv/bin/activate && cd ./test/srt && python3 run_suite.py --suite per-commit-cpu --timeout-per-file 1500"
|
||||||
|
|
||||||
|
- name: Change permission
|
||||||
|
timeout-minutes: 2
|
||||||
|
run: |
|
||||||
|
docker exec -u root ci_sglang_xeon bash -c "
|
||||||
|
rm -rf /tmp/ci-home &&
|
||||||
|
chown -R $(id -u):$(id -g) /sglang-checkout/ 2>/dev/null || true
|
||||||
|
"
|
||||||
|
|
||||||
|
- name: Cleanup container
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
docker rm -f ci_sglang_xeon || true
|
||||||
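The `docker build` invocation in the "Build and Push" step above relies on the shell's `${VAR:+word}` expansion, so the `--build-arg` flags appear only when the pull-request variables are non-empty. A minimal sketch of that behaviour, assuming bash; the helper name `count_build_args` and the sample URL and branch name are hypothetical and exist only for illustration:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prints how many extra words the two expansions produce.
count_build_args() {
  local pr_repo="$1" pr_head_ref="$2"
  # Left unquoted on purpose, as in the workflow, so each non-empty
  # expansion splits into two words: "--build-arg" and "NAME=value".
  set -- ${pr_repo:+--build-arg SGLANG_REPO=$pr_repo} \
         ${pr_head_ref:+--build-arg VER_SGLANG=$pr_head_ref}
  echo "$#"
}

# Push event: both variables are empty, so nothing is added.
push_args=$(count_build_args "" "")
# Pull request: both variables are set, so four words are added.
pr_args=$(count_build_args "https://example.invalid/fork.git" "my-feature")

echo "push event: $push_args extra arguments"
echo "pull request: $pr_args extra arguments"
```

On a push event both variables are empty, so no `--build-arg` flags reach `docker build`. The unquoted form only works while the values contain no whitespace.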
143
third_party/sglang/.github/workflows/pr-test-xpu.yml
vendored
Normal file
@@ -0,0 +1,143 @@
|
|||||||
|
name: PR Test (XPU)
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches: [ main ]
|
||||||
|
pull_request:
|
||||||
|
branches: [ main ]
|
||||||
|
workflow_dispatch:
|
||||||
|
workflow_call:
|
||||||
|
inputs:
|
||||||
|
ref:
|
||||||
|
description: 'Git ref (branch, tag, or SHA) to test. If not provided, uses the default branch.'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
run_all_tests:
|
||||||
|
description: "Run all tests (for releasing or testing purpose)"
|
||||||
|
required: false
|
||||||
|
type: boolean
|
||||||
|
default: false
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: pr-test-xpu-${{ inputs.ref || github.ref }}
|
||||||
|
cancel-in-progress: ${{ github.event_name != 'workflow_call' }}
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
# ==================== Check Changes ==================== #
|
||||||
|
check-changes:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
outputs:
|
||||||
|
main_package: ${{ steps.filter.outputs.main_package || steps.run-mode.outputs.run_all_tests }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Determine run mode
|
||||||
|
id: run-mode
|
||||||
|
run: |
|
||||||
|
# Run all tests when the run_all_tests input is true (callers such as the release workflow set it)
|
||||||
|
# Note: github.event_name is inherited from the caller, so workflow_call cannot be detected from it; the explicit input is checked instead
|
||||||
|
if [[ "${{ inputs.run_all_tests }}" == "true" ]]; then
|
||||||
|
echo "run_all_tests=true" >> $GITHUB_OUTPUT
|
||||||
|
echo "Run mode: ALL TESTS (run_all_tests=${{ inputs.run_all_tests }})"
|
||||||
|
else
|
||||||
|
echo "run_all_tests=false" >> $GITHUB_OUTPUT
|
||||||
|
echo "Run mode: FILTERED (triggered by ${{ github.event_name }})"
|
||||||
|
fi
|
||||||
|
- name: Detect file changes
|
||||||
|
id: filter
|
||||||
|
uses: dorny/paths-filter@v3
|
||||||
|
if: steps.run-mode.outputs.run_all_tests != 'true'
|
||||||
|
with:
|
||||||
|
filters: |
|
||||||
|
main_package:
|
||||||
|
- "python/sglang/!(multimodal_gen)/**/!(*.md)"
|
||||||
|
- "python/pyproject_xpu.toml"
|
||||||
|
- "test/**/!(*.md)"
|
||||||
|
- "sgl-kernel/**/*.!(md|txt)"
|
||||||
|
- ".github/workflows/pr-test-xpu.yml"
|
||||||
|
- "docker/xpu.Dockerfile"
|
||||||
|
|
||||||
|
# ==================== PR Gate ==================== #
|
||||||
|
pr-gate:
|
||||||
|
needs: check-changes
|
||||||
|
if: needs.check-changes.outputs.main_package == 'true'
|
||||||
|
uses: ./.github/workflows/pr-gate.yml
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
build-and-test:
|
||||||
|
needs: [check-changes, pr-gate]
|
||||||
|
if: needs.check-changes.outputs.main_package == 'true'
|
||||||
|
runs-on: intel-bmg
|
||||||
|
env:
|
||||||
|
HF_HOME: /home/sdp/.cache/huggingface
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
fetch-depth: 0
|
||||||
|
ref: ${{ inputs.ref || github.ref }}
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
|
||||||
|
- name: Build Docker image
|
||||||
|
run: |
|
||||||
|
PR_REPO=${{ github.event.pull_request.head.repo.clone_url }}
|
||||||
|
PR_HEAD_REF=${{ github.head_ref }}
|
||||||
|
docker build \
|
||||||
|
${PR_REPO:+--build-arg SG_LANG_REPO=$PR_REPO} \
|
||||||
|
${PR_HEAD_REF:+--build-arg SG_LANG_BRANCH=$PR_HEAD_REF} \
|
||||||
|
--no-cache --progress=plain -f docker/xpu.Dockerfile -t xpu_sglang_main:bmg .
|
||||||
|
|
||||||
|
- name: Run container
|
||||||
|
id: start_container
|
||||||
|
run: |
|
||||||
|
container_id=$(docker run -dt \
|
||||||
|
--group-add 992 \
|
||||||
|
--group-add $(getent group video | cut -d: -f3) \
|
||||||
|
-v ${HF_HOME}:/root/.cache/huggingface \
|
||||||
|
--device /dev/dri \
|
||||||
|
-e HF_TOKEN="$(cat ~/huggingface_token.txt)" \
|
||||||
|
xpu_sglang_main:bmg)
|
||||||
|
echo "Started container: $container_id"
|
||||||
|
echo "container_id=$container_id" >> "$GITHUB_OUTPUT"
|
||||||
|
|
||||||
|
- name: Install Dependency
|
||||||
|
timeout-minutes: 20
|
||||||
|
run: |
|
||||||
|
cid="${{ steps.start_container.outputs.container_id }}"
|
||||||
|
docker exec "$cid" /home/sdp/miniforge3/envs/py3.10/bin/python3 -m pip install --upgrade pip
|
||||||
|
docker exec "$cid" /home/sdp/miniforge3/envs/py3.10/bin/python3 -m pip install pytest expecttest ray huggingface_hub
|
||||||
|
docker exec "$cid" /home/sdp/miniforge3/envs/py3.10/bin/python3 -m pip uninstall -y flashinfer-python
|
||||||
|
docker exec "$cid" /bin/bash -c '/home/sdp/miniforge3/envs/py3.10/bin/hf auth login --token ${HF_TOKEN} '
|
||||||
|
|
||||||
|
|
||||||
|
- name: Run E2E Bfloat16 tests
|
||||||
|
timeout-minutes: 20
|
||||||
|
run: |
|
||||||
|
cid="${{ steps.start_container.outputs.container_id }}"
|
||||||
|
docker exec "$cid" bash -c "source /home/sdp/miniforge3/bin/activate && conda activate py3.10 && cd /home/sdp/sglang/test/srt && python3 run_suite.py --suite per-commit-xpu"
|
||||||
|
- name: Cleanup container
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
cid="${{ steps.start_container.outputs.container_id }}"
|
||||||
|
docker rm -f "$cid" || true
|
||||||
|
|
||||||
|
finish:
|
||||||
|
if: always()
|
||||||
|
needs: [build-and-test, pr-gate]
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Check job status
|
||||||
|
run: |
|
||||||
|
result="${{ needs.build-and-test.result }}"
|
||||||
|
if [ "$result" != "success" ] && [ "$result" != "skipped" ]; then
|
||||||
|
echo "Job failed with result: $result"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "All jobs completed successfully (result: $result)"
|
||||||
|
exit 0
|
||||||
1378
third_party/sglang/.github/workflows/pr-test.yml
vendored
Normal file
File diff suppressed because it is too large
Load Diff
215
third_party/sglang/.github/workflows/release-branch-cut.yml
vendored
Normal file
@@ -0,0 +1,215 @@
|
|||||||
|
name: Release Branch Cut
|
||||||
|
|
||||||
|
on:
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
branch_name:
|
||||||
|
description: 'Branch name to create (e.g., release/v0.5.7)'
|
||||||
|
required: true
|
||||||
|
type: string
|
||||||
|
commit_sha:
|
||||||
|
description: 'Commit SHA from main to cut the release branch from (defaults to latest main)'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ''
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
actions: write
|
||||||
|
contents: write
|
||||||
|
issues: read
|
||||||
|
pull-requests: read
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
cut-release-branch:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
environment: 'prod'
|
||||||
|
outputs:
|
||||||
|
branch_name: ${{ steps.set_output.outputs.branch_name }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: main
|
||||||
|
fetch-depth: 0
|
||||||
|
token: ${{ secrets.GITHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Validate branch name
|
||||||
|
run: |
|
||||||
|
BRANCH_NAME="${{ github.event.inputs.branch_name }}"
|
||||||
|
|
||||||
|
if [ -z "$BRANCH_NAME" ]; then
|
||||||
|
echo "::error::Branch name is required"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate branch name format (should start with release/)
|
||||||
|
if [[ ! "$BRANCH_NAME" =~ ^release/ ]]; then
|
||||||
|
echo "::warning::Branch name '$BRANCH_NAME' does not follow convention 'release/vX.Y.Z'"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "Branch name: $BRANCH_NAME"
|
||||||
|
|
||||||
|
- name: Validate commit SHA
|
||||||
|
id: validate
|
||||||
|
run: |
|
||||||
|
COMMIT_SHA="${{ github.event.inputs.commit_sha }}"
|
||||||
|
|
||||||
|
# If no commit SHA provided, use latest main
|
||||||
|
if [ -z "$COMMIT_SHA" ]; then
|
||||||
|
COMMIT_SHA=$(git rev-parse HEAD)
|
||||||
|
echo "No commit SHA provided, using latest main: $COMMIT_SHA"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Verify the commit exists and is on main
|
||||||
|
if ! git cat-file -t "$COMMIT_SHA" > /dev/null 2>&1; then
|
||||||
|
echo "::error::Commit SHA '$COMMIT_SHA' does not exist"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if commit is an ancestor of main (i.e., is on main branch)
|
||||||
|
if ! git merge-base --is-ancestor "$COMMIT_SHA" main; then
|
||||||
|
echo "::error::Commit SHA '$COMMIT_SHA' is not on the main branch"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "COMMIT_SHA=$COMMIT_SHA" >> $GITHUB_OUTPUT
|
||||||
|
echo "Validated commit SHA: $COMMIT_SHA"
|
||||||
|
|
||||||
|
- name: Check if branch already exists
|
||||||
|
run: |
|
||||||
|
BRANCH_NAME="${{ github.event.inputs.branch_name }}"
|
||||||
|
|
||||||
|
if git ls-remote --heads origin "$BRANCH_NAME" | grep -q "$BRANCH_NAME"; then
|
||||||
|
echo "::error::Branch '$BRANCH_NAME' already exists"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "Branch '$BRANCH_NAME' does not exist, proceeding with creation"
|
||||||
|
|
||||||
|
- name: Create release branch
|
||||||
|
id: set_output
|
||||||
|
run: |
|
||||||
|
COMMIT_SHA="${{ steps.validate.outputs.COMMIT_SHA }}"
|
||||||
|
BRANCH_NAME="${{ github.event.inputs.branch_name }}"
|
||||||
|
|
||||||
|
git config user.name "sglang-bot"
|
||||||
|
git config user.email "sglang-bot@users.noreply.github.com"
|
||||||
|
|
||||||
|
# Create branch from the specified commit
|
||||||
|
git checkout -b "$BRANCH_NAME" "$COMMIT_SHA"
|
||||||
|
|
||||||
|
echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT
|
||||||
|
echo "Successfully created branch '$BRANCH_NAME' from commit '$COMMIT_SHA'"
|
||||||
|
|
||||||
|
- name: Update version references in documentation
|
||||||
|
run: |
|
||||||
|
BRANCH_NAME="${{ github.event.inputs.branch_name }}"
|
||||||
|
# Extract version from branch name (e.g., release/v0.5.8 -> v0.5.8)
|
||||||
|
VERSION=$(echo "$BRANCH_NAME" | sed 's/release\///')
|
||||||
|
|
||||||
|
# Update git clone version references in docs (the ".postN" suffix is optional)
|
||||||
|
sed -i "s/git clone -b v[0-9]\+\.[0-9]\+\.[0-9]\+\(\.\?post[0-9]\+\)\?/git clone -b $VERSION/" docs/get_started/install.md
|
||||||
|
sed -i "s/git clone -b v[0-9]\+\.[0-9]\+\.[0-9]\+\(\.\?post[0-9]\+\)\?/git clone -b $VERSION/" docs/platforms/amd_gpu.md
|
||||||
|
|
||||||
|
# Check if any changes were made
|
||||||
|
if git diff --quiet; then
|
||||||
|
echo "No version references needed updating"
|
||||||
|
else
|
||||||
|
git add docs/get_started/install.md docs/platforms/amd_gpu.md
|
||||||
|
git commit -m "docs: update version references to $VERSION"
|
||||||
|
echo "Updated version references to $VERSION"
|
||||||
|
fi
|
||||||
|
|
||||||
|
- name: Push release branch
|
||||||
|
run: |
|
||||||
|
BRANCH_NAME="${{ steps.set_output.outputs.branch_name }}"
|
||||||
|
git push origin "$BRANCH_NAME"
|
||||||
|
echo "Successfully pushed branch '$BRANCH_NAME'"
|
||||||
|
|
||||||
|
- name: Summary
|
||||||
|
run: |
|
||||||
|
COMMIT_SHA="${{ steps.validate.outputs.COMMIT_SHA }}"
|
||||||
|
BRANCH_NAME="${{ github.event.inputs.branch_name }}"
|
||||||
|
|
||||||
|
echo "## Release Branch Cut Summary" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "| Property | Value |" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "|----------|-------|" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "| Branch | \`$BRANCH_NAME\` |" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "| Commit | \`$COMMIT_SHA\` |" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "| Triggered by | @${{ github.actor }} |" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "### Next Steps" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "1. Tests are automatically triggered on the release branch" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "2. Apply any hotfixes if needed" >> $GITHUB_STEP_SUMMARY
|
||||||
|
echo "3. Create a tag to trigger release: \`gh workflow run release-tag.yml -f version=X.Y.Z -f ref=$BRANCH_NAME\`" >> $GITHUB_STEP_SUMMARY
|
||||||
|
|
||||||
|
run-pr-tests-nvidia:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/pr-test.yml
|
||||||
|
with:
|
||||||
|
git_ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
run_all_tests: true
|
||||||
|
skip_stage_health_check: true
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-pr-tests-amd:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/pr-test-amd.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
run_all_tests: true
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-pr-test-npu:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/pr-test-npu.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
run_all_tests: true
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-pr-tests-xeon:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/pr-test-xeon.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
run_all_tests: true
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-pr-tests-xpu:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/pr-test-xpu.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
run_all_tests: true
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-nightly-tests-nvidia:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/nightly-test-nvidia.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-nightly-tests-amd:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/nightly-test-amd.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-nightly-tests-npu:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/nightly-test-npu.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
secrets: inherit
|
||||||
|
|
||||||
|
run-nightly-tests-intel:
|
||||||
|
needs: cut-release-branch
|
||||||
|
uses: ./.github/workflows/nightly-test-intel.yml
|
||||||
|
with:
|
||||||
|
ref: ${{ needs.cut-release-branch.outputs.branch_name }}
|
||||||
|
secrets: inherit
|
||||||
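The "Update version references in documentation" step above derives a version from the branch name and rewrites `git clone -b vX.Y.Z` lines in the docs. The following is a minimal sketch of that idea, assuming bash and a `sed` that supports `-E`. The helper name `rewrite_clone_tag` and the sample lines are hypothetical. The pattern groups the optional `.postN` suffix so that plain `vX.Y.Z` tags also match; it is an illustration and not necessarily identical to the workflow's expression.

```shell
#!/usr/bin/env bash
set -eu

branch_name="release/v0.5.8"        # hypothetical workflow input
version="${branch_name#release/}"   # strip the "release/" prefix

# Hypothetical helper: replaces the tag in a "git clone -b vX.Y.Z[.postN]" line
# read from stdin with the version derived above.
rewrite_clone_tag() {
  sed -E "s/git clone -b v[0-9]+\.[0-9]+\.[0-9]+(\.?post[0-9]+)?/git clone -b $version/"
}

# Sample documentation lines, one with and one without a .postN suffix.
plain=$(echo "git clone -b v0.5.7 https://github.com/sgl-project/sglang.git" | rewrite_clone_tag)
post=$(echo "git clone -b v0.5.7.post1 https://github.com/sgl-project/sglang.git" | rewrite_clone_tag)

echo "$plain"
echo "$post"
```

Both sample lines end up pointing at the same tag, which is the outcome the step is after for either tag style.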
182
third_party/sglang/.github/workflows/release-docker-amd-nightly.yml
vendored
Normal file
@@ -0,0 +1,182 @@
|
|||||||
|
name: Release Docker Images Nightly (AMD)
|
||||||
|
on:
|
||||||
|
workflow_dispatch:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 12 * * *'
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
# A PR number if a pull request and otherwise the commit hash. This cancels
|
||||||
|
# queued and in-progress runs for the same PR (presubmit) or commit
|
||||||
|
# (postsubmit). The workflow name is prepended to avoid conflicts between
|
||||||
|
# different workflows.
|
||||||
|
group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
publish:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: amd-docker-scale
|
||||||
|
environment: 'prod'
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
gpu_arch: ['gfx942', 'gfx950']
|
||||||
|
build_type: ['all']
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
fetch-depth: 0 # Full history and tags, required for the `git tag -l` version lookup below
|
||||||
|
|
||||||
|
- name: "Set Date"
|
||||||
|
run: |
|
||||||
|
echo "DATE=$(date +%Y%m%d)" >> $GITHUB_ENV
|
||||||
|
|
||||||
|
- name: Get version from latest tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
# Get the latest version tag sorted by version number (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION=$(git tag -l 'v[0-9]*' --sort=-v:refname | head -1 | sed 's/^v//')
|
||||||
|
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Could not determine version from git tags"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Get short commit hash of current HEAD
|
||||||
|
COMMIT_HASH=$(git rev-parse --short HEAD)
|
||||||
|
|
||||||
|
# Compose pretend version for setuptools_scm: e.g., 0.5.8.dev20260129+g1a2b3c4
|
||||||
|
PRETEND_VERSION="${VERSION}.dev${{ env.DATE }}+g${COMMIT_HASH}"
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
echo "pretend_version=${PRETEND_VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
echo "Detected version: ${VERSION}"
|
||||||
|
echo "Pretend version for pip: ${PRETEND_VERSION}"
|
||||||
|
|
||||||
|
- name: Login to Docker Hub (AMD)
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_AMD_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_AMD_TOKEN }}
|
||||||
|
|
||||||
|
- name: Build and Push to rocm/sgl-dev
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
pretend_version=${{ steps.version.outputs.pretend_version }}
|
||||||
|
echo "Version: ${version}"
|
||||||
|
echo "Pretend version: ${pretend_version}"
|
||||||
|
|
||||||
|
if [ "${{ matrix.gpu_arch }}" = "gfx942" ]; then
|
||||||
|
rocm_tag="rocm700-mi30x"
|
||||||
|
elif [ "${{ matrix.gpu_arch }}" = "gfx950" ]; then
|
||||||
|
rocm_tag="rocm700-mi35x"
|
||||||
|
else
|
||||||
|
echo "Unsupported gfx arch"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
tag=v${version}-${rocm_tag}
|
||||||
|
echo "IMAGE_TAG=${tag}-${{ env.DATE }}" >> $GITHUB_ENV
|
||||||
|
|
||||||
|
docker build . -f docker/rocm.Dockerfile --build-arg SGL_BRANCH=${{ github.ref_name }} --build-arg BUILD_TYPE=${{ matrix.build_type }} --build-arg GPU_ARCH=${{ matrix.gpu_arch }} --build-arg ENABLE_MORI=1 --build-arg NIC_BACKEND=ainic --build-arg SETUPTOOLS_SCM_PRETEND_VERSION=${pretend_version} -t rocm/sgl-dev:${tag}-${{ env.DATE }} --no-cache
|
||||||
|
docker push rocm/sgl-dev:${tag}-${{ env.DATE }}
|
||||||
|
|
||||||
|
- name: Login to Docker Hub (lmsys)
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Push to lmsysorg/sglang-rocm
|
||||||
|
run: |
|
||||||
|
docker tag rocm/sgl-dev:${{ env.IMAGE_TAG }} lmsysorg/sglang-rocm:${{ env.IMAGE_TAG }}
|
||||||
|
docker push lmsysorg/sglang-rocm:${{ env.IMAGE_TAG }}
|
||||||
|
|
||||||
|
# Temporarily disable docker cache seeding until performant storage is in place
|
||||||
|
cache:
|
||||||
|
if: false
|
||||||
|
# if: always() && github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: linux-mi300-gpu-1
|
||||||
|
environment: 'prod'
|
||||||
|
needs: publish
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
gpu_arch: ['gfx942']
|
||||||
|
build_type: ['all']
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
fetch-depth: 0 # Full history and tags, required for the `git tag -l` version lookup below
|
||||||
|
|
||||||
|
- name: "Set Date"
|
||||||
|
run: |
|
||||||
|
echo "DATE=$(date +%Y%m%d)" >> $GITHUB_ENV
|
||||||
|
|
||||||
|
- name: Get version from latest tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
# Get the latest version tag sorted by version number (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION=$(git tag -l 'v[0-9]*' --sort=-v:refname | head -1 | sed 's/^v//')
|
||||||
|
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Could not determine version from git tags"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
echo "Detected version: ${VERSION}"
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_AMD_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_AMD_TOKEN }}
|
||||||
|
|
||||||
|
- name: Pull and Save Docker Image to Cache
|
||||||
|
run: |
|
||||||
|
set -euxo pipefail
|
||||||
|
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
echo "Version: ${version}"
|
||||||
|
|
||||||
|
if [ "${{ matrix.gpu_arch }}" = "gfx942" ]; then
|
||||||
|
rocm_tag="rocm700-mi30x"
|
||||||
|
else
|
||||||
|
echo "Unsupported gfx arch"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
tag=v${version}-${rocm_tag}
|
||||||
|
|
||||||
|
if [ "${{ matrix.build_type }}" = "all" ]; then
|
||||||
|
tag_suffix=""
|
||||||
|
else
|
||||||
|
echo "Unsupported build type"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
image="rocm/sgl-dev:${tag}-${{ env.DATE }}${tag_suffix}"
|
||||||
|
|
||||||
|
# Determine target cache file name based on ROCm variant
|
||||||
|
if [[ "${rocm_tag}" == rocm700* ]]; then
|
||||||
|
final_path="/home/runner/sgl-data/docker/image-700.tar"
|
||||||
|
else
|
||||||
|
echo "Unexpected ROCm tag: ${rocm_tag}"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
tmp_path="${final_path}.tmp"
|
||||||
|
|
||||||
|
echo "Pulling image: ${image}"
|
||||||
|
docker pull "${image}"
|
||||||
|
|
||||||
|
echo "Saving to temp file: ${tmp_path}"
|
||||||
|
docker save "${image}" -o "${tmp_path}"
|
||||||
|
|
||||||
|
echo "Moving to final path: ${final_path}"
|
||||||
|
mv -f "${tmp_path}" "${final_path}"
|
||||||
|
|
||||||
|
echo "Cache populated successfully at ${final_path}"
|
||||||
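The "Get version from latest tag" step above picks the highest version tag and composes a pretend version for `setuptools_scm`. A minimal sketch of that composition, assuming bash and a `sort` that supports `-V`. Here `sort -rV` stands in for `git tag --sort=-v:refname` so the example runs without a repository. The tag list, date, and commit hash are hypothetical sample values.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical tag list; in the workflow this comes from `git tag -l 'v[0-9]*'`.
tags="v0.5.7
v0.5.10
v0.5.9"

# Highest tag by version order (v0.5.10 sorts above v0.5.9), "v" prefix removed.
latest=$(printf '%s\n' "$tags" | sort -rV | head -n 1)
version="${latest#v}"

build_date="20260129"   # hypothetical; the workflow uses $(date +%Y%m%d)
commit_hash="1a2b3c4"   # hypothetical; the workflow uses git rev-parse --short HEAD

# Same shape as the workflow: <version>.dev<date>+g<short hash>
pretend_version="${version}.dev${build_date}+g${commit_hash}"
echo "$pretend_version"
```

Version ordering matters here: a plain lexical sort would rank `v0.5.9` above `v0.5.10`.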
94
third_party/sglang/.github/workflows/release-docker-amd-rocm720-nightly.yml
vendored
Normal file
@@ -0,0 +1,94 @@
|
|||||||
|
name: Release Docker Images ROCm 7.2.0 Nightly Preview (AMD)
|
||||||
|
on:
|
||||||
|
workflow_dispatch:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 12 * * *'
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
# A PR number if a pull request and otherwise the commit hash. This cancels
|
||||||
|
# queued and in-progress runs for the same PR (presubmit) or commit
|
||||||
|
# (postsubmit). The workflow name is prepended to avoid conflicts between
|
||||||
|
# different workflows.
|
||||||
|
group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
publish:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: amd-docker-scale
|
||||||
|
environment: 'prod'
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
gpu_arch: ['gfx942-rocm720', 'gfx950-rocm720']
|
||||||
|
build_type: ['all']
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
fetch-depth: 0 # Full history and tags, required for the `git tag -l` version lookup below
|
||||||
|
|
||||||
|
- name: "Set Date"
|
||||||
|
run: |
|
||||||
|
echo "DATE=$(date +%Y%m%d)" >> $GITHUB_ENV
|
||||||
|
|
||||||
|
- name: Get version from latest tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
# Get the latest version tag sorted by version number (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION=$(git tag -l 'v[0-9]*' --sort=-v:refname | head -1 | sed 's/^v//')
|
||||||
|
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Could not determine version from git tags"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Get short commit hash of current HEAD
|
||||||
|
COMMIT_HASH=$(git rev-parse --short HEAD)
|
||||||
|
|
||||||
|
          # Compose pretend version for setuptools_scm: e.g., 0.5.8.post1.dev20260211+g1a2b3c4
          PRETEND_VERSION="${VERSION}.dev${{ env.DATE }}+g${COMMIT_HASH}"

          echo "version=${VERSION}" >> $GITHUB_OUTPUT
          echo "pretend_version=${PRETEND_VERSION}" >> $GITHUB_OUTPUT
          echo "Detected version: ${VERSION}"
          echo "Pretend version for pip: ${PRETEND_VERSION}"

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_AMD_USERNAME }}
          password: ${{ secrets.DOCKERHUB_AMD_TOKEN }}

      - name: Build and Push to rocm/sgl-dev
        run: |
          version=${{ steps.version.outputs.version }}
          pretend_version=${{ steps.version.outputs.pretend_version }}
          echo "Version: ${version}"
          echo "Pretend version: ${pretend_version}"

          if [ "${{ matrix.gpu_arch }}" = "gfx942-rocm720" ]; then
            rocm_tag="rocm720-mi30x"
          elif [ "${{ matrix.gpu_arch }}" = "gfx950-rocm720" ]; then
            rocm_tag="rocm720-mi35x"
          else
            echo "Unsupported gfx arch"
            exit 1
          fi

          tag=v${version}-${rocm_tag}
          echo "IMAGE_TAG=${tag}-${{ env.DATE }}" >> $GITHUB_ENV

          docker build . -f docker/rocm.Dockerfile --build-arg SGL_BRANCH=${{ github.ref_name }} --build-arg BUILD_TYPE=${{ matrix.build_type }} --build-arg GPU_ARCH=${{ matrix.gpu_arch }} --build-arg ENABLE_MORI=1 --build-arg NIC_BACKEND=ainic --build-arg SETUPTOOLS_SCM_PRETEND_VERSION=${pretend_version} -t rocm/sgl-dev:${tag}-${{ env.DATE }} --no-cache
          docker push rocm/sgl-dev:${tag}-${{ env.DATE }}

      - name: Login to Docker Hub (lmsys)
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Push to lmsysorg/sglang-rocm
        run: |
          docker tag rocm/sgl-dev:${{ env.IMAGE_TAG }} lmsysorg/sglang-rocm:${{ env.IMAGE_TAG }}
          docker push lmsysorg/sglang-rocm:${{ env.IMAGE_TAG }}
88
third_party/sglang/.github/workflows/release-docker-amd.yml
vendored
Normal file
@@ -0,0 +1,88 @@
name: Release Docker Images (AMD)
on:
  push:
    tags:
      - 'v[0-9]+.*'
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to build (without v prefix, e.g., 0.5.7)'
        required: true

jobs:
  publish:
    if: github.repository == 'sgl-project/sglang'
    runs-on: amd-docker-scale
    environment: 'prod'
    strategy:
      matrix:
        rocm_version: ['rocm700', 'rocm720']
        gpu_arch: ['gfx942', 'gfx950']
        build_type: ['all']
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Get version from tag
        id: version
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            VERSION="${{ github.event.inputs.version }}"
          else
            # Extract version from tag (e.g., v0.5.7 -> 0.5.7)
            VERSION="${GITHUB_REF_NAME#v}"
          fi

          # Validate version format
          if [ -z "$VERSION" ]; then
            echo "::error::Version is empty"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
            exit 1
          fi

          echo "version=${VERSION}" >> $GITHUB_OUTPUT

      - name: Build and Push
        run: |
          version=${{ steps.version.outputs.version }}
          echo "Version: ${version}"

          gpu_arch_suffix=""
          if [ "${{ matrix.rocm_version }}" = "rocm700" ]; then
            if [ "${{ matrix.gpu_arch }}" = "gfx942" ]; then
              rocm_tag="rocm700-mi30x"
            elif [ "${{ matrix.gpu_arch }}" = "gfx950" ]; then
              rocm_tag="rocm700-mi35x"
            else
              echo "Unsupported gfx arch"
              exit 1
            fi
          elif [ "${{ matrix.rocm_version }}" = "rocm720" ]; then
            gpu_arch_suffix="-${{ matrix.rocm_version }}"
            if [ "${{ matrix.gpu_arch }}" = "gfx942" ]; then
              rocm_tag="rocm720-mi30x"
            elif [ "${{ matrix.gpu_arch }}" = "gfx950" ]; then
              rocm_tag="rocm720-mi35x"
            else
              echo "Unsupported gfx arch"
              exit 1
            fi
          else
            echo "Unsupported rocm version"
            exit 1
          fi

          tag=v${version}-${rocm_tag}

          # rocm.Dockerfile expects SGL_BRANCH with 'v' prefix for git tag checkout
          docker build . -f docker/rocm.Dockerfile --build-arg BUILD_TYPE=${{ matrix.build_type }} --build-arg GPU_ARCH=${{ matrix.gpu_arch }}${gpu_arch_suffix} --build-arg SGL_BRANCH=v${version} --build-arg ENABLE_MORI=1 --build-arg NIC_BACKEND=ainic -t lmsysorg/sglang:${tag} --no-cache
          docker push lmsysorg/sglang:${tag}
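The tag selection in the "Build and Push" step of release-docker-amd.yml can be read as a small lookup over the build matrix. The sketch below restates that shell logic in Python for illustration only; the function name `resolve_amd_tags` and the two lookup tables are hypothetical and are not part of the workflow.

```python
from __future__ import annotations

# Sketch only: mirrors the if/elif chain in the "Build and Push" step.
# The names below are illustrative and do not exist in the workflow.
ROCM_VERSIONS = ("rocm700", "rocm720")
ARCH_TO_DEVICE = {"gfx942": "mi30x", "gfx950": "mi35x"}


def resolve_amd_tags(version: str, rocm_version: str, gpu_arch: str) -> tuple[str, str]:
    """Return (image_tag, gpu_arch_build_arg) for one matrix entry."""
    if rocm_version not in ROCM_VERSIONS:
        raise ValueError("Unsupported rocm version")
    if gpu_arch not in ARCH_TO_DEVICE:
        raise ValueError("Unsupported gfx arch")
    # Only the rocm720 builds append the ROCm version to the GPU_ARCH build arg.
    suffix = f"-{rocm_version}" if rocm_version == "rocm720" else ""
    image_tag = f"v{version}-{rocm_version}-{ARCH_TO_DEVICE[gpu_arch]}"
    return image_tag, f"{gpu_arch}{suffix}"
```

Under these assumptions, the four matrix combinations yield four distinct `lmsysorg/sglang` tags, and any other value fails the step, as the `exit 1` branches do in the shell.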
190
third_party/sglang/.github/workflows/release-docker-cu13-framework.yml
vendored
Normal file
@@ -0,0 +1,190 @@
name: Release CUDA 13 Framework Docker Images (Temporary)

# Temporary workflow to build only versioned cu13 framework images
# Can be deleted after use

on:
  workflow_dispatch:
    inputs:
      version:
        description: "Version to build (without v prefix, e.g., 0.5.8)"
        required: true
jobs:
  publish-x86:
    if: github.repository == 'sgl-project/sglang'
    runs-on: x64-docker-build-node
    steps:
      - name: Delete huge unnecessary tools folder
        run: rm -rf /opt/hostedtoolcache

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Free disk space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          docker-images: false
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          swap-storage: false

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Validate version
        id: version
        run: |
          VERSION="${{ github.event.inputs.version }}"
          if [ -z "$VERSION" ]; then
            echo "::error::Version is empty"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
            exit 1
          fi
          echo "version=${VERSION}" >> $GITHUB_OUTPUT

      - name: Build and Push AMD64 Framework (CUDA 13)
        run: |
          version=${{ steps.version.outputs.version }}

          docker buildx build \
            --target framework \
            --platform linux/amd64 \
            --output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
            -f docker/Dockerfile \
            --build-arg CUDA_VERSION=13.0.1 \
            --build-arg BUILD_TYPE=all \
            --build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
            --build-arg GRACE_BLACKWELL=0 \
            --build-arg SGL_VERSION=${version} \
            --metadata-file /tmp/metadata.json \
            --no-cache \
            .

          DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata.json'))['containerimage.digest'])")
          echo "Pushed digest: ${DIGEST}"
          echo "${DIGEST}" > /tmp/digest-cu130-amd64-framework.txt

      - name: Upload digest
        uses: actions/upload-artifact@v4
        with:
          name: digest-cu130-amd64
          path: /tmp/digest-cu130-amd64-framework.txt
          retention-days: 1

  publish-arm64:
    if: github.repository == 'sgl-project/sglang'
    runs-on: arm-docker-build-node
    steps:
      - name: Delete huge unnecessary tools folder
        run: rm -rf /opt/hostedtoolcache

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Validate version
        id: version
        run: |
          VERSION="${{ github.event.inputs.version }}"
          if [ -z "$VERSION" ]; then
            echo "::error::Version is empty"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
            exit 1
          fi
          echo "version=${VERSION}" >> $GITHUB_OUTPUT

      - name: Build and Push ARM64 Framework (CUDA 13)
        run: |
          version=${{ steps.version.outputs.version }}

          docker buildx build \
            --target framework \
            --platform linux/arm64 \
            --output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
            -f docker/Dockerfile \
            --build-arg CUDA_VERSION=13.0.1 \
            --build-arg BUILD_TYPE=all \
            --build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
            --build-arg GRACE_BLACKWELL=1 \
            --build-arg SGL_VERSION=${version} \
            --metadata-file /tmp/metadata.json \
            --no-cache \
            .

          DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata.json'))['containerimage.digest'])")
          echo "Pushed digest: ${DIGEST}"
          echo "${DIGEST}" > /tmp/digest-cu130-arm64-framework.txt

      - name: Upload digest
        uses: actions/upload-artifact@v4
        with:
          name: digest-cu130-arm64
          path: /tmp/digest-cu130-arm64-framework.txt
          retention-days: 1

  create-manifest:
    runs-on: ubuntu-22.04
    needs: [publish-x86, publish-arm64]
    if: github.repository == 'sgl-project/sglang'
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Download amd64 digest
        uses: actions/download-artifact@v4
        with:
          name: digest-cu130-amd64
          path: /tmp/digests/amd64

      - name: Download arm64 digest
        uses: actions/download-artifact@v4
        with:
          name: digest-cu130-arm64
          path: /tmp/digests/arm64

      - name: Create multi-arch manifest
        run: |
          version=${{ github.event.inputs.version }}
          AMD64_DIGEST=$(cat /tmp/digests/amd64/digest-cu130-amd64-framework.txt)
          ARM64_DIGEST=$(cat /tmp/digests/arm64/digest-cu130-arm64-framework.txt)

          # Create versioned CUDA 13 framework manifest
          docker buildx imagetools create \
            -t lmsysorg/sglang:v${version}-cu130 \
            lmsysorg/sglang@${AMD64_DIGEST} \
            lmsysorg/sglang@${ARM64_DIGEST}

          # Create latest CUDA 13 framework manifest
          docker buildx imagetools create \
            -t lmsysorg/sglang:latest-cu130 \
            lmsysorg/sglang@${AMD64_DIGEST} \
            lmsysorg/sglang@${ARM64_DIGEST}
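The "Validate version" steps above use a pattern that is anchored at the start of the string but not at the end, so anything beginning with `X.Y.Z` passes, including suffixed versions. A minimal Python sketch of the same check follows, assuming a single-line input; the helper name `is_valid_version` is hypothetical.

```python
import re

# Same pattern as the workflow's `grep -qE` check: anchored at the start only.
VERSION_PREFIX = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+")


def is_valid_version(version: str) -> bool:
    """Return True when the workflow's version check would accept the input."""
    # An empty string is rejected first, matching the `[ -z "$VERSION" ]` test.
    return bool(version) and VERSION_PREFIX.search(version) is not None
```

With this reading, `0.5.8` and `0.5.8.post1` are accepted, while `v0.5.8` is rejected, which is consistent with the input description asking for the version without the `v` prefix.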
209
third_party/sglang/.github/workflows/release-docker-dev.yml
vendored
Normal file
@@ -0,0 +1,209 @@
name: Build and Push Development Docker Images

on:
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to build from (leave empty to use current branch)"
        required: false
        default: ""
      tag:
        description: "Custom tag suffix (overrides pr_number in tag). E.g. 'my-test' → dev-my-test, dev-cu13-my-test, etc."
        required: false
        default: ""
  schedule:
    - cron: "0 0 * * *"

concurrency:
  group: release-docker-dev-${{ inputs.tag || inputs.pr_number || 'nightly' }}
  cancel-in-progress: true

jobs:
  build-dev:
    if: ${{ github.repository == 'sgl-project/sglang' }}
    runs-on: ${{ matrix.runner }}
    strategy:
      matrix:
        include:
          - runner: x64-docker-build-node
            platform: linux/amd64
            build_type: all
            grace_blackwell: 0
            arch_tag: x86
            version: 12.9.1
          - runner: arm-docker-build-node
            platform: linux/arm64
            build_type: all
            grace_blackwell: 1
            arch_tag: arm64
            version: 12.9.1
          - runner: x64-docker-build-node
            platform: linux/amd64
            build_type: all
            grace_blackwell: 0
            arch_tag: x86-cu13
            version: 13.0.1
          - runner: arm-docker-build-node
            platform: linux/arm64
            build_type: all
            grace_blackwell: 1
            arch_tag: arm64-cu13
            version: 13.0.1
    steps:
      - name: Delete huge unnecessary tools folder
        run: rm -rf /opt/hostedtoolcache

      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || github.ref }}

      - name: Free disk space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: true
          docker-images: true
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          swap-storage: true

      - name: Prune Docker to reclaim disk space
        run: |
          docker buildx prune --filter "until=72h" -f
          docker system prune -af --filter "until=72h"
          docker volume prune -af

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and Push Dev Image
        run: |
          # Nightly (schedule) installs latest release; manual dispatch builds from checked-out source
          if [ "${{ github.event_name }}" = "schedule" ]; then
            SOURCE_ARG="--build-arg USE_LATEST_SGLANG=1"
          else
            SOURCE_ARG="--build-arg BRANCH_TYPE=local"
          fi

          docker buildx build \
            --platform ${{ matrix.platform }} \
            --output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
            --target framework \
            -f docker/Dockerfile \
            --build-arg CUDA_VERSION=${{ matrix.version }} \
            --build-arg BUILD_TYPE=${{ matrix.build_type }} \
            --build-arg CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
            --build-arg GRACE_BLACKWELL=${{ matrix.grace_blackwell }} \
            ${SOURCE_ARG} \
            --build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
            --metadata-file /tmp/metadata.json \
            --no-cache \
            .

          DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata.json'))['containerimage.digest'])")
          echo "Pushed digest: ${DIGEST}"
          echo "${DIGEST}" > /tmp/digest.txt

      - name: Upload digest
        uses: actions/upload-artifact@v4
        with:
          name: digest-${{ matrix.arch_tag }}
          path: /tmp/digest.txt
          retention-days: 1

  create-manifests:
    runs-on: ubuntu-22.04
    needs: [build-dev]
    if: ${{ github.repository == 'sgl-project/sglang' }}
    strategy:
      matrix:
        variant:
          - base: dev
            x86: x86
            arm64: arm64
          - base: dev-cu13
            x86: x86-cu13
            arm64: arm64-cu13
    steps:
      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Download x86 digest
        uses: actions/download-artifact@v4
        with:
          name: digest-${{ matrix.variant.x86 }}
          path: /tmp/digests/x86

      - name: Download arm64 digest
        uses: actions/download-artifact@v4
        with:
          name: digest-${{ matrix.variant.arm64 }}
          path: /tmp/digests/arm64

      - name: Create multi-arch manifest
        run: |
          X86_DIGEST=$(cat /tmp/digests/x86/digest.txt)
          ARM64_DIGEST=$(cat /tmp/digests/arm64/digest.txt)

          SUFFIX=""
          if [ -n "${{ inputs.tag }}" ]; then
            SUFFIX="-${{ inputs.tag }}"
          elif [ -n "${{ inputs.pr_number }}" ]; then
            SUFFIX="-pr-${{ inputs.pr_number }}"
          fi

          TAG="${{ matrix.variant.base }}${SUFFIX}"

          # For nightly (no suffix), also stamp a dated tag
          EXTRA_TAG=""
          if [ -z "${SUFFIX}" ]; then
            SHORT_SHA="${{ github.sha }}"
            EXTRA_TAG="-t lmsysorg/sglang:nightly-${TAG}-$(date +%Y%m%d)-${SHORT_SHA:0:8}"
          fi

          docker buildx imagetools create \
            -t lmsysorg/sglang:${TAG} \
            ${EXTRA_TAG} \
            lmsysorg/sglang@${X86_DIGEST} \
            lmsysorg/sglang@${ARM64_DIGEST}

          echo "✓ Published lmsysorg/sglang:${TAG}"

      - name: Cleanup Old Nightly Builds
        if: ${{ !inputs.tag && !inputs.pr_number }}
        run: |
          TOKEN=$(curl -s -H "Content-Type: application/json" \
            -X POST -d '{"username": "${{ secrets.DOCKERHUB_USERNAME }}", "password": "${{ secrets.DOCKERHUB_TOKEN }}"}' \
            https://hub.docker.com/v2/users/login/ | jq -r .token)

          TAGS_RESPONSE=$(curl -s -H "Authorization: JWT $TOKEN" \
            "https://hub.docker.com/v2/repositories/lmsysorg/sglang/tags/?page_size=100")

          TAGS=$(echo "$TAGS_RESPONSE" | jq -r \
            '.results[] | select(.name | test("^nightly-${{ matrix.variant.base }}-[0-9]")) | "\(.last_updated)|\(.name)"' \
            | sort -r | cut -d'|' -f2)

          TAG_COUNT=$(echo "$TAGS" | wc -l)
          if [ "$TAG_COUNT" -gt 14 ]; then
            echo "Found $TAG_COUNT nightly builds, keeping only the 14 most recent"
            TAGS_TO_DELETE=$(echo "$TAGS" | tail -n +15)
            for tag in $TAGS_TO_DELETE; do
              echo "Deleting tag: $tag"
              curl -X DELETE -H "Authorization: JWT $TOKEN" \
                "https://hub.docker.com/v2/repositories/lmsysorg/sglang/tags/$tag/"
            done
          else
            echo "Only $TAG_COUNT nightly builds found, no cleanup needed"
          fi
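The "Cleanup Old Nightly Builds" step keeps the 14 most recently updated nightly tags for each variant and deletes the rest. The Python sketch below restates that selection for illustration; the function `select_tags_to_delete` is hypothetical, and its `results` argument is assumed to be the `results` array of one tag-list response, with `name` and `last_updated` fields as used by the `jq` filter. Only the first 100 tags (`page_size=100`) are requested, so this sketch, like the step, sees a single page.

```python
import re


def select_tags_to_delete(results, base, keep=14):
    """Return nightly tag names for `base` beyond the `keep` most recent.

    Sketch only: `results` is assumed to be a list of dicts with "name" and
    "last_updated" keys, as selected by the workflow's jq filter.
    """
    # A digit must follow "nightly-<base>-", so "nightly-dev-cu13-..." does not
    # match when base is "dev".
    pattern = re.compile(rf"^nightly-{re.escape(base)}-[0-9]")
    matching = [r for r in results if pattern.search(r["name"])]
    # Newest first, like `sort -r` on "last_updated|name" lines.
    matching.sort(key=lambda r: (r["last_updated"], r["name"]), reverse=True)
    return [r["name"] for r in matching[keep:]]
```

For example, given 16 matching `nightly-dev-YYYYMMDD-<sha>` tags, the two oldest would be returned for deletion. Tags for the `dev-cu13` variant are counted separately because the matrix runs the step once per variant.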
39
third_party/sglang/.github/workflows/release-docker-gateway.yml
vendored
Normal file
@@ -0,0 +1,39 @@
name: Release SGLang Model Gateway Docker Image
on:
  push:
    branches:
      - main
    paths:
      - sgl-model-gateway/bindings/python/pyproject.toml
  workflow_dispatch:

jobs:
  publish:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and Push
        run: |
          version=$(cat sgl-model-gateway/bindings/python/src/sglang_router/version.py | cut -d'"' -f2)
          tag=v${version}

          docker buildx build . -f docker/gateway.Dockerfile \
            --platform linux/amd64,linux/arm64 \
            -t lmsysorg/sgl-model-gateway:${tag} \
            -t lmsysorg/sgl-model-gateway:latest \
            --push
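The gateway workflow derives the image tag by piping `version.py` through `cut -d'"' -f2`, which takes the text between the first pair of double quotes on each line. The sketch below emulates that behaviour in Python to show what the step relies on. The helper `cut_second_quoted_field` and the sample file contents are hypothetical, and the layout of the real `version.py` is assumed rather than shown in this diff.

```python
def cut_second_quoted_field(text: str) -> str:
    """Emulate `cut -d'"' -f2` applied to every line of `text`.

    Sketch only: as with cut (without -s), lines that contain no double quote
    are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        out.append(line.split('"')[1] if '"' in line else line)
    return "\n".join(out)


# Hypothetical file contents, used only for illustration.
sample = '__version__ = "0.3.2"\n'
print(cut_second_quoted_field(sample))
```

If the file held anything besides a single quoted assignment, the extra lines would end up in `${version}` as well, so the step appears to assume a one-line file.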
85
third_party/sglang/.github/workflows/release-docker-npu-nightly.yml
vendored
Normal file
@@ -0,0 +1,85 @@
name: Release Docker Images Nightly (NPU)
on:
  pull_request:
    branches:
      - 'main'
    paths:
      - '.github/workflows/release-docker-npu-nightly.yml'
      - 'docker/npu.Dockerfile'
  workflow_dispatch:
  schedule:
    - cron: "0 16 * * *" # Execute at 0:00 a.m. Beijing Time every day

concurrency:
  group: ${{ github.workflow }}-${{ github.sha }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-22.04-arm
    strategy:
      matrix:
        cann_version: ["8.5.0"]
        device_type: ["910b", "a3"]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Free up disk space
        uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
        with:
          tool-cache: true
          docker-images: false

      - name: Setup Docker buildx
        uses: docker/setup-buildx-action@v3

      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            lmsysorg/sglang
          # push with schedule event
          # push with workflow_dispatch event
          tags: |
            type=ref,event=pr
            type=ref,event=branch
            type=schedule,pattern=main
          flavor: |
            latest=false
            suffix=-cann${{ matrix.cann_version }}-${{ matrix.device_type }},onlatest=true
      # Login against a Docker registry except on PR
      # https://github.com/docker/login-action
      - name: Log into docker hub
        uses: docker/login-action@v3
        if: ${{ github.repository == 'sgl-project/sglang' && github.event_name != 'pull_request' }}
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      # Enable Docker multi-architecture build environment
      # Emulate non-native architectures
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      # Required for building and pushing multi-arch Docker images
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # Build and push Docker image with Buildx (don't push on PR)
      # https://github.com/docker/build-push-action
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v6
        with:
          context: docker
          file: docker/npu.Dockerfile
          platforms: linux/arm64,linux/amd64
          labels: ${{ steps.meta.outputs.labels }}
          tags: ${{ steps.meta.outputs.tags }}
          push: ${{ github.repository == 'sgl-project/sglang' && github.event_name != 'pull_request' }}
          provenance: false
          build-args: |
            SGLANG_KERNEL_NPU_TAG=2026.03.10.rc1
            CANN_VERSION=${{ matrix.cann_version }}
            DEVICE_TYPE=${{ matrix.device_type }}
93
third_party/sglang/.github/workflows/release-docker-npu.yml
vendored
Normal file
@@ -0,0 +1,93 @@
name: Release Docker Images (NPU)
on:
  push:
    tags:
      - 'v[0-9]+.*'
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to build (without v prefix, e.g., 0.5.7)'
        required: true

jobs:
  build:
    runs-on: ubuntu-22.04-arm
    strategy:
      matrix:
        cann_version: ["8.5.0"]
        device_type: ["910b", "a3"]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Free up disk space
        uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
        with:
          tool-cache: true
          docker-images: false

      # push with tag
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            lmsysorg/sglang
          tags: |
            type=ref,event=pr
          flavor: |
            latest=false

      # Login against a Docker registry except on PR
      # https://github.com/docker/login-action
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        if: ${{ github.repository == 'sgl-project/sglang' && github.event_name != 'pull_request' }}
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Get version from tag
        id: version
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            VERSION="${{ github.event.inputs.version }}"
          else
            # Extract version from tag (e.g., v0.5.7 -> 0.5.7)
            VERSION="${GITHUB_REF_NAME#v}"
          fi
          # Validate version format
          if [ -z "$VERSION" ]; then
            echo "::error::Version is empty"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
            exit 1
          fi
          echo "version=v${VERSION}" >> $GITHUB_OUTPUT
          echo "TAG=lmsysorg/sglang:v${VERSION}-cann${{ matrix.cann_version }}-${{ matrix.device_type }}" >> $GITHUB_OUTPUT
      # Enable Docker multi-architecture build environment
      # Emulate non-native architectures
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      # Required for building and pushing multi-arch Docker images
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v6
        with:
          context: docker
          file: docker/npu.Dockerfile
          platforms: linux/arm64,linux/amd64
          labels: ${{ steps.meta.outputs.labels }}
          tags: ${{ steps.meta.outputs.tags || steps.version.outputs.TAG }}
          push: ${{ github.repository == 'sgl-project/sglang' && github.event_name != 'pull_request' }}
          provenance: false
          build-args: |
            SGLANG_KERNEL_NPU_TAG=2026.03.10.rc1
            CANN_VERSION=${{ matrix.cann_version }}
            DEVICE_TYPE=${{ matrix.device_type }}
            SGLANG_TAG=${{ steps.version.outputs.version }}
309
third_party/sglang/.github/workflows/release-docker-runtime.yml
vendored
Normal file
@@ -0,0 +1,309 @@
name: Release Docker Runtime Images
#
# This workflow builds and publishes runtime Docker images (production-optimized, ~50% smaller):
# - lmsysorg/sglang:v{version}-runtime, lmsysorg/sglang:latest-runtime
# - lmsysorg/sglang:v{version}-cu130-runtime, lmsysorg/sglang:latest-cu130-runtime
#
on:
  push:
    tags:
      - "v[0-9]+.*"
  workflow_dispatch:
    inputs:
      version:
        description: "Version to build (without v prefix, e.g., 0.5.7)"
        required: true

jobs:
  publish-x86:
    if: github.repository == 'sgl-project/sglang'
    environment: "prod"
    strategy:
      matrix:
        variant:
          - cuda_version: "12.9.1"
            build_type: "all"
            grace_blackwell: 0
    runs-on: x64-docker-build-node
    steps:
      - name: Delete huge unnecessary tools folder
        run: rm -rf /opt/hostedtoolcache

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Free disk space
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          docker-images: false
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          swap-storage: false

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Get version from tag
        id: version
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            VERSION="${{ github.event.inputs.version }}"
          else
            # Extract version from tag (e.g., v0.5.7 -> 0.5.7)
            VERSION="${GITHUB_REF_NAME#v}"
          fi

          # Validate version format
          if [ -z "$VERSION" ]; then
            echo "::error::Version is empty"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
            exit 1
          fi

          echo "version=${VERSION}" >> $GITHUB_OUTPUT

      - name: Build and Push AMD64 Runtime
        run: |
          version=${{ steps.version.outputs.version }}

          docker buildx build \
            --target runtime \
            --platform linux/amd64 \
            --output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
            -f docker/Dockerfile \
            --build-arg CUDA_VERSION=${{ matrix.variant.cuda_version }} \
            --build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
            --build-arg GRACE_BLACKWELL=${{ matrix.variant.grace_blackwell }} \
            --build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
            --build-arg SGL_VERSION=${version} \
            --metadata-file /tmp/metadata-cu129-runtime.json \
            --no-cache \
            .

          DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu129-runtime.json'))['containerimage.digest'])")
          echo "Pushed digest: ${DIGEST}"
          echo "${DIGEST}" > /tmp/digest-cu129-amd64-runtime.txt

      - name: Build and Push AMD64 Runtime (CUDA 13)
        run: |
          version=${{ steps.version.outputs.version }}

          docker buildx build \
            --target runtime \
            --platform linux/amd64 \
            --output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
            -f docker/Dockerfile \
            --build-arg CUDA_VERSION=13.0.1 \
            --build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
            --build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
            --build-arg GRACE_BLACKWELL=0 \
            --build-arg SGL_VERSION=${version} \
            --metadata-file /tmp/metadata-cu130-runtime.json \
            --no-cache \
            .

          DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu130-runtime.json'))['containerimage.digest'])")
          echo "Pushed digest: ${DIGEST}"
          echo "${DIGEST}" > /tmp/digest-cu130-amd64-runtime.txt

      - name: Upload digests
        uses: actions/upload-artifact@v4
        with:
          name: digests-amd64
          path: /tmp/digest-*.txt
          retention-days: 1

  publish-arm64:
    if: github.repository == 'sgl-project/sglang'
    environment: "prod"
    strategy:
      matrix:
        variant:
          - cuda_version: "12.9.1"
            build_type: "all"
            grace_blackwell: 1
    runs-on: arm-docker-build-node
    steps:
      - name: Delete huge unnecessary tools folder
        run: rm -rf /opt/hostedtoolcache

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Get version from tag
        id: version
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            VERSION="${{ github.event.inputs.version }}"
          else
            # Extract version from tag (e.g., v0.5.7 -> 0.5.7)
            VERSION="${GITHUB_REF_NAME#v}"
          fi

          # Validate version format
          if [ -z "$VERSION" ]; then
            echo "::error::Version is empty"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
            exit 1
          fi

          echo "version=${VERSION}" >> $GITHUB_OUTPUT

      - name: Build and Push ARM64 Runtime
        run: |
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
docker buildx build \
|
||||||
|
--target runtime \
|
||||||
|
--platform linux/arm64 \
|
||||||
|
--output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
|
||||||
|
-f docker/Dockerfile \
|
||||||
|
--build-arg CUDA_VERSION=${{ matrix.variant.cuda_version }} \
|
||||||
|
--build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
|
||||||
|
--build-arg GRACE_BLACKWELL=${{ matrix.variant.grace_blackwell }} \
|
||||||
|
--build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
|
||||||
|
--build-arg SGL_VERSION=${version} \
|
||||||
|
--metadata-file /tmp/metadata-cu129-runtime.json \
|
||||||
|
--no-cache \
|
||||||
|
.
|
||||||
|
|
||||||
|
DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu129-runtime.json'))['containerimage.digest'])")
|
||||||
|
echo "Pushed digest: ${DIGEST}"
|
||||||
|
echo "${DIGEST}" > /tmp/digest-cu129-arm64-runtime.txt
|
||||||
|
|
||||||
|
- name: Build and Push ARM64 Runtime (CUDA 13)
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
docker buildx build \
|
||||||
|
--target runtime \
|
||||||
|
--platform linux/arm64 \
|
||||||
|
--output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
|
||||||
|
-f docker/Dockerfile \
|
||||||
|
--build-arg CUDA_VERSION=13.0.1 \
|
||||||
|
--build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
|
||||||
|
--build-arg GRACE_BLACKWELL=1 \
|
||||||
|
--build-arg SGL_VERSION=${version} \
|
||||||
|
--metadata-file /tmp/metadata-cu130-runtime.json \
|
||||||
|
--no-cache \
|
||||||
|
.
|
||||||
|
|
||||||
|
DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu130-runtime.json'))['containerimage.digest'])")
|
||||||
|
echo "Pushed digest: ${DIGEST}"
|
||||||
|
echo "${DIGEST}" > /tmp/digest-cu130-arm64-runtime.txt
|
||||||
|
|
||||||
|
- name: Upload digests
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: digests-arm64
|
||||||
|
path: /tmp/digest-*.txt
|
||||||
|
retention-days: 1
|
||||||
|
|
||||||
|
create-manifests:
|
||||||
|
runs-on: ubuntu-22.04
|
||||||
|
needs: [publish-x86, publish-arm64]
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
environment: "prod"
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Get version from tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||||
|
VERSION="${{ github.event.inputs.version }}"
|
||||||
|
else
|
||||||
|
# Extract version from tag (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION="${GITHUB_REF_NAME#v}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate version format
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Version is empty"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
|
||||||
|
echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Download amd64 digests
|
||||||
|
uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
name: digests-amd64
|
||||||
|
path: /tmp/digests/amd64
|
||||||
|
|
||||||
|
- name: Download arm64 digests
|
||||||
|
uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
name: digests-arm64
|
||||||
|
path: /tmp/digests/arm64
|
||||||
|
|
||||||
|
- name: Create multi-arch manifests
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
CU129_AMD64_RT=$(cat /tmp/digests/amd64/digest-cu129-amd64-runtime.txt)
|
||||||
|
CU130_AMD64_RT=$(cat /tmp/digests/amd64/digest-cu130-amd64-runtime.txt)
|
||||||
|
CU129_ARM64_RT=$(cat /tmp/digests/arm64/digest-cu129-arm64-runtime.txt)
|
||||||
|
CU130_ARM64_RT=$(cat /tmp/digests/arm64/digest-cu130-arm64-runtime.txt)
|
||||||
|
|
||||||
|
# Create versioned runtime manifest
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:v${version}-runtime \
|
||||||
|
lmsysorg/sglang@${CU129_AMD64_RT} \
|
||||||
|
lmsysorg/sglang@${CU129_ARM64_RT}
|
||||||
|
|
||||||
|
# Create latest runtime manifest
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:latest-runtime \
|
||||||
|
lmsysorg/sglang@${CU129_AMD64_RT} \
|
||||||
|
lmsysorg/sglang@${CU129_ARM64_RT}
|
||||||
|
|
||||||
|
# Create versioned CUDA 13 runtime manifest
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:v${version}-cu130-runtime \
|
||||||
|
lmsysorg/sglang@${CU130_AMD64_RT} \
|
||||||
|
lmsysorg/sglang@${CU130_ARM64_RT}
|
||||||
|
|
||||||
|
# Create latest CUDA 13 runtime manifest
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:latest-cu130-runtime \
|
||||||
|
lmsysorg/sglang@${CU130_AMD64_RT} \
|
||||||
|
lmsysorg/sglang@${CU130_ARM64_RT}
|
||||||
62
third_party/sglang/.github/workflows/release-docker-xeon.yml
vendored
Normal file
@@ -0,0 +1,62 @@
|
|||||||
|
name: Release Docker Xeon Images
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
tags:
|
||||||
|
- 'v[0-9]+.*'
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
version:
|
||||||
|
description: 'Version to build (without v prefix, e.g., 0.5.7)'
|
||||||
|
required: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
publish:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: ubuntu-24.04
|
||||||
|
environment: 'prod'
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
build_type: ['all']
|
||||||
|
steps:
|
||||||
|
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Get version from tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||||
|
VERSION="${{ github.event.inputs.version }}"
|
||||||
|
else
|
||||||
|
# Extract version from tag (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION="${GITHUB_REF_NAME#v}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate version format
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Version is empty"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
|
||||||
|
echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Build and Push
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
tag=v${version}-xeon
|
||||||
|
|
||||||
|
docker build . -f docker/xeon.Dockerfile \
|
||||||
|
--build-arg VER_SGLANG=v${version} \
|
||||||
|
-t lmsysorg/sglang:${tag} \
|
||||||
|
--no-cache
|
||||||
|
docker push lmsysorg/sglang:${tag}
|
||||||
294
third_party/sglang/.github/workflows/release-docker.yml
vendored
Normal file
@@ -0,0 +1,294 @@
|
|||||||
|
name: Release Docker Images
|
||||||
|
#
|
||||||
|
# This workflow builds and publishes framework Docker images (full development environment):
|
||||||
|
# - lmsysorg/sglang:v{version}, lmsysorg/sglang:latest
|
||||||
|
# - lmsysorg/sglang:v{version}-cu130, lmsysorg/sglang:latest-cu130
|
||||||
|
#
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
tags:
|
||||||
|
- "v[0-9]+.*"
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
version:
|
||||||
|
description: "Version to build (without v prefix, e.g., 0.5.7)"
|
||||||
|
required: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
publish-x86:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
environment: "prod"
|
||||||
|
outputs:
|
||||||
|
digest-cu129: ${{ steps.build-cu129.outputs.digest }}
|
||||||
|
digest-cu130: ${{ steps.build-cu130.outputs.digest }}
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
variant:
|
||||||
|
- cuda_version: "12.9.1"
|
||||||
|
build_type: "all"
|
||||||
|
grace_blackwell: 0
|
||||||
|
runs-on: x64-docker-build-node
|
||||||
|
steps:
|
||||||
|
- name: Delete huge unnecessary tools folder
|
||||||
|
run: rm -rf /opt/hostedtoolcache
|
||||||
|
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Free disk space
|
||||||
|
uses: jlumbroso/free-disk-space@main
|
||||||
|
with:
|
||||||
|
tool-cache: false
|
||||||
|
docker-images: false
|
||||||
|
android: true
|
||||||
|
dotnet: true
|
||||||
|
haskell: true
|
||||||
|
large-packages: true
|
||||||
|
swap-storage: false
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Get version from tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||||
|
VERSION="${{ github.event.inputs.version }}"
|
||||||
|
else
|
||||||
|
# Extract version from tag (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION="${GITHUB_REF_NAME#v}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate version format
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Version is empty"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
|
||||||
|
echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Build AMD64 Framework
|
||||||
|
id: build-cu129
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
docker buildx build \
|
||||||
|
--target framework \
|
||||||
|
--platform linux/amd64 \
|
||||||
|
--output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
|
||||||
|
-f docker/Dockerfile \
|
||||||
|
--build-arg CUDA_VERSION=${{ matrix.variant.cuda_version }} \
|
||||||
|
--build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
|
||||||
|
--build-arg GRACE_BLACKWELL=${{ matrix.variant.grace_blackwell }} \
|
||||||
|
--build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
|
||||||
|
--build-arg SGL_VERSION=${version} \
|
||||||
|
--metadata-file /tmp/metadata-cu129-framework.json \
|
||||||
|
--no-cache \
|
||||||
|
.
|
||||||
|
|
||||||
|
DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu129-framework.json'))['containerimage.digest'])")
|
||||||
|
echo "Pushed digest: ${DIGEST}"
|
||||||
|
echo "digest=${DIGEST}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Build and Push AMD64 Framework (CUDA 13)
|
||||||
|
id: build-cu130
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
docker buildx build \
|
||||||
|
--target framework \
|
||||||
|
--platform linux/amd64 \
|
||||||
|
--output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
|
||||||
|
-f docker/Dockerfile \
|
||||||
|
--build-arg CUDA_VERSION=13.0.1 \
|
||||||
|
--build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
|
||||||
|
--build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
|
||||||
|
--build-arg GRACE_BLACKWELL=0 \
|
||||||
|
--build-arg SGL_VERSION=${version} \
|
||||||
|
--metadata-file /tmp/metadata-cu130-framework.json \
|
||||||
|
--no-cache \
|
||||||
|
.
|
||||||
|
|
||||||
|
DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu130-framework.json'))['containerimage.digest'])")
|
||||||
|
echo "Pushed digest: ${DIGEST}"
|
||||||
|
echo "digest=${DIGEST}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
publish-arm64:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
environment: "prod"
|
||||||
|
outputs:
|
||||||
|
digest-cu129: ${{ steps.build-cu129.outputs.digest }}
|
||||||
|
digest-cu130: ${{ steps.build-cu130.outputs.digest }}
|
||||||
|
strategy:
|
||||||
|
matrix:
|
||||||
|
variant:
|
||||||
|
- cuda_version: "12.9.1"
|
||||||
|
build_type: "all"
|
||||||
|
grace_blackwell: 1
|
||||||
|
runs-on: arm-docker-build-node
|
||||||
|
steps:
|
||||||
|
- name: Delete huge unnecessary tools folder
|
||||||
|
run: rm -rf /opt/hostedtoolcache
|
||||||
|
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Get version from tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||||
|
VERSION="${{ github.event.inputs.version }}"
|
||||||
|
else
|
||||||
|
# Extract version from tag (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION="${GITHUB_REF_NAME#v}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate version format
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Version is empty"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
|
||||||
|
echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Build ARM64 Framework
|
||||||
|
id: build-cu129
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
docker buildx build \
|
||||||
|
--target framework \
|
||||||
|
--platform linux/arm64 \
|
||||||
|
--output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
|
||||||
|
-f docker/Dockerfile \
|
||||||
|
--build-arg CUDA_VERSION=${{ matrix.variant.cuda_version }} \
|
||||||
|
--build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
|
||||||
|
--build-arg GRACE_BLACKWELL=${{ matrix.variant.grace_blackwell }} \
|
||||||
|
--build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
|
||||||
|
--build-arg SGL_VERSION=${version} \
|
||||||
|
--metadata-file /tmp/metadata-cu129-framework.json \
|
||||||
|
--no-cache \
|
||||||
|
.
|
||||||
|
|
||||||
|
DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu129-framework.json'))['containerimage.digest'])")
|
||||||
|
echo "Pushed digest: ${DIGEST}"
|
||||||
|
echo "digest=${DIGEST}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Build and Push ARM64 Framework (CUDA 13)
|
||||||
|
id: build-cu130
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
docker buildx build \
|
||||||
|
--target framework \
|
||||||
|
--platform linux/arm64 \
|
||||||
|
--output type=image,name=lmsysorg/sglang,push-by-digest=true,name-canonical=true,push=true \
|
||||||
|
-f docker/Dockerfile \
|
||||||
|
--build-arg CUDA_VERSION=13.0.1 \
|
||||||
|
--build-arg BUILD_TYPE=${{ matrix.variant.build_type }} \
|
||||||
|
--build-arg INSTALL_FLASHINFER_JIT_CACHE=1 \
|
||||||
|
--build-arg GRACE_BLACKWELL=1 \
|
||||||
|
--build-arg SGL_VERSION=${version} \
|
||||||
|
--metadata-file /tmp/metadata-cu130-framework.json \
|
||||||
|
--no-cache \
|
||||||
|
.
|
||||||
|
|
||||||
|
DIGEST=$(python3 -c "import json; print(json.load(open('/tmp/metadata-cu130-framework.json'))['containerimage.digest'])")
|
||||||
|
echo "Pushed digest: ${DIGEST}"
|
||||||
|
echo "digest=${DIGEST}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
create-manifests:
|
||||||
|
runs-on: ubuntu-22.04
|
||||||
|
needs: [publish-x86, publish-arm64]
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
environment: "prod"
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Docker Buildx
|
||||||
|
uses: docker/setup-buildx-action@v3
|
||||||
|
|
||||||
|
- name: Login to Docker Hub
|
||||||
|
uses: docker/login-action@v2
|
||||||
|
with:
|
||||||
|
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||||
|
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||||
|
|
||||||
|
- name: Get version from tag
|
||||||
|
id: version
|
||||||
|
run: |
|
||||||
|
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
|
||||||
|
VERSION="${{ github.event.inputs.version }}"
|
||||||
|
else
|
||||||
|
# Extract version from tag (e.g., v0.5.7 -> 0.5.7)
|
||||||
|
VERSION="${GITHUB_REF_NAME#v}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Validate version format
|
||||||
|
if [ -z "$VERSION" ]; then
|
||||||
|
echo "::error::Version is empty"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
|
||||||
|
echo "::error::Invalid version format: $VERSION (expected: X.Y.Z)"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "version=${VERSION}" >> $GITHUB_OUTPUT
|
||||||
|
|
||||||
|
- name: Create multi-arch manifests
|
||||||
|
run: |
|
||||||
|
version=${{ steps.version.outputs.version }}
|
||||||
|
|
||||||
|
CU129_AMD64_FW=${{ needs.publish-x86.outputs.digest-cu129 }}
|
||||||
|
CU130_AMD64_FW=${{ needs.publish-x86.outputs.digest-cu130 }}
|
||||||
|
CU129_ARM64_FW=${{ needs.publish-arm64.outputs.digest-cu129 }}
|
||||||
|
CU130_ARM64_FW=${{ needs.publish-arm64.outputs.digest-cu130 }}
|
||||||
|
|
||||||
|
# Create versioned framework manifest (default)
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:v${version} \
|
||||||
|
lmsysorg/sglang@${CU129_AMD64_FW} \
|
||||||
|
lmsysorg/sglang@${CU129_ARM64_FW}
|
||||||
|
|
||||||
|
# Create latest framework manifest (default)
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:latest \
|
||||||
|
lmsysorg/sglang@${CU129_AMD64_FW} \
|
||||||
|
lmsysorg/sglang@${CU129_ARM64_FW}
|
||||||
|
|
||||||
|
# Create versioned CUDA 13 framework manifest
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:v${version}-cu130 \
|
||||||
|
lmsysorg/sglang@${CU130_AMD64_FW} \
|
||||||
|
lmsysorg/sglang@${CU130_ARM64_FW}
|
||||||
|
|
||||||
|
# Create latest CUDA 13 framework manifest
|
||||||
|
docker buildx imagetools create \
|
||||||
|
-t lmsysorg/sglang:latest-cu130 \
|
||||||
|
lmsysorg/sglang@${CU130_AMD64_FW} \
|
||||||
|
lmsysorg/sglang@${CU130_ARM64_FW}
|
||||||
89
third_party/sglang/.github/workflows/release-docs.yml
vendored
Normal file
@@ -0,0 +1,89 @@
|
|||||||
|
name: Release Documentation
|
||||||
|
|
||||||
|
on:
|
||||||
|
release:
|
||||||
|
types: [published]
|
||||||
|
push:
|
||||||
|
branches:
|
||||||
|
- main
|
||||||
|
paths:
|
||||||
|
- "docs/**"
|
||||||
|
- "python/sglang/version.py"
|
||||||
|
- "python/sglang/**"
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: release-docs-${{ github.ref }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
env:
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
execute-and-deploy:
|
||||||
|
runs-on: 1-gpu-h100
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Fetch full git history for release index
|
||||||
|
if: github.event_name == 'release'
|
||||||
|
run: |
|
||||||
|
git fetch --prune --unshallow || git fetch --prune --depth=0
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
pip install -r docs/requirements.txt
|
||||||
|
apt-get update && apt-get install -y pandoc parallel retry
|
||||||
|
ln -sf "$(which python3)" /usr/bin/python
|
||||||
|
|
||||||
|
- name: Setup Jupyter Kernel
|
||||||
|
run: |
|
||||||
|
python -m ipykernel install --user --name python3 --display-name "Python 3"
|
||||||
|
|
||||||
|
- name: Execute notebooks
|
||||||
|
timeout-minutes: 40
|
||||||
|
run: |
|
||||||
|
cd docs
|
||||||
|
make clean
|
||||||
|
make compile
|
||||||
|
|
||||||
|
- name: Push HTML to sgl-project.github.io
|
||||||
|
timeout-minutes: 30
|
||||||
|
env:
|
||||||
|
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_DOCUMENTATION }}
|
||||||
|
run: |
|
||||||
|
cd docs
|
||||||
|
make html
|
||||||
|
make markdown
|
||||||
|
python3 wrap_run_llm.py
|
||||||
|
|
||||||
|
if [[ "${{ github.event_name }}" == "release" ]]; then
|
||||||
|
python3 release_lookup/generate_index.py --output release_lookup/release_index.json
|
||||||
|
|
||||||
|
# Copy release lookup tool for official docs on published releases.
|
||||||
|
mkdir -p _build/html/release_lookup
|
||||||
|
cp release_lookup/index.html _build/html/release_lookup/
|
||||||
|
cp release_lookup/release_index.json _build/html/release_lookup/
|
||||||
|
fi
|
||||||
|
|
||||||
|
cd _build/html
|
||||||
|
|
||||||
|
git clone https://$GITHUB_TOKEN@github.com/sgl-project/sgl-project.github.io.git ../sgl-project.github.io --depth 1
|
||||||
|
if [[ "${{ github.event_name }}" == "release" ]]; then
|
||||||
|
find ../sgl-project.github.io/ -mindepth 1 -not -path "../sgl-project.github.io/.git*" -not -name CNAME -not -name ".jekyll" -not -name ".nojekyll" -delete
|
||||||
|
else
|
||||||
|
find ../sgl-project.github.io/ -mindepth 1 -not -path "../sgl-project.github.io/.git*" -not -path "../sgl-project.github.io/release_lookup*" -not -name CNAME -not -name ".jekyll" -not -name ".nojekyll" -delete
|
||||||
|
fi
|
||||||
|
cp -r * ../sgl-project.github.io
|
||||||
|
cp ../../README.md ../sgl-project.github.io/README.md
|
||||||
|
cd ../sgl-project.github.io
|
||||||
|
git config user.name "sglang-bot"
|
||||||
|
git config user.email "sglangbot@gmail.com"
|
||||||
|
git add .
|
||||||
|
git commit -m "Update $(date +'%Y-%m-%d %H:%M:%S')"
|
||||||
|
git push https://$GITHUB_TOKEN@github.com/sgl-project/sgl-project.github.io.git main
|
||||||
|
cd ..
|
||||||
|
rm -rf sgl-project.github.io
|
||||||
167
third_party/sglang/.github/workflows/release-pypi-gateway.yml
vendored
Normal file
@@ -0,0 +1,167 @@
|
|||||||
|
name: Release SGLang Model Gateway to PyPI
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches:
|
||||||
|
- main
|
||||||
|
paths:
|
||||||
|
- sgl-model-gateway/bindings/python/pyproject.toml
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
build:
|
||||||
|
name: build on ${{ matrix.platform || matrix.os }} (${{ matrix.target }} - ${{ matrix.manylinux || 'auto' }})
|
||||||
|
runs-on: ${{ matrix.os }}-latest
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
os: [ubuntu, macos, windows]
|
||||||
|
target: [x86_64, aarch64]
|
||||||
|
manylinux: [auto]
|
||||||
|
include:
|
||||||
|
- os: ubuntu
|
||||||
|
platform: linux
|
||||||
|
- os: windows
|
||||||
|
ls: dir
|
||||||
|
target: x86_64
|
||||||
|
python-architecture: x64
|
||||||
|
interpreter: 3.9 3.10 3.11 3.12 3.13
|
||||||
|
- os: macos
|
||||||
|
target: aarch64
|
||||||
|
interpreter: 3.9 3.10 3.11 3.12 3.13
|
||||||
|
- os: ubuntu
|
||||||
|
platform: linux
|
||||||
|
target: aarch64
|
||||||
|
# musllinux
|
||||||
|
- os: ubuntu
|
||||||
|
platform: linux
|
||||||
|
target: x86_64
|
||||||
|
manylinux: musllinux_1_1
|
||||||
|
- os: ubuntu
|
||||||
|
platform: linux
|
||||||
|
target: aarch64
|
||||||
|
manylinux: musllinux_1_1
|
||||||
|
exclude:
|
||||||
|
- os: windows
|
||||||
|
target: aarch64
|
||||||
|
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
path: sglang-repo
|
||||||
|
|
||||||
|
- name: Move sgl-model-gateway folder to root and delete sglang-repo
|
||||||
|
run: |
|
||||||
|
mv sglang-repo/sgl-model-gateway/* .
|
||||||
|
rm -rf sglang-repo
|
||||||
|
ls -alt
|
||||||
|
shell: bash
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: "3.13"
|
||||||
|
architecture: ${{ matrix.python-architecture || 'x64' }}
|
||||||
|
|
||||||
|
- name: Install twine
|
||||||
|
run: pip install -U twine
|
||||||
|
|
||||||
|
- name: Install protoc (macOS)
|
||||||
|
if: matrix.os == 'macos'
|
||||||
|
run: brew install protobuf
|
||||||
|
|
||||||
|
- name: Install protoc (Windows)
|
||||||
|
if: matrix.os == 'windows'
|
||||||
|
run: choco install protoc -y
|
||||||
|
|
||||||
|
- name: Build wheels
|
||||||
|
uses: PyO3/maturin-action@v1
|
||||||
|
with:
|
||||||
|
working-directory: bindings/python
|
||||||
|
target: ${{ matrix.target }}
|
||||||
|
manylinux: ${{ matrix.manylinux || 'auto' }}
|
||||||
|
args: --release --out dist --features vendored-openssl --interpreter ${{ matrix.interpreter || '3.9 3.10 3.11 3.12 3.13 3.14' }}
|
||||||
|
rust-toolchain: stable
|
||||||
|
docker-options: -e CI -e CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc -e CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++
|
||||||
|
before-script-linux: |
|
||||||
|
# Install build dependencies (perl/make for vendored OpenSSL, protoc for gRPC)
|
||||||
|
if command -v yum &> /dev/null; then
|
||||||
|
yum update -y && yum install -y wget unzip gcc gcc-c++ perl-core make
|
||||||
|
# Install cross-compilation toolchain for aarch64 if needed
|
||||||
|
if [ "${{ matrix.target }}" = "aarch64" ]; then
|
||||||
|
yum install -y gcc-aarch64-linux-gnu gcc-c++-aarch64-linux-gnu || true
|
||||||
|
fi
|
||||||
|
elif command -v apt-get &> /dev/null; then
|
||||||
|
apt-get update && apt-get install -y wget unzip gcc g++ perl make
|
||||||
|
# Install cross-compilation toolchain for aarch64 if needed
|
||||||
|
if [ "${{ matrix.target }}" = "aarch64" ]; then
|
||||||
|
apt-get install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu || true
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
(cd /tmp && \
|
||||||
|
wget https://github.com/protocolbuffers/protobuf/releases/download/v32.0/protoc-32.0-linux-x86_64.zip && \
|
||||||
|
unzip protoc-32.0-linux-x86_64.zip -d /usr/local && \
|
||||||
|
rm protoc-32.0-linux-x86_64.zip)
|
||||||
|
protoc --version
|
||||||
|
|
||||||
|
- name: List built packages
|
||||||
|
run: ${{ matrix.ls || 'ls -lh' }} bindings/python/dist/
|
||||||
|
|
||||||
|
- name: Check packages
|
||||||
|
run: twine check --strict bindings/python/dist/*
|
||||||
|
|
||||||
|
- uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: packages-${{ matrix.os }}-${{ matrix.target }}-${{ matrix.manylinux || 'auto' }}
|
||||||
|
path: bindings/python/dist/
|
||||||
|
|
||||||
|
build-sdist:
|
||||||
|
name: Build SDist
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
path: sglang-repo
|
||||||
|
|
||||||
|
- name: Move sgl-model-gateway folder to root and delete sglang-repo
|
||||||
|
run: |
|
||||||
|
mv sglang-repo/sgl-model-gateway/* .
|
||||||
|
rm -rf sglang-repo
|
||||||
|
ls -alt
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: "3.13"
|
||||||
|
|
||||||
|
- name: Build SDist
|
||||||
|
uses: PyO3/maturin-action@v1
|
||||||
|
with:
|
||||||
|
working-directory: bindings/python
|
||||||
|
command: sdist
|
||||||
|
args: --out dist
|
||||||
|
rust-toolchain: stable
|
||||||
|
|
||||||
|
- uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: sdist
|
||||||
|
path: bindings/python/dist/*.tar.gz
|
||||||
|
|
||||||
|
upload:
|
||||||
|
name: Upload to PyPI
|
||||||
|
if: github.repository == 'sgl-project/sglang' # Ensure this job only runs for the sgl-project/sglang repository
|
||||||
|
needs: [build, build-sdist]
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- uses: actions/download-artifact@v4
|
||||||
|
with:
|
||||||
|
path: dist
|
||||||
|
merge-multiple: true
|
||||||
|
|
||||||
|
- name: Upload to PyPI
|
||||||
|
env:
|
||||||
|
TWINE_USERNAME: __token__
|
||||||
|
TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN_ROUTER }}
|
||||||
|
run: |
|
||||||
|
pip install twine
|
||||||
|
twine upload dist/* --verbose
|
||||||
169
third_party/sglang/.github/workflows/release-pypi-nightly.yml
vendored
Normal file
@@ -0,0 +1,169 @@
name: Release PyPI Nightly Wheels

on:
  # Run daily at 2 AM UTC
  schedule:
    - cron: '0 2 * * *'
  # Triggered by nightly Docker workflow to use same commit
  repository_dispatch:
    types: [nightly-release]
  # Manual trigger for testing
  workflow_dispatch:
    inputs:
      commit_sha:
        description: 'Specific commit SHA to build (leave empty for latest)'
        required: false
        type: string
      cuda_version:
        description: 'CUDA version (e.g., 129 or 130)'
        required: false
        default: '129'
        type: string

concurrency:
  group: release-pypi-nightly-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build-nightly-wheel:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    outputs:
      nightly_version: ${{ steps.build.outputs.nightly_version }}
      commit_hash: ${{ steps.build.outputs.commit_hash }}
      build_date: ${{ steps.build.outputs.build_date }}
    steps:
      - uses: actions/checkout@v4
        with:
          # Use commit from: 1) Docker workflow, 2) manual input, 3) latest main
          ref: ${{ github.event.client_payload.commit_sha || inputs.commit_sha || github.sha }}
          fetch-depth: 0 # Need full history for setuptools-scm

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install build dependencies
        run: |
          pip install build wheel setuptools setuptools-scm

      - name: Build wheel
        id: build
        run: |
          cd python
          cp ../README.md ../LICENSE .

          # Parse git describe output to get latest tag
          # Use same command as pyproject.toml to ensure version consistency
          DESC=$(git tag --list --sort=-version:refname 'v*.*.*' | head -1 | xargs git describe --tags --long 2>/dev/null || echo 'v0.0.0-0-g0000000')
          TAG=$(echo "$DESC" | cut -d- -f1)
          HASH="g$(git rev-parse --short HEAD)"
          BUILD_DATE=$(date -u +%Y%m%d)

          # Increment patch version for nightlies (e.g., v0.5.9 -> 0.5.10)
          # Must always increment so nightly > latest tag per PEP 440 ordering:
          # X.Y.Z.devN < X.Y.Z.rcN < X.Y.Z < X.Y.(Z+1).devN
          VERSION=${TAG#v} # Remove 'v' prefix
          MAJOR=$(echo "$VERSION" | cut -d. -f1)
          MINOR=$(echo "$VERSION" | cut -d. -f2)
          PATCH_RAW=$(echo "$VERSION" | cut -d. -f3)
          # Strip pre-release suffixes (rc0, post1, etc.) to get numeric patch
          PATCH=$(echo "$PATCH_RAW" | sed 's/[^0-9].*//')
          NEXT_PATCH=$((PATCH + 1))
          NEXT_VERSION="${MAJOR}.${MINOR}.${NEXT_PATCH}"

          # Use date-based dev number for correct chronological sorting
          # e.g., 0.5.9.dev20260215+g4cf4f0859 > 0.5.9.dev20260214+g45a4697d4
          FORCE_VERSION="${NEXT_VERSION}.dev${BUILD_DATE}+${HASH}"
          echo "Forcing nightly version to: $FORCE_VERSION"
          export SETUPTOOLS_SCM_PRETEND_VERSION="$FORCE_VERSION"

          # Build wheel
          python3 -m build --wheel

          # Extract version from built wheel filename
          WHEEL_FILE=$(ls dist/*.whl)
          NIGHTLY_VERSION=$(echo "$WHEEL_FILE" | sed 's/.*sglang-\(.*\)-py3.*/\1/')

          # Get commit info
          COMMIT_HASH=$(git rev-parse --short HEAD)
          BUILD_DATE=$(date -u +%Y-%m-%d)

          echo "Built wheel: $WHEEL_FILE"
          echo "Nightly version: ${NIGHTLY_VERSION}"
          echo "Commit: ${COMMIT_HASH}"
          echo "Build date: ${BUILD_DATE}"

          echo "nightly_version=${NIGHTLY_VERSION}" >> $GITHUB_OUTPUT
          echo "commit_hash=${COMMIT_HASH}" >> $GITHUB_OUTPUT
          echo "build_date=${BUILD_DATE}" >> $GITHUB_OUTPUT

      - name: Upload wheel artifact
        uses: actions/upload-artifact@v4
        with:
          name: nightly-wheel
          path: python/dist/*.whl
          retention-days: 7

  release-nightly:
    needs: build-nightly-wheel
    runs-on: ubuntu-latest
    environment: 'prod'
    steps:
      - uses: actions/checkout@v4

      - name: Download wheel artifact
        uses: actions/download-artifact@v4
        with:
          name: nightly-wheel
          path: dist/

      - name: List downloaded wheels
        run: |
          echo "Downloaded wheel:"
          ls -lh dist/

      - name: Create GitHub Release for nightly wheel
        uses: softprops/action-gh-release@v2
        with:
          tag_name: nightly-${{ needs.build-nightly-wheel.outputs.build_date }}-${{ needs.build-nightly-wheel.outputs.commit_hash }}
          name: Nightly Build ${{ needs.build-nightly-wheel.outputs.build_date }} (${{ needs.build-nightly-wheel.outputs.commit_hash }})
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          prerelease: true
          body: |
            Nightly build from commit ${{ github.sha }}
            Build date: ${{ needs.build-nightly-wheel.outputs.build_date }}
            Version: ${{ needs.build-nightly-wheel.outputs.nightly_version }}
          files: |
            dist/*.whl

      - name: Clone wheel index repository
        run: |
          git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Update wheel index
        run: |
          python3 scripts/update_nightly_whl_index.py \
            --commit-hash ${{ needs.build-nightly-wheel.outputs.commit_hash }} \
            --nightly-version ${{ needs.build-nightly-wheel.outputs.nightly_version }} \
            --cuda-version ${{ inputs.cuda_version || '129' }} \
            --build-date ${{ needs.build-nightly-wheel.outputs.build_date }}

      - name: Push wheel index
        run: |
          cd sgl-whl
          git add -A
          git diff --staged --quiet || git commit -m "Update nightly wheel index for commit ${{ needs.build-nightly-wheel.outputs.commit_hash }}"
          git push
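The version arithmetic in the "Build wheel" step of the nightly workflow above can be easier to follow outside of shell. The following is a minimal Python sketch of the same steps (strip the `v` prefix, keep the numeric part of the patch field, increment it, append the date-based dev number and the commit hash). The function name `nightly_version` and its parameters are hypothetical and exist only for illustration; the sketch assumes the tag has three dot-separated fields.

```python
import re


def nightly_version(tag: str, build_date: str, short_hash: str) -> str:
    """Approximate the shell logic of the nightly "Build wheel" step.

    tag        -- latest release tag, e.g. "v0.5.9" or "v0.5.10rc0"
    build_date -- UTC date formatted as YYYYMMDD
    short_hash -- abbreviated commit hash, without the leading "g"
    """
    # Equivalent of VERSION=${TAG#v}
    version = tag[1:] if tag.startswith("v") else tag
    major, minor, patch_raw = version.split(".")[:3]
    # Keep only the leading digits, like sed 's/[^0-9].*//'
    patch = int(re.match(r"\d*", patch_raw).group() or 0)
    # Always increment the patch so the nightly sorts after the latest tag
    return f"{major}.{minor}.{patch + 1}.dev{build_date}+g{short_hash}"


print(nightly_version("v0.5.9", "20260215", "4cf4f0859"))
print(nightly_version("v0.5.10rc0", "20260101", "abc1234"))
```

Because the patch number is always incremented, a nightly built on top of a release candidate tag such as `v0.5.10rc0` is versioned as a `0.5.11` development release in this sketch, which matches the ordering described in the workflow's comments.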
183
third_party/sglang/.github/workflows/release-pypi-pr.yml
vendored
Normal file
@@ -0,0 +1,183 @@
name: Release PyPI PR Wheels

on:
  workflow_dispatch:
    inputs:
      pr_number:
        description: 'PR number to build wheel for (works with both internal and fork PRs)'
        required: true
        type: string

concurrency:
  group: build-pr-wheel-${{ github.event.inputs.pr_number }}
  cancel-in-progress: true

jobs:
  build-pr-wheel:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    outputs:
      wheel_version: ${{ steps.gen_version.outputs.wheel_version }}
      commit_hash: ${{ steps.gen_version.outputs.commit_hash }}
      build_date: ${{ steps.gen_version.outputs.build_date }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ inputs.pr_number }}/head
          fetch-depth: 0 # Need full history for version generation

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Generate PR wheel version
        id: gen_version
        run: |
          # Get base version from the latest v*.*.* git tag directly
          # Note: We cannot use setuptools_scm here because the [tool.setuptools_scm]
          # config (with custom git_describe_command) lives in python/pyproject.toml,
          # not at the repo root. Without that config, setuptools_scm falls back to
          # default git describe which finds gateway-* tags instead of v*.*.* release tags.
          LATEST_TAG=$(git tag --list --sort=-version:refname 'v*.*.*' | head -1)
          BASE_VERSION=${LATEST_TAG#v}
          echo "Latest release tag: ${LATEST_TAG}"

          # Get commit info
          COMMIT_HASH=$(git rev-parse --short HEAD)
          COMMIT_COUNT=$(git rev-list --count HEAD)

          # Get current date in YYYY-MM-DD format
          BUILD_DATE=$(date -u +%Y-%m-%d)

          # Always use pr-{number} format for suffix
          SUFFIX="pr-${{ inputs.pr_number }}"

          # Generate PR wheel version following PEP 440
          # Format: {base_version}.dev{commit_count}+pr-{number}.g{commit_hash}
          WHEEL_VERSION="${BASE_VERSION}.dev${COMMIT_COUNT}+${SUFFIX}.g${COMMIT_HASH}"

          echo "Base version: ${BASE_VERSION}"
          echo "PR wheel version: ${WHEEL_VERSION}"
          echo "Commit: ${COMMIT_HASH}"
          echo "Build date: ${BUILD_DATE}"

          echo "wheel_version=${WHEEL_VERSION}" >> $GITHUB_OUTPUT
          echo "commit_hash=${COMMIT_HASH}" >> $GITHUB_OUTPUT
          echo "base_version=${BASE_VERSION}" >> $GITHUB_OUTPUT
          echo "build_date=${BUILD_DATE}" >> $GITHUB_OUTPUT

      - name: Update pyproject.toml with PR wheel version
        run: |
          cd python
          WHEEL_VERSION="${{ steps.gen_version.outputs.wheel_version }}"

          # Update pyproject.toml to use static version instead of dynamic
          # Remove 'version' from dynamic list and add static version
          sed -i 's/dynamic = \["version"\]/dynamic = []/' pyproject.toml
          sed -i "/^name = \"sglang\"/a version = \"${WHEEL_VERSION}\"" pyproject.toml

          # Verify update
          echo "Updated version in pyproject.toml:"
          grep "^version" pyproject.toml
          grep "^dynamic" pyproject.toml

      - name: Install build dependencies
        run: |
          cd python
          pip install build wheel setuptools

      - name: Build wheel
        run: |
          cd python
          cp ../README.md ../LICENSE .
          python3 -m build --wheel

          # List built wheels
          echo "Built wheel:"
          ls -lh dist/

      - name: Upload wheel artifact
        uses: actions/upload-artifact@v4
        with:
          name: pr-wheel-${{ inputs.pr_number }}
          path: python/dist/*.whl
          retention-days: 30

  release-pr-wheel:
    needs: build-pr-wheel
    runs-on: ubuntu-latest
    environment: 'prod'
    steps:
      - uses: actions/checkout@v4

      - name: Download wheel artifact
        uses: actions/download-artifact@v4
        with:
          name: pr-wheel-${{ inputs.pr_number }}
          path: dist/

      - name: List downloaded wheels
        run: |
          echo "Downloaded wheel:"
          ls -lh dist/

      - name: Create GitHub Release for PR wheel
        uses: softprops/action-gh-release@v2
        with:
          tag_name: pr-${{ inputs.pr_number }}-${{ needs.build-pr-wheel.outputs.build_date }}-${{ needs.build-pr-wheel.outputs.commit_hash }}
          name: "PR #${{ inputs.pr_number }} Build (${{ needs.build-pr-wheel.outputs.commit_hash }})"
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          prerelease: true
          body: |
            PR wheel build from PR #${{ inputs.pr_number }}
            Commit: ${{ github.sha }}
            Build date: ${{ needs.build-pr-wheel.outputs.build_date }}
            Version: ${{ needs.build-pr-wheel.outputs.wheel_version }}

            **Installation via index (pip):**
            ```bash
            pip install sglang==${{ needs.build-pr-wheel.outputs.wheel_version }} --index-url https://sgl-project.github.io/whl/pr/
            ```

            **Installation via index (uv):**
            ```bash
            uv pip install sglang==${{ needs.build-pr-wheel.outputs.wheel_version }} --index-url https://sgl-project.github.io/whl/pr/ --extra-index-url https://pypi.org/simple --index-strategy unsafe-best-match
            ```

            **Direct installation:**
            ```bash
            pip install https://github.com/sgl-project/whl/releases/download/pr-${{ inputs.pr_number }}-${{ needs.build-pr-wheel.outputs.build_date }}-${{ needs.build-pr-wheel.outputs.commit_hash }}/sglang-${{ needs.build-pr-wheel.outputs.wheel_version }}-py3-none-any.whl
            ```
          files: |
            dist/*.whl

      - name: Clone wheel index repository
        run: |
          git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Update wheel index
        run: |
          python3 scripts/update_pr_whl_index.py \
            --pr-number ${{ inputs.pr_number }} \
            --commit-hash ${{ needs.build-pr-wheel.outputs.commit_hash }} \
            --wheel-version ${{ needs.build-pr-wheel.outputs.wheel_version }} \
            --build-date ${{ needs.build-pr-wheel.outputs.build_date }}

      - name: Push wheel index
        run: |
          cd sgl-whl
          git add -A
          git diff --staged --quiet || git commit -m "Update PR wheel index for PR #${{ inputs.pr_number }} (commit ${{ needs.build-pr-wheel.outputs.commit_hash }})"
          git push
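The PR wheel version string assembled in the "Generate PR wheel version" step above can be sketched in Python as follows. The helper names `pr_wheel_version` and `normalize_local_separators` are hypothetical and not part of the repository. The second helper models only one PEP 440 rule, namely that `-` and `_` inside the local version label (the part after `+`) are written as `.` in normalized form, so tools that normalize versions may report `pr.42` where the workflow wrote `pr-42`.

```python
def pr_wheel_version(latest_tag: str, commit_count: int,
                     pr_number: int, short_hash: str) -> str:
    """Build {base_version}.dev{commit_count}+pr-{number}.g{commit_hash}."""
    # Equivalent of BASE_VERSION=${LATEST_TAG#v}
    base = latest_tag[1:] if latest_tag.startswith("v") else latest_tag
    return f"{base}.dev{commit_count}+pr-{pr_number}.g{short_hash}"


def normalize_local_separators(version: str) -> str:
    """Apply only the PEP 440 separator rule to the local version label."""
    public, plus, local = version.partition("+")
    if not plus:
        return version  # no local label present
    return public + "+" + local.replace("-", ".").replace("_", ".")


raw = pr_wheel_version("v0.5.9", 1234, 42, "abc1234")
print(raw)
print(normalize_local_separators(raw))
```

If the direct-download URL in the release body is ever found not to match the uploaded file name, comparing the raw and normalized forms in this way is one place to start looking; whether the two differ in practice depends on the build backend, which this sketch does not model.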
31
third_party/sglang/.github/workflows/release-pypi.yml
vendored
Normal file
@@ -0,0 +1,31 @@
name: Release PyPI
on:
  push:
    tags:
      - 'v[0-9]+.*'
  workflow_dispatch:

jobs:
  publish:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    environment: "prod"
    steps:
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Required for setuptools-scm to determine version from tags

      - name: Upload to pypi
        run: |
          cd python
          cp ../README.md ../LICENSE .
          pip install build wheel setuptools setuptools-scm
          python3 -m build
          pip install twine
          python3 -m twine upload dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN }}
68
third_party/sglang/.github/workflows/release-tag.yml
vendored
Normal file
@@ -0,0 +1,68 @@
name: Release Tag
# Creates a git tag to trigger release workflows (PyPI, Docker)
# Use this after testing on a release branch is complete
on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to tag (without v prefix, e.g., 0.5.7)'
        required: true
        type: string
      ref:
        description: 'Branch or commit to tag (e.g., release/v0.5.7, main, or commit SHA)'
        required: false
        default: 'main'
        type: string

permissions:
  contents: write

jobs:
  create-tag:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-latest
    environment: 'prod'
    steps:
      - name: Validate version format
        run: |
          VERSION="${{ github.event.inputs.version }}"
          if [ -z "$VERSION" ]; then
            echo "::error::Version is required"
            exit 1
          fi
          if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+'; then
            echo "::error::Invalid version format: $VERSION (expected: X.Y.Z or X.Y.Z.postN)"
            exit 1
          fi
          echo "Version validated: v$VERSION"

      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.ref }}
          fetch-depth: 0
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Check if tag already exists
        run: |
          TAG="v${{ github.event.inputs.version }}"
          if git rev-parse "$TAG" >/dev/null 2>&1; then
            echo "::error::Tag $TAG already exists"
            exit 1
          fi
          echo "Tag $TAG does not exist, proceeding..."

      - name: Create and push tag
        run: |
          TAG="v${{ github.event.inputs.version }}"
          REF="${{ github.event.inputs.ref }}"

          git config user.name "sglang-bot"
          git config user.email "sglang-bot@users.noreply.github.com"

          echo "Creating tag $TAG on ref $REF (commit: $(git rev-parse HEAD))"
          git tag -a "$TAG" -m "Release $TAG"
          git push origin "$TAG"

          echo "::notice::Successfully created and pushed tag $TAG"
          echo "This will trigger the release workflows (PyPI, Docker)"
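The "Validate version format" step in the release-tag workflow above checks the input against a pattern that is anchored at the start but not at the end. The sketch below reproduces that check in Python to show which inputs pass. The function name `is_accepted` is hypothetical; the regular expression is the one from the workflow.

```python
import re

# Same pattern as the grep -qE call: anchored at the start only.
VERSION_PREFIX = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+")


def is_accepted(version: str) -> bool:
    """Return True when the workflow's two checks would both pass."""
    if not version:  # corresponds to the [ -z "$VERSION" ] test
        return False
    return VERSION_PREFIX.search(version) is not None


for candidate in ["0.5.7", "0.5.7.post1", "0.5.7rc0", "v0.5.7", "0.5", "0.5.7-anything"]:
    print(f"{candidate!r}: {is_accepted(candidate)}")
```

Because there is no trailing `$`, any suffix after `X.Y.Z` is accepted, which is what lets `X.Y.Z.postN` through but also admits arbitrary trailing text. A leading `v` is rejected, consistent with the input description asking for the version without the prefix.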
440
third_party/sglang/.github/workflows/release-whl-kernel.yml
vendored
Normal file
@@ -0,0 +1,440 @@
name: Release SGLang Kernels

on:
  push:
    branches:
      - main
    paths:
      - sgl-kernel/python/sgl_kernel/version.py
  workflow_dispatch:
    inputs:
      target:
        type: choice
        description: 'Build target'
        required: false
        default: 'all'
        options:
          - 'all'
          - 'cu129'
          - 'cu130'
          - 'rocm700'
          - 'rocm720'
          - 'musa43'
      tag_name:
        description: "Version number, must be in the form of vX.Y.Z (e.g. v0.4.0)"
        type: string
        required: false
      pr_number:
        description: "PR number to build from (e.g. 12345)"
        type: string
        required: false

concurrency:
  group: release-sglang-kernels-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build-cu129-matrix:
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' || github.event.inputs.target == 'all' || github.event.inputs.target == 'cu129')
    strategy:
      matrix:
        python-version: ["3.10"]
        cuda-version: ["12.9"]
        arch: [x86_64, aarch64]
        include:
          - arch: x86_64
            runner: x64-kernel-build-node
          - arch: aarch64
            runner: arm-kernel-build-node
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: "recursive"
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Build wheels
        run: |
          cd sgl-kernel
          chmod +x ./build.sh
          ./build.sh "${{ matrix.python-version }}" "${{ matrix.cuda-version }}" ${{ matrix.arch == 'aarch64' && 'aarch64' || '' }}
        env:
          BUILD_JOBS: 64
          NVCC_THREADS: 8

      - name: Upload to PyPI
        working-directory: sgl-kernel
        run: |
          pip install twine
          python3 -m twine upload --skip-existing dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN_SGLANG_KERNEL }}

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheel-python${{ matrix.python-version }}-cuda${{ matrix.cuda-version }}${{ matrix.arch == 'aarch64' && '-aarch64' || '' }}
          path: sgl-kernel/dist/*

  release-cu129:
    needs: build-cu129-matrix
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-*

      - name: Set tag name
        id: set_tag_name
        run: |
          if [ -z "${{ inputs.tag_name }}" ]; then
            TAG_NAME="v$(cat sgl-kernel/python/sgl_kernel/version.py | cut -d'"' -f2)"
            echo "tag_name=$TAG_NAME" >> $GITHUB_OUTPUT
          else
            echo "tag_name=${{ inputs.tag_name }}" >> $GITHUB_OUTPUT
          fi

      - name: Release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: ${{ steps.set_tag_name.outputs.tag_name }}
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          files: |
            sgl-kernel/dist/*

      - name: Clone wheel index
        run: git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Update wheel index
        run: python3 scripts/update_kernel_whl_index.py --cuda 129

      - name: Push wheel index
        run: |
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
          git add -A
          git commit -m "update whl index"
          git push

  # for now we do not release CUDA 13.0 wheels to pypi
  build-cu130-matrix:
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' || github.event.inputs.target == 'all' || github.event.inputs.target == 'cu130')
    strategy:
      matrix:
        python-version: ["3.10"]
        cuda-version: ["13.0"]
        arch: [x86_64, aarch64]
        include:
          - arch: x86_64
            runner: x64-kernel-build-node
          - arch: aarch64
            runner: arm-kernel-build-node
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: "recursive"
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Build wheels
        run: |
          cd sgl-kernel
          chmod +x ./build.sh
          ./build.sh "${{ matrix.python-version }}" "${{ matrix.cuda-version }}" ${{ matrix.arch == 'aarch64' && 'aarch64' || '' }}
        env:
          BUILD_JOBS: 64
          NVCC_THREADS: 8

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheel-python${{ matrix.python-version }}-cuda${{ matrix.cuda-version }}${{ matrix.arch == 'aarch64' && '-aarch64' || '' }}
          path: sgl-kernel/dist/*

  release-cu130:
    needs: build-cu130-matrix
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-*

      - name: Set tag name
        id: set_tag_name
        run: |
          if [ -z "${{ inputs.tag_name }}" ]; then
            TAG_NAME="v$(cat sgl-kernel/python/sgl_kernel/version.py | cut -d'"' -f2)"
            echo "tag_name=$TAG_NAME" >> $GITHUB_OUTPUT
          else
            echo "tag_name=${{ inputs.tag_name }}" >> $GITHUB_OUTPUT
          fi

      - name: Release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: ${{ steps.set_tag_name.outputs.tag_name }}
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          files: |
            sgl-kernel/dist/*

      - name: Clone wheel index
        run: git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Update wheel index
        run: python3 scripts/update_kernel_whl_index.py --cuda 130

      - name: Push wheel index
        run: |
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
          git add -A
          git commit -m "update whl index"
          git push

  build-rocm-matrix:
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' || github.event.inputs.target == 'all' || github.event.inputs.target == 'rocm700' || github.event.inputs.target == 'rocm720')
    runs-on: amd-docker-scale
    strategy:
      matrix:
        python-version: ["3.10"]
        rocm-version: ["700", "720"]
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: "recursive"
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Build wheels
        run: |
          cp 3rdparty/amd/wheel/sgl-kernel/* sgl-kernel/
          cd sgl-kernel
          chmod +x ./build_rocm.sh
          ./build_rocm.sh "${{ matrix.rocm-version }}"

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheel-python${{ matrix.python-version }}-rocm${{ matrix.rocm-version }}
          path: sgl-kernel/dist/*

  release-rocm700:
    needs: build-rocm-matrix
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-*-rocm700

      - name: Set tag name
        id: set_tag_name
        run: |
          if [ -z "${{ inputs.tag_name }}" ]; then
            TAG_NAME="v$(cat sgl-kernel/python/sgl_kernel/version.py | cut -d'"' -f2)"
            echo "tag_name=$TAG_NAME" >> $GITHUB_OUTPUT
          else
            echo "tag_name=${{ inputs.tag_name }}" >> $GITHUB_OUTPUT
          fi

      - name: Release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: ${{ steps.set_tag_name.outputs.tag_name }}
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          files: |
            sgl-kernel/dist/*

      - name: Clone wheel index
        run: git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Update wheel index
        run: python3 scripts/update_kernel_whl_index.py --rocm 700

      - name: Push wheel index
        run: |
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
          git add -A
          git commit -m "update whl index"
          git push

  release-rocm720:
    needs: build-rocm-matrix
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.pr_number && format('refs/pull/{0}/head', inputs.pr_number) || '' }}

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-*-rocm720

      - name: Set tag name
        id: set_tag_name
        run: |
          if [ -z "${{ inputs.tag_name }}" ]; then
            TAG_NAME="v$(cat sgl-kernel/python/sgl_kernel/version.py | cut -d'"' -f2)"
            echo "tag_name=$TAG_NAME" >> $GITHUB_OUTPUT
          else
            echo "tag_name=${{ inputs.tag_name }}" >> $GITHUB_OUTPUT
          fi

      - name: Release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: ${{ steps.set_tag_name.outputs.tag_name }}
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          files: |
            sgl-kernel/dist/*

      - name: Clone wheel index
        run: git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Update wheel index
        run: python3 scripts/update_kernel_whl_index.py --rocm 720

      - name: Push wheel index
        run: |
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
          git add -A
          git commit -m "update whl index"
          git push

  build-musa43:
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' || github.event.inputs.target == 'all' || github.event.inputs.target == 'musa43')
    runs-on: kernel-build-node-musa
    strategy:
      matrix:
        python-version: ["3.10"]
        musa-version: ["43"]
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: "recursive"

      - name: Build wheels
        run: |
          cd sgl-kernel
          mv pyproject_musa.toml pyproject.toml
          python setup_musa.py sdist bdist_wheel

      - name: Rename MUSA wheels
        run: |
          bash scripts/ci/musa/rename_wheels_musa.sh ${{ matrix.musa-version }} sgl-kernel/dist

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheel-python${{ matrix.python-version }}-musa${{ matrix.musa-version }}
          path: sgl-kernel/dist/*

  release-musa43:
    needs: build-musa43
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: sgl-kernel/dist/
          merge-multiple: true
          pattern: wheel-*

      - name: Set tag name
        id: set_tag_name
        run: |
          if [ -z "${{ inputs.tag_name }}" ]; then
            TAG_NAME="v$(cat sgl-kernel/python/sgl_kernel/version.py | cut -d'"' -f2)"
            echo "tag_name=$TAG_NAME" >> $GITHUB_OUTPUT
          else
            echo "tag_name=${{ inputs.tag_name }}" >> $GITHUB_OUTPUT
          fi

      - name: Release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: ${{ steps.set_tag_name.outputs.tag_name }}
          repository: sgl-project/whl
          token: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}
          files: |
            sgl-kernel/dist/*

      - name: Clone wheel index
        run: git clone https://oauth2:${WHL_TOKEN}@github.com/sgl-project/whl.git sgl-whl
        env:
          WHL_TOKEN: ${{ secrets.GH_PAT_FOR_WHL_RELEASE }}

      - name: Update wheel index
        run: python3 scripts/update_kernel_whl_index.py --musa 43

      - name: Push wheel index
        run: |
          cd sgl-whl
          git config --local user.name "sglang-bot"
          git config --local user.email "sglangbot@gmail.com"
          git add -A
          git commit -m "update whl index"
          git push
136
third_party/sglang/.github/workflows/rerun-test.yml
vendored
Normal file
@@ -0,0 +1,136 @@
|
|||||||
|
name: Rerun Test
|
||||||
|
run-name: ${{ inputs.pr_head_sha && format('[rerun-test] {0} {1}', inputs.test_command, inputs.pr_head_sha) || format('[rerun-test] {0}', inputs.test_command) }}
|
||||||
|
|
||||||
|
on:
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
test_command:
|
||||||
|
description: "Test command(s) to run, one per line (e.g. 'registered/core/test_srt_endpoint.py TestSRTEndpoint.test_simple_decode')"
|
||||||
|
required: true
|
||||||
|
type: string
|
||||||
|
runner_label:
|
||||||
|
description: "Runner label"
|
||||||
|
required: true
|
||||||
|
type: choice
|
||||||
|
options:
|
||||||
|
- 1-gpu-h100
|
||||||
|
- 1-gpu-5090
|
||||||
|
- 2-gpu-h100
|
||||||
|
- 4-gpu-h100
|
||||||
|
- 4-gpu-a10
|
||||||
|
- 4-gpu-b200
|
||||||
|
- 8-gpu-h200
|
||||||
|
- 8-gpu-h20
|
||||||
|
- 8-gpu-b200
|
||||||
|
- ubuntu-latest
|
||||||
|
pr_head_sha:
|
||||||
|
description: "PR head SHA to checkout (for /rerun-test on fork PRs)"
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: ""
|
||||||
|
use_deepep:
|
||||||
|
description: "Use ci_install_deepep.sh instead of ci_install_dependency.sh"
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: "false"
|
||||||
|
is_cpu:
|
||||||
|
description: "Run as CPU-only test (uses ubuntu-latest with uv pip install)"
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
default: "false"
|
||||||
|
|
||||||
|
env:
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
SGLANG_CUDA_COREDUMP: "1"
|
||||||
|
SGLANG_JIT_DEEPGEMM_FAST_WARMUP: true
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
actions: write
|
||||||
|
contents: read
|
||||||
|
issues: read
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
rerun-test-cuda:
|
||||||
|
if: inputs.is_cpu != 'true'
|
||||||
|
runs-on: ${{ inputs.runner_label }}
|
||||||
|
timeout-minutes: 120
|
||||||
|
env:
|
||||||
|
RUNNER_LABELS: ${{ inputs.runner_label }}
|
||||||
|
SGLANG_CI_RDMA_ALL_DEVICES: ${{ inputs.runner_label == '8-gpu-h20' && 'mlx5_1,mlx5_2,mlx5_3,mlx5_4' || '' }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.pr_head_sha || github.sha }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
timeout-minutes: 20
|
||||||
|
run: |
|
||||||
|
if [[ "${{ inputs.runner_label }}" == "1-gpu-5090" ]]; then
|
||||||
|
source /etc/profile.d/sglang-ci.sh
|
||||||
|
fi
|
||||||
|
if [[ "${{ inputs.use_deepep }}" == "true" ]]; then
|
||||||
|
bash scripts/ci/cuda/ci_install_deepep.sh
|
||||||
|
else
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
fi
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 60
|
||||||
|
run: |
|
||||||
|
if [[ "${{ inputs.runner_label }}" == "1-gpu-5090" ]]; then
|
||||||
|
source /etc/profile.d/sglang-ci.sh
|
||||||
|
fi
|
||||||
|
cd test/
|
||||||
|
echo "${{ inputs.test_command }}" | while IFS= read -r cmd; do
|
||||||
|
[ -z "$cmd" ] && continue
|
||||||
|
echo ">>> Running: python3 $cmd"
|
||||||
|
python3 $cmd || exit 1
|
||||||
|
done
|
||||||
|
|
||||||
|
- uses: ./.github/actions/upload-cuda-coredumps
|
||||||
|
if: always()
|
||||||
|
|
||||||
|
rerun-test-cpu:
|
||||||
|
if: inputs.is_cpu == 'true'
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
timeout-minutes: 120
|
||||||
|
steps:
|
||||||
|
- name: Free disk space
|
||||||
|
run: |
|
||||||
|
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
|
||||||
|
df -h
|
||||||
|
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
ref: ${{ inputs.pr_head_sha || github.sha }}
|
||||||
|
|
||||||
|
- uses: ./.github/actions/check-maintenance
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Install uv
|
||||||
|
uses: astral-sh/setup-uv@v5
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
timeout-minutes: 20
|
||||||
|
env:
|
||||||
|
UV_SYSTEM_PYTHON: "1"
|
||||||
|
run: |
|
||||||
|
uv pip install -e "python[dev]" --index-strategy unsafe-best-match --prerelease allow
|
||||||
|
|
||||||
|
- name: Run test
|
||||||
|
timeout-minutes: 60
|
||||||
|
run: |
|
||||||
|
cd test/
|
||||||
|
echo "${{ inputs.test_command }}" | while IFS= read -r cmd; do
|
||||||
|
[ -z "$cmd" ] && continue
|
||||||
|
echo ">>> Running: python3 $cmd"
|
||||||
|
python3 $cmd || exit 1
|
||||||
|
done
|
||||||
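The two `Run test` steps in `rerun-test.yml` above read `inputs.test_command` one line at a time, skip empty lines, and stop at the first failing command. The control flow can be sketched in Python as follows. This is an illustration only: `run_commands` and the injectable `runner` callable are hypothetical names that do not exist in the workflow or in SGLang, and `shlex.split` only approximates the shell's word splitting of the unquoted `$cmd`.

```python
import shlex
import subprocess
import sys
from typing import Callable, Sequence


def _run_python(argv: Sequence[str]) -> int:
    """Default runner: launch the current Python interpreter with argv."""
    return subprocess.call([sys.executable, *argv])


def run_commands(
    test_command: str,
    runner: Callable[[Sequence[str]], int] = _run_python,
) -> int:
    """Run each non-empty line of test_command; stop at the first failure.

    Returns 0 if every command succeeded, 1 otherwise (like `|| exit 1`).
    Whitespace-only lines are skipped as well, which is slightly stricter
    than the shell's `[ -z "$cmd" ]` test.
    """
    for line in test_command.splitlines():
        if not line.strip():
            continue
        print(f">>> Running: python3 {line}")
        if runner(shlex.split(line)) != 0:
            return 1
    return 0
```

Passing the runner as a parameter keeps the loop testable without spawning real test processes; a real invocation would simply call `run_commands(text)` from inside the `test/` directory.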
30
third_party/sglang/.github/workflows/retag-docker.yml
vendored
Normal file
@@ -0,0 +1,30 @@
name: Retag Docker Image

on:
  workflow_dispatch:
    inputs:
      source_tag:
        description: "Existing image tag (e.g., v0.4.7-cu129-amd64)"
        required: true
      target_tag:
        description: "New tag to apply (e.g., latest)"
        required: true

jobs:
  retag:
    if: github.repository == 'sgl-project/sglang'
    runs-on: ubuntu-22.04
    environment: "prod"
    steps:
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Retag image
        run: |
          echo "Retagging lmsysorg/sglang:${{ inputs.source_tag }} -> lmsysorg/sglang:${{ inputs.target_tag }}"
          docker buildx imagetools create \
            -t lmsysorg/sglang:${{ inputs.target_tag }} \
            lmsysorg/sglang:${{ inputs.source_tag }}
43
third_party/sglang/.github/workflows/runner-utilization.yml
vendored
Normal file
@@ -0,0 +1,43 @@
|
|||||||
|
name: Runner Utilization Report
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 8 * * *' # Daily at 8 AM UTC
|
||||||
|
pull_request:
|
||||||
|
paths:
|
||||||
|
- '.github/workflows/runner-utilization.yml'
|
||||||
|
- 'scripts/ci/utils/runner_utilization_report.py'
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
hours:
|
||||||
|
description: 'Time window in hours'
|
||||||
|
required: false
|
||||||
|
default: '24'
|
||||||
|
type: string
|
||||||
|
filter:
|
||||||
|
description: 'Filter runner labels (e.g., 5090, h200)'
|
||||||
|
required: false
|
||||||
|
type: string
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
report:
|
||||||
|
name: Generate Report
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Generate Utilization Report
|
||||||
|
timeout-minutes: 30
|
||||||
|
env:
|
||||||
|
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||||
|
run: |
|
||||||
|
python scripts/ci/utils/runner_utilization_report.py \
|
||||||
|
--repo ${{ github.repository }} \
|
||||||
|
--hours ${{ inputs.hours || '24' }} \
|
||||||
|
${{ inputs.filter && format('--filter {0}', inputs.filter) || '' }}
|
||||||
99
third_party/sglang/.github/workflows/slash-command-handler.yml
vendored
Normal file
@@ -0,0 +1,99 @@
|
|||||||
|
name: Slash Command Handler
|
||||||
|
|
||||||
|
on:
|
||||||
|
issue_comment:
|
||||||
|
types: [created, edited]
|
||||||
|
|
||||||
|
permissions:
|
||||||
|
contents: read
|
||||||
|
pull-requests: write # Required to add labels and reactions
|
||||||
|
actions: write # Required to rerun workflows
|
||||||
|
issues: write # Required for comment reactions in some contexts
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
slash_command:
|
||||||
|
# Only run if it is a PR and the comment contains a recognized command
|
||||||
|
# Use contains() since startsWith() can't handle leading whitespace/newlines
|
||||||
|
if: >
|
||||||
|
github.event.issue.pull_request &&
|
||||||
|
(contains(github.event.comment.body, '/tag-run-ci-label') ||
|
||||||
|
contains(github.event.comment.body, '/rerun-failed-ci') ||
|
||||||
|
contains(github.event.comment.body, '/tag-and-rerun-ci') ||
|
||||||
|
contains(github.event.comment.body, '/rerun-stage') ||
|
||||||
|
contains(github.event.comment.body, '/rerun-test'))
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
|
||||||
|
steps:
|
||||||
|
# SECURITY: This workflow runs on issue_comment trigger with elevated permissions
|
||||||
|
# (pull-requests: write, actions: write). For non-fork PRs, we can safely checkout
|
||||||
|
# the PR branch to allow testing changes to this handler. For fork PRs, we MUST
|
||||||
|
# stay on main to prevent untrusted code execution with these elevated permissions.
|
||||||
|
- name: Get PR details
|
||||||
|
id: pr
|
||||||
|
shell: bash
|
||||||
|
env:
|
||||||
|
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||||
|
run: |
|
||||||
|
PR_DATA=$(gh pr view ${{ github.event.issue.number }} --repo ${{ github.repository }} --json headRefName,headRepositoryOwner) || {
|
||||||
|
echo "::error::Failed to fetch PR data"
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
|
# Use 'empty' filter to handle null/missing values (e.g., deleted forks)
|
||||||
|
HEAD_OWNER=$(echo "$PR_DATA" | jq -r '.headRepositoryOwner.login // empty')
|
||||||
|
REPO_OWNER="${{ github.repository_owner }}"
|
||||||
|
# Treat missing/null owner as fork for security (fail-safe)
|
||||||
|
if [[ -z "$HEAD_OWNER" || "$HEAD_OWNER" != "$REPO_OWNER" ]]; then
|
||||||
|
IS_FORK="true"
|
||||||
|
else
|
||||||
|
IS_FORK="false"
|
||||||
|
fi
|
||||||
|
echo "is_fork=$IS_FORK" >> $GITHUB_OUTPUT
|
||||||
|
echo "ref=$(echo "$PR_DATA" | jq -r '.headRefName')" >> $GITHUB_OUTPUT
|
||||||
|
echo "pr_ref=refs/pull/${{ github.event.issue.number }}/head" >> $GITHUB_OUTPUT
|
||||||
|
echo "PR owner: $HEAD_OWNER, Repo owner: $REPO_OWNER, Is fork: $IS_FORK"
|
||||||
|
|
||||||
|
- name: Check commenter permission for fork PRs
|
||||||
|
id: perm
|
||||||
|
if: steps.pr.outputs.is_fork == 'true'
|
||||||
|
shell: bash
|
||||||
|
env:
|
||||||
|
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||||
|
run: |
|
||||||
|
PERM=$(gh api repos/${{ github.repository }}/collaborators/${{ github.event.comment.user.login }}/permission --jq '.permission') || {
|
||||||
|
PERM="none"
|
||||||
|
echo "::warning::Failed to check commenter permission, defaulting to none"
|
||||||
|
}
|
||||||
|
if [[ "$PERM" == "admin" || "$PERM" == "maintain" || "$PERM" == "write" ]]; then
|
||||||
|
echo "safe_to_checkout_pr=true" >> $GITHUB_OUTPUT
|
||||||
|
else
|
||||||
|
echo "safe_to_checkout_pr=false" >> $GITHUB_OUTPUT
|
||||||
|
fi
|
||||||
|
echo "Commenter ${{ github.event.comment.user.login }} permission: $PERM"
|
||||||
|
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
with:
|
||||||
|
# For non-fork PRs: checkout PR branch by name
|
||||||
|
# For fork PRs with trusted commenter: checkout via refs/pull/N/head
|
||||||
|
# For fork PRs with untrusted commenter: stay on main for security
|
||||||
|
ref: ${{ steps.pr.outputs.is_fork == 'false' && steps.pr.outputs.ref || (steps.perm.outputs.safe_to_checkout_pr == 'true' && steps.pr.outputs.pr_ref || '') }}
|
||||||
|
|
||||||
|
- name: Set up Python
|
||||||
|
uses: actions/setup-python@v5
|
||||||
|
with:
|
||||||
|
python-version: '3.10'
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
pip install PyGithub
|
||||||
|
|
||||||
|
- name: Handle Slash Command
|
||||||
|
env:
|
||||||
|
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||||
|
REPO_FULL_NAME: ${{ github.repository }}
|
||||||
|
PR_NUMBER: ${{ github.event.issue.number }}
|
||||||
|
COMMENT_ID: ${{ github.event.comment.id }}
|
||||||
|
COMMENT_BODY: ${{ github.event.comment.body }}
|
||||||
|
USER_LOGIN: ${{ github.event.comment.user.login }}
|
||||||
|
run: |
|
||||||
|
python scripts/ci/utils/slash_command_handler.py
|
||||||
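The `SECURITY` comment and the `ref:` expression in `slash-command-handler.yml` above encode a three-way decision about which code the handler checks out. A minimal Python sketch of that decision is given below, assuming the same inputs the workflow collects. The function names `is_fork` and `checkout_ref` are hypothetical and are not part of the workflow or of `slash_command_handler.py`.

```python
from typing import Optional

# Permission levels the workflow accepts as trusted for fork PRs.
TRUSTED_PERMISSIONS = {"admin", "maintain", "write"}


def is_fork(head_owner: Optional[str], repo_owner: str) -> bool:
    """Treat a missing or different head owner as a fork (fail-safe)."""
    return not head_owner or head_owner != repo_owner


def checkout_ref(
    head_owner: Optional[str],
    repo_owner: str,
    head_ref: str,
    pr_number: int,
    commenter_permission: str = "none",
) -> str:
    """Return the ref to check out; an empty string means the default ref."""
    if not is_fork(head_owner, repo_owner):
        # Same-repository PR: check out the PR branch by name.
        return head_ref
    if commenter_permission in TRUSTED_PERMISSIONS:
        # Fork PR, trusted commenter: check out the PR head via its pull ref.
        return f"refs/pull/{pr_number}/head"
    # Fork PR, untrusted commenter: do not check out PR code.
    return ""
```

The fail-safe direction matters here: any lookup failure (deleted fork, failed permission query) falls through to the empty ref, so untrusted code never runs with the elevated `pull-requests: write` and `actions: write` permissions.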
44
third_party/sglang/.github/workflows/stress-test.yml
vendored
Normal file
@@ -0,0 +1,44 @@
|
|||||||
|
name: Stress Test
|
||||||
|
|
||||||
|
on:
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
num_prompts:
|
||||||
|
description: 'Number of prompts per model'
|
||||||
|
required: true
|
||||||
|
default: '50000'
|
||||||
|
type: string
|
||||||
|
duration_minutes:
|
||||||
|
description: 'Timeout per model in minutes'
|
||||||
|
required: true
|
||||||
|
default: '45'
|
||||||
|
type: string
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
stress-test:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: 8-gpu-h200
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run stress tests
|
||||||
|
timeout-minutes: 210
|
||||||
|
env:
|
||||||
|
NUM_PROMPTS: ${{ inputs.num_prompts }}
|
||||||
|
DURATION_MINUTES: ${{ inputs.duration_minutes }}
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite stress
|
||||||
|
|
||||||
|
- name: Upload results
|
||||||
|
if: always()
|
||||||
|
uses: actions/upload-artifact@v4
|
||||||
|
with:
|
||||||
|
name: stress-test-results
|
||||||
|
path: |
|
||||||
|
stress_test_*.jsonl
|
||||||
85
third_party/sglang/.github/workflows/trivy-scan-dev.yml
vendored
Normal file
@@ -0,0 +1,85 @@
|
|||||||
|
name: Trivy Scan Dev Docker Images
|
||||||
|
|
||||||
|
on:
|
||||||
|
# Run daily after nightly dev builds (which run at midnight UTC)
|
||||||
|
schedule:
|
||||||
|
- cron: "0 6 * * *"
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
tag:
|
||||||
|
description: "Image tag to scan (e.g., dev, dev-cu13, latest)"
|
||||||
|
required: false
|
||||||
|
default: ""
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
scan:
|
||||||
|
if: github.repository == 'sgl-project/sglang'
|
||||||
|
runs-on: x64-docker-build-node
|
||||||
|
timeout-minutes: 45
|
||||||
|
permissions:
|
||||||
|
contents: read
|
||||||
|
security-events: write
|
||||||
|
strategy:
|
||||||
|
fail-fast: false
|
||||||
|
matrix:
|
||||||
|
tag: ${{ inputs.tag && fromJSON(format('["{0}"]', inputs.tag)) || fromJSON('["dev", "dev-cu13"]') }}
|
||||||
|
steps:
|
||||||
|
- name: Checkout repository
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Run Trivy vulnerability scanner
|
||||||
|
uses: aquasecurity/trivy-action@v0.35.0
|
||||||
|
with:
|
||||||
|
image-ref: 'docker.io/lmsysorg/sglang:${{ matrix.tag }}'
|
||||||
|
scanners: 'vuln'
|
||||||
|
format: 'sarif'
|
||||||
|
output: 'trivy-results-${{ matrix.tag }}.sarif'
|
||||||
|
severity: 'CRITICAL,HIGH'
|
||||||
|
ignore-unfixed: true
|
||||||
|
skip-dirs: 'usr/local/go,opt/nvidia'
|
||||||
|
|
||||||
|
- name: Upload Trivy scan results to GitHub Security
|
||||||
|
uses: github/codeql-action/upload-sarif@v4
|
||||||
|
if: always() && hashFiles(format('trivy-results-{0}.sarif', matrix.tag)) != ''
|
||||||
|
with:
|
||||||
|
sarif_file: 'trivy-results-${{ matrix.tag }}.sarif'
|
||||||
|
category: 'trivy-${{ matrix.tag }}'
|
||||||
|
|
||||||
|
- name: Run Trivy (table output for logs)
|
||||||
|
if: success()
|
||||||
|
uses: aquasecurity/trivy-action@v0.35.0
|
||||||
|
with:
|
||||||
|
image-ref: 'docker.io/lmsysorg/sglang:${{ matrix.tag }}'
|
||||||
|
scanners: 'vuln'
|
||||||
|
format: 'table'
|
||||||
|
severity: 'CRITICAL,HIGH'
|
||||||
|
ignore-unfixed: true
|
||||||
|
skip-dirs: 'usr/local/go,opt/nvidia'
|
||||||
|
|
||||||
|
- name: Scan summary
|
||||||
|
if: always()
|
||||||
|
run: |
|
||||||
|
IMAGE="docker.io/lmsysorg/sglang:${{ matrix.tag }}"
|
||||||
|
SARIF="trivy-results-${{ matrix.tag }}.sarif"
|
||||||
|
|
||||||
|
echo "## Trivy Scan: \`${{ matrix.tag }}\`" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
|
||||||
|
if [ ! -f "${SARIF}" ]; then
|
||||||
|
echo "**Status:** Scan failed — no SARIF output produced" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
VULN_COUNT=$(python3 -c "
|
||||||
|
import json
|
||||||
|
data = json.load(open('${SARIF}'))
|
||||||
|
print(sum(len(run.get('results', [])) for run in data.get('runs', [])))
|
||||||
|
")
|
||||||
|
|
||||||
|
echo "- **Image**: \`${IMAGE}\`" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
echo "- **Findings**: ${VULN_COUNT}" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
|
||||||
|
if [ "${VULN_COUNT}" = "0" ]; then
|
||||||
|
echo "- **Result**: No CRITICAL/HIGH unfixed vulnerabilities found" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
else
|
||||||
|
echo "- **Result**: Found ${VULN_COUNT} finding(s) — check the Security tab for details" >> "$GITHUB_STEP_SUMMARY"
|
||||||
|
fi
|
||||||
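The `Scan summary` step in `trivy-scan-dev.yml` above counts findings by summing the `results` arrays across all `runs` in the SARIF file, then writes a short Markdown summary. The same logic can be sketched as a standalone Python module, shown below as an illustration under stated assumptions: `count_sarif_results` and `summary_lines` are hypothetical helper names, and the message wording is paraphrased rather than copied from the workflow.

```python
import json
from typing import Any, Dict, List, Optional


def count_sarif_results(sarif: Dict[str, Any]) -> int:
    """Sum the number of entries in `results` over every SARIF run."""
    return sum(len(run.get("results", [])) for run in sarif.get("runs", []))


def summary_lines(tag: str, sarif_text: Optional[str]) -> List[str]:
    """Build the Markdown summary lines for one scanned image tag.

    `sarif_text` is the SARIF file content, or None if the scan produced
    no output file.
    """
    lines = [f"## Trivy Scan: `{tag}`"]
    if sarif_text is None:
        lines.append("**Status:** Scan failed, no SARIF output produced")
        return lines

    count = count_sarif_results(json.loads(sarif_text))
    lines.append(f"- **Image**: `docker.io/lmsysorg/sglang:{tag}`")
    lines.append(f"- **Findings**: {count}")
    if count == 0:
        lines.append("- **Result**: No CRITICAL/HIGH unfixed vulnerabilities found")
    else:
        lines.append(f"- **Result**: Found {count} finding(s), see the Security tab")
    return lines
```

Using `.get(..., [])` for both `runs` and `results` means a SARIF document with missing keys counts as zero findings instead of raising, which matches the defensive style of the inline script.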
49
third_party/sglang/.github/workflows/weekly-test-nvidia.yml
vendored
Normal file
@@ -0,0 +1,49 @@
|
|||||||
|
name: Weekly Test (Nvidia)
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: '0 0 * * 0' # Run every Sunday at midnight UTC
|
||||||
|
workflow_dispatch:
|
||||||
|
inputs:
|
||||||
|
job_filter:
|
||||||
|
description: 'Select which job to run (leave empty or "all" to run all jobs)'
|
||||||
|
required: false
|
||||||
|
type: choice
|
||||||
|
default: 'all'
|
||||||
|
options:
|
||||||
|
- 'all'
|
||||||
|
- 'weekly-test-8-gpu-h200'
|
||||||
|
|
||||||
|
concurrency:
|
||||||
|
group: weekly-test-nvidia-${{ github.ref }}
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
env:
|
||||||
|
SGLANG_IS_IN_CI: true
|
||||||
|
HF_HUB_DOWNLOAD_TIMEOUT: 300
|
||||||
|
HF_HUB_ETAG_TIMEOUT: 300
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
# Weekly tests - 8 GPU H200
|
||||||
|
weekly-test-8-gpu-h200:
|
||||||
|
if: github.repository == 'sgl-project/sglang' && (inputs.job_filter == '' || inputs.job_filter == 'all' || inputs.job_filter == 'weekly-test-8-gpu-h200')
|
||||||
|
runs-on: 8-gpu-h200
|
||||||
|
timeout-minutes: 120
|
||||||
|
env:
|
||||||
|
RUNNER_LABELS: 8-gpu-h200
|
||||||
|
steps:
|
||||||
|
- name: Checkout code
|
||||||
|
uses: actions/checkout@v4
|
||||||
|
|
||||||
|
- name: Install dependencies
|
||||||
|
run: |
|
||||||
|
bash scripts/ci/cuda/ci_install_dependency.sh
|
||||||
|
|
||||||
|
- name: Run weekly 8-GPU H200 tests
|
||||||
|
timeout-minutes: 120
|
||||||
|
env:
|
||||||
|
GPU_CONFIG: "8-gpu-h200"
|
||||||
|
IS_H200: "1"
|
||||||
|
run: |
|
||||||
|
cd test
|
||||||
|
python3 run_suite.py --hw cuda --suite weekly-8-gpu-h200 --nightly --continue-on-error --timeout-per-file 7200
|
||||||
274
third_party/sglang/.gitignore
vendored
Normal file
@@ -0,0 +1,274 @@
|
|||||||
|
# Byte-compiled / optimized / DLL files
|
||||||
|
__pycache__/
|
||||||
|
*.py[cod]
|
||||||
|
*$py.class
|
||||||
|
|
||||||
|
# C extensions
|
||||||
|
*.so
|
||||||
|
|
||||||
|
# Distribution / packaging
|
||||||
|
.Python
|
||||||
|
**/build/
|
||||||
|
**/develop-eggs/
|
||||||
|
**/dist/
|
||||||
|
**/downloads/
|
||||||
|
**/eggs/
|
||||||
|
.eggs/
|
||||||
|
**/lib/
|
||||||
|
**/lib64/
|
||||||
|
**/parts/
|
||||||
|
**/sdist/
|
||||||
|
**/var/
|
||||||
|
**/wheels/
|
||||||
|
**/share/python-wheels/
|
||||||
|
*.egg-info/
|
||||||
|
.installed.cfg
|
||||||
|
*.egg
|
||||||
|
MANIFEST
|
||||||
|
|
||||||
|
# PyInstaller
|
||||||
|
# Usually these files are written by a python script from a template
|
||||||
|
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
||||||
|
*.manifest
|
||||||
|
*.spec
|
||||||
|
|
||||||
|
# Installer logs
|
||||||
|
pip-log.txt
|
||||||
|
pip-delete-this-directory.txt
|
||||||
|
|
||||||
|
# Unit test / coverage reports
|
||||||
|
htmlcov/
|
||||||
|
.tox/
|
||||||
|
.nox/
|
||||||
|
.coverage
|
||||||
|
.coverage.*
|
||||||
|
.cache
|
||||||
|
nosetests.xml
|
||||||
|
coverage.xml
|
||||||
|
*.cover
|
||||||
|
*.py,cover
|
||||||
|
.hypothesis/
|
||||||
|
|
||||||
|
# Tokenizer cache for tests
|
||||||
|
.tokenizer_cache/
|
||||||
|
.pytest_cache/
|
||||||
|
cover/
|
||||||
|
|
||||||
|
# Translations
|
||||||
|
*.mo
|
||||||
|
*.pot
|
||||||
|
|
||||||
|
# Django stuff:
|
||||||
|
*.log
|
||||||
|
local_settings.py
|
||||||
|
db.sqlite3
|
||||||
|
db.sqlite3-journal
|
||||||
|
|
||||||
|
# Flask stuff:
|
||||||
|
instance/
|
||||||
|
.webassets-cache
|
||||||
|
|
||||||
|
# Scrapy stuff:
|
||||||
|
.scrapy
|
||||||
|
|
||||||
|
# Sphinx documentation
|
||||||
|
docs/_build/
|
||||||
|
|
||||||
|
# PyBuilder
|
||||||
|
.pybuilder/
|
||||||
|
target/
|
||||||
|
|
||||||
|
# Jupyter Notebook
|
||||||
|
.ipynb_checkpoints
|
||||||
|
|
||||||
|
# IPython
|
||||||
|
profile_default/
|
||||||
|
ipython_config.py
|
||||||
|
|
||||||
|
# pyenv
|
||||||
|
# For a library or package, you might want to ignore these files since the code is
|
||||||
|
# intended to run in multiple environments; otherwise, check them in:
|
||||||
|
# .python-version
|
||||||
|
|
||||||
|
# pipenv
|
||||||
|
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
||||||
|
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
||||||
|
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
||||||
|
# install all needed dependencies.
|
||||||
|
#Pipfile.lock
|
||||||
|
|
||||||
|
# poetry
|
||||||
|
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
||||||
|
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
||||||
|
# commonly ignored for libraries.
|
||||||
|
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
||||||
|
#poetry.lock
|
||||||
|
|
||||||
|
# pdm
|
||||||
|
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
||||||
|
#pdm.lock
|
||||||
|
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
||||||
|
# in version control.
|
||||||
|
# https://pdm.fming.dev/#use-with-ide
|
||||||
|
.pdm.toml
|
||||||
|
|
||||||
|
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
||||||
|
__pypackages__/
|
||||||
|
|
||||||
|
# Celery stuff
|
||||||
|
celerybeat-schedule
|
||||||
|
celerybeat.pid
|
||||||
|
|
||||||
|
# SageMath parsed files
|
||||||
|
*.sage.py
|
||||||
|
|
||||||
|
# Environments
|
||||||
|
.env
|
||||||
|
.venv
|
||||||
|
env/
|
||||||
|
venv/
|
||||||
|
ENV/
|
||||||
|
env.bak/
|
||||||
|
venv.bak/
|
||||||
|
|
||||||
|
# Spyder project settings
|
||||||
|
.spyderproject
|
||||||
|
.spyproject
|
||||||
|
|
||||||
|
# Rope project settings
|
||||||
|
.ropeproject
|
||||||
|
|
||||||
|
# mkdocs documentation
|
||||||
|
/site
|
||||||
|
|
||||||
|
# mypy
|
||||||
|
.mypy_cache/
|
||||||
|
.dmypy.json
|
||||||
|
dmypy.json
|
||||||
|
|
||||||
|
# Pyre type checker
|
||||||
|
.pyre/
|
||||||
|
|
||||||
|
# pytype static type analyzer
|
||||||
|
.pytype/
|
||||||
|
|
||||||
|
# Cython debug symbols
|
||||||
|
cython_debug/
|
||||||
|
|
||||||
|
# PyCharm
|
||||||
|
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
||||||
|
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
||||||
|
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
||||||
|
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
||||||
|
.idea/
|
||||||
|
|
||||||
|
# MacOS
|
||||||
|
.DS_Store
|
||||||
|
|
||||||
|
# Vim
|
||||||
|
*.swp
|
||||||
|
|
||||||
|
# Documentation
|
||||||
|
docs/_build
|
||||||
|
|
||||||
|
# SGL
|
||||||
|
benchmark/mmlu/data
|
||||||
|
benchmark/mmlu/data.tar
|
||||||
|
benchmark/llava_bench/images
|
||||||
|
benchmark/llava_bench/mme_pack
|
||||||
|
*.jsonl
|
||||||
|
tmp*.txt
|
||||||
|
|
||||||
|
# Torch Compile logs
|
||||||
|
tl_out/
|
||||||
|
|
||||||
|
# Plots
|
||||||
|
*.png
|
||||||
|
*.pdf
|
||||||
|
|
||||||
|
# personal
|
||||||
|
work_dirs/
|
||||||
|
*.csv
|
||||||
|
|
||||||
|
!logo.png
|
||||||
|
|
||||||
|
# Prerequisites
|
||||||
|
*.d
|
||||||
|
|
||||||
|
# Compiled Object files
|
||||||
|
*.slo
|
||||||
|
*.lo
|
||||||
|
*.o
|
||||||
|
*.obj
|
||||||
|
|
||||||
|
# Precompiled Headers
|
||||||
|
*.gch
|
||||||
|
*.pch
|
||||||
|
|
||||||
|
# Compiled Dynamic libraries
|
||||||
|
*.so
|
||||||
|
*.dylib
|
||||||
|
*.dll
|
||||||
|
|
||||||
|
# Fortran module files
|
||||||
|
*.mod
|
||||||
|
*.smod
|
||||||
|
|
||||||
|
# Compiled Static libraries
|
||||||
|
*.lai
|
||||||
|
*.la
|
||||||
|
*.a
|
||||||
|
*.lib
|
||||||
|
|
||||||
|
# Executables
|
||||||
|
*.exe
|
||||||
|
*.out
|
||||||
|
*.app
|
||||||
|
*.iml
|
||||||
|
|
||||||
|
# VSCode
|
||||||
|
.vscode
|
||||||
|
|
||||||
|
# Autoenv
|
||||||
|
.env.leave
|
||||||
|
|
||||||
|
# Rust lib
|
||||||
|
Cargo.lock
|
||||||
|
|
||||||
|
# Generated vision test fixtures (regenerate with: python scripts/generate_vision_golden.py)
|
||||||
|
sgl-model-gateway/tests/fixtures/golden/
|
||||||
|
|
||||||
|
# Other repos
|
||||||
|
lmms-eval
|
||||||
|
|
||||||
|
**/.serena/
|
||||||
|
ctags/
|
||||||
|
outputs/
|
||||||
|
inputs/
|
||||||
|
|
||||||
|
# Eval Cache
|
||||||
|
.longbench_cache/
|
||||||
|
|
||||||
|
# CUDA kernel develop, profile and debug
|
||||||
|
.clangd
|
||||||
|
*.nsys-rep
|
||||||
|
*.ncu-rep
|
||||||
|
*.nvcudmp
|
||||||
|
|
||||||
|
# setuptools-scm generated version file
|
||||||
|
python/sglang/_version.py
|
||||||
|
|
||||||
|
# MUSA section
|
||||||
|
# Generated source files by torchada
|
||||||
|
sgl-kernel/csrc_musa/
|
||||||
|
sgl-kernel/include_musa/
|
||||||
|
sgl-kernel/csrc/**/*_musa/
|
||||||
|
|
||||||
|
# MUSA core dump files
|
||||||
|
*.mudmp
|
||||||
|
|
||||||
|
# Others
|
||||||
|
# diffusion 3D outputs
|
||||||
|
*.glb
|
||||||
|
*.ply
|
||||||
|
*.npz
|
||||||
3
third_party/sglang/.isort.cfg
vendored
Normal file
@@ -0,0 +1,3 @@
[settings]
profile=black
known_first_party=sglang
102
third_party/sglang/.pre-commit-config.yaml
vendored
Normal file
@@ -0,0 +1,102 @@
|
|||||||
|
default_stages: [pre-commit, pre-push, manual]
|
||||||
|
exclude: ^(python/sglang/multimodal_gen/csrc|python/sglang/jit_kernel/flash_attention/cute)
|
||||||
|
|
||||||
|
repos:
|
||||||
|
- repo: https://github.com/pre-commit/pre-commit-hooks
|
||||||
|
rev: v6.0.0
|
||||||
|
hooks:
|
||||||
|
- id: check-symlinks
|
||||||
|
- id: destroyed-symlinks
|
||||||
|
- id: trailing-whitespace
|
||||||
|
- id: end-of-file-fixer
|
||||||
|
- id: check-yaml
|
||||||
|
args: [--allow-multiple-documents]
|
||||||
|
- id: check-toml
|
||||||
|
- id: check-ast
|
||||||
|
- id: check-added-large-files
|
||||||
|
- id: check-merge-conflict
|
||||||
|
- id: check-shebang-scripts-are-executable
|
||||||
|
- id: detect-private-key
|
||||||
|
exclude: ^sgl-model-gateway/tests/.*_test\.rs$
|
||||||
|
- id: debug-statements
|
||||||
|
- id: no-commit-to-branch
|
||||||
|
- repo: https://github.com/PyCQA/isort
|
||||||
|
rev: 7.0.0
|
||||||
|
hooks:
|
||||||
|
- id: isort
|
||||||
|
exclude: '^python/sglang/srt/grpc/.*_pb2\.py$|^python/sglang/srt/grpc/.*_pb2_grpc\.py$|^python/sglang/srt/grpc/.*_pb2\.pyi$|^python/sglang/srt/grpc/.*_pb2_grpc\.pyi$'
|
||||||
|
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||||
|
rev: v0.15.1
|
||||||
|
hooks:
|
||||||
|
- id: ruff
|
||||||
|
args:
|
||||||
|
- --select=F401,F821
|
||||||
|
- --fix
|
||||||
|
files: ^(benchmark/|docs/|examples/|python/sglang/|sgl-model-gateway/py_*|test/)
|
||||||
|
exclude: |
|
||||||
|
(?x)^(
|
||||||
|
.*/__init__\.py$|
|
||||||
|
.*\.ipynb$|
|
||||||
|
python/sglang/srt/grpc/.*_pb2\.py$|
|
||||||
|
python/sglang/srt/grpc/.*_pb2_grpc\.py$|
|
||||||
|
python/sglang/srt/grpc/.*_pb2\.pyi$|
|
||||||
|
python/sglang/srt/grpc/.*_pb2_grpc\.pyi$|
|
||||||
|
)$
|
||||||
|
- repo: https://github.com/psf/black
|
||||||
|
rev: 26.1.0
|
||||||
|
hooks:
|
||||||
|
- id: black-jupyter
|
||||||
|
exclude: '^python/sglang/srt/grpc/.*_pb2\.py$|^python/sglang/srt/grpc/.*_pb2_grpc\.py$|^python/sglang/srt/grpc/.*_pb2\.pyi$|^python/sglang/srt/grpc/.*_pb2_grpc\.pyi$'
|
||||||
|
- repo: https://github.com/codespell-project/codespell
|
||||||
|
rev: v2.4.1
|
||||||
|
hooks:
|
||||||
|
- id: codespell
|
||||||
|
args: ['--config', '.codespellrc']
|
||||||
|
- repo: https://github.com/pre-commit/mirrors-clang-format
|
||||||
|
rev: v20.1.7
|
||||||
|
hooks:
|
||||||
|
- id: clang-format
|
||||||
|
types_or: [c++, cuda]
|
||||||
|
args: [--style=file, --verbose]
|
||||||
|
- repo: https://github.com/kynan/nbstripout
|
||||||
|
rev: 0.9.0
|
||||||
|
hooks:
|
||||||
|
- id: nbstripout
|
||||||
|
args:
|
||||||
|
- '--keep-output'
|
||||||
|
- '--extra-keys=metadata.kernelspec metadata.language_info.version'
|
||||||
|
- repo: local
|
||||||
|
hooks:
|
||||||
|
- id: check-chinese-characters
|
||||||
|
name: check chinese characters in multimodal_gen
|
||||||
|
entry: >-
|
||||||
|
python3 -c 'import sys, re; p=re.compile(r"[\u4e00-\u9fff]"); ec=0; [ ([(print(f"{f}:{i+1}: {l.strip()}") or (ec:=1)) for i,l in enumerate(open(f, "r", encoding="utf-8", errors="ignore")) if p.search(l)]) for f in sys.argv[1:] ]; sys.exit(ec)'
|
||||||
|
language: system
|
||||||
|
files: ^python/sglang/multimodal_gen/.*
|
||||||
|
exclude: ^(python/sglang/multimodal_gen/configs/sample|python/sglang/multimodal_gen/apps/ComfyUI_SGLDiffusion/workflows|python/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages)(/|$)
|
||||||
|
types_or: [python, markdown, json, text]
|
||||||
|
- id: sort-ci-permissions
|
||||||
|
name: sort CI_PERMISSIONS.json
|
||||||
|
entry: python3 .github/update_ci_permission.py --sort-only
|
||||||
|
language: system
|
||||||
|
files: ^\.github/CI_PERMISSIONS\.json$
|
||||||
|
pass_filenames: false
|
||||||
|
- id: check-workflow-job-names
|
||||||
|
name: check for duplicate workflow job names
|
||||||
|
entry: python3 scripts/ci/check_workflow_job_names.py
|
||||||
|
language: system
|
||||||
|
files: ^\.github/workflows/.*\.yml$
|
||||||
|
pass_filenames: false
|
||||||
|
- repo: https://github.com/lycheeverse/lychee.git
|
||||||
|
rev: lychee-v0.22.0
|
||||||
|
hooks:
|
||||||
|
- id: lychee
|
||||||
|
name: check doc links (offline)
|
||||||
|
args: ["--config", ".github/linters/lychee.toml"]
|
||||||
|
stages: [manual]
|
||||||
|
exclude: ^docs/_build/
|
||||||
|
files: |
|
||||||
|
(?x)^(
|
||||||
|
README\.md|
|
||||||
|
docs/.*\.(md|rst|ipynb)
|
||||||
|
)$
|
||||||
425
third_party/sglang/3rdparty/amd/profiling/PROFILING.md
vendored
Normal file
@@ -0,0 +1,425 @@
|
|||||||
|
## Profiling the SGLang Inference System on AMD GPUs
|
||||||
|
This application note describes SGLang profiling techniques, the required code changes, and the steps to run them on systems with AMD Instinct GPUs. The same procedure may also work on Nvidia GPUs.
|
||||||
|
Examples and steps are provided in detail so that results are easy to reproduce and performance problems can be localized and then optimized.
|
||||||
|
Two primary methods are covered:
|
||||||
|
- [RPD](https://github.com/ROCm/rocmProfileData.git)
|
||||||
|
- [PyTorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html)
|
||||||
|
|
||||||
|
### Profiling the SGLang Inference System with the RPD Profiler
|
||||||
|
RPD is a low-overhead, cross-platform profiler, so the same RPD instrumentation works for profiling on both ROCm/AMD GPUs and CUDA/Nvidia GPUs. To profile SGLang with RPD, use the scripts and patch files included in this directory and follow the steps below:
|
||||||
|
1. Install RPD using install_rpd.sh, which applies rpd.patch during installation. Both files are in this directory.
|
||||||
|
|
||||||
|
install_rpd.sh
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# download and install RPD
|
||||||
|
apt update && apt install -y sqlite3 libsqlite3-dev libfmt-dev
|
||||||
|
|
||||||
|
# install rpd module
|
||||||
|
git clone https://github.com/ROCmSoftwarePlatform/rocmProfileData
|
||||||
|
cd rocmProfileData
|
||||||
|
git checkout 976899e9c6dbc6dd2bccf770818e4e44125590ac
|
||||||
|
git apply rpd.patch
|
||||||
|
make && make install
|
||||||
|
cd rocpd_python && python setup.py install && cd ..
|
||||||
|
cd rpd_tracer && make clean;make install && python setup.py install && cd ..
|
||||||
|
```
|
||||||
|
|
||||||
|
rpd.patch
|
||||||
|
|
||||||
|
```bash
|
||||||
|
diff --git a/rpd_tracer/Makefile b/rpd_tracer/Makefile
|
||||||
|
index e9d9feb..b2e9e1a 100644
|
||||||
|
--- a/rpd_tracer/Makefile
|
||||||
|
+++ b/rpd_tracer/Makefile
|
||||||
|
@@ -16,7 +16,7 @@ ifneq (,$(HIP_PATH))
|
||||||
|
$(info Building with roctracer)
|
||||||
|
RPD_LIBS += -L/opt/rocm/lib -lroctracer64 -lroctx64 -lamdhip64 -lrocm_smi64
|
||||||
|
RPD_INCLUDES += -I/opt/rocm/include -I/opt/rocm/include/roctracer -I/opt/rocm/include/hsa
|
||||||
|
- RPD_SRCS += RoctracerDataSource.cpp RocmSmiDataSource.cpp
|
||||||
|
+ RPD_SRCS += RoctracerDataSource.cpp
|
||||||
|
RPD_INCLUDES += -D__HIP_PLATFORM_AMD__
|
||||||
|
endif
|
||||||
|
```
|
||||||
|
2. Copy the loadTracer.sh file included in this directory to /sglang/python/sglang.
|
||||||
|
|
||||||
|
loadTracer.sh
|
||||||
|
|
||||||
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
################################################################################
|
||||||
|
# Copyright (c) 2021 - 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||||
|
#
|
||||||
|
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
# of this software and associated documentation files (the "Software"), to deal
|
||||||
|
# in the Software without restriction, including without limitation the rights
|
||||||
|
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
# copies of the Software, and to permit persons to whom the Software is
|
||||||
|
# furnished to do so, subject to the following conditions:
|
||||||
|
#
|
||||||
|
# The above copyright notice and this permission notice shall be included in
|
||||||
|
# all copies or substantial portions of the Software.
|
||||||
|
#
|
||||||
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||||
|
# THE SOFTWARE.
|
||||||
|
################################################################################
|
||||||
|
OUTPUT_FILE="trace.rpd"
|
||||||
|
|
||||||
|
if [ "$1" = "-o" ] ; then
|
||||||
|
OUTPUT_FILE=$2
|
||||||
|
shift
|
||||||
|
shift
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -e ${OUTPUT_FILE} ] ; then
|
||||||
|
rm ${OUTPUT_FILE}
|
||||||
|
fi
|
||||||
|
|
||||||
|
python3 -m rocpd.schema --create ${OUTPUT_FILE}
|
||||||
|
if [ $? != 0 ] ; then
|
||||||
|
echo "Error: Could not create rpd file. Please run 'python setup.py install' from the rocpd_python dir"
|
||||||
|
exit
|
||||||
|
fi
|
||||||
|
|
||||||
|
export RPDT_FILENAME=${OUTPUT_FILE}
|
||||||
|
export RPDT_AUTOSTART=0
|
||||||
|
LD_PRELOAD=librocm-smi_64:librpd_tracer.so "$@"
|
||||||
|
```
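What the wrapper script does can be sketched in Python as follows. This is only an illustrative sketch, not part of the repository: the function names `tracer_env` and `run_with_tracer` are hypothetical, the environment values are copied from loadTracer.sh above, and the comment on `RPDT_AUTOSTART` reflects an assumption about its meaning.

```python
import os
import subprocess
import sys


def tracer_env(output_file="trace.rpd", base_env=None):
    """Return a copy of the environment with the variables loadTracer.sh exports."""
    env = dict(os.environ if base_env is None else base_env)
    env["RPDT_FILENAME"] = output_file  # file the tracer writes to
    # Assumption: 0 means recording stays off until it is started explicitly.
    env["RPDT_AUTOSTART"] = "0"
    # Value copied verbatim from loadTracer.sh.
    env["LD_PRELOAD"] = "librocm-smi_64:librpd_tracer.so"
    return env


def run_with_tracer(command, output_file="trace.rpd"):
    """Recreate an empty RPD file, then run `command` with the tracer preloaded."""
    if os.path.exists(output_file):
        os.remove(output_file)
    # Same schema-creation step as the shell script (requires rocpd to be installed).
    subprocess.run(
        [sys.executable, "-m", "rocpd.schema", "--create", output_file],
        check=True,
    )
    return subprocess.run(command, env=tracer_env(output_file)).returncode
```

Unlike the shell script, this sketch raises an exception if the schema creation fails, rather than printing a message and exiting.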
|
||||||
|
3. If the main goal is to capture GPU kernel information along with limited CPU activity, apply the patch provided in this directory with "git apply rpd_profile_server_enable.patch".
|
||||||
|
|
||||||
|
#### Common Notes 1
|
||||||
|
Although the example runs with TP=8, the patch deliberately logs RPD profiling on only 2 ranks (i.e. tp_rank=0/1) to keep profiling and visualization manageable, because even Perfetto's streaming mode can load a JSON file of at most 8GB. With 2 ranks logged, we can still check for cross-rank issues (e.g. load imbalance or NCCL problems) while recording a relatively longer duration before the JSON file generated from the RPD file reaches 8GB.
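The rank gating described in this note can be sketched as a small helper. The names `PROFILED_RANKS` and `should_profile` are hypothetical and used only for illustration; the patch itself writes the condition inline.

```python
# Ranks whose activity is logged; the patch uses tp_rank 0 and 1.
PROFILED_RANKS = frozenset({0, 1})


def should_profile(tp_rank, profiled_ranks=PROFILED_RANKS):
    """Return True if this tensor-parallel rank should record a trace."""
    return tp_rank in profiled_ranks


# With TP=8, only two of the eight ranks contribute to the trace.
tp_size = 8
logged = [rank for rank in range(tp_size) if should_profile(rank)]
print(logged)  # [0, 1]
```

Widening the set increases coverage across ranks but also makes the resulting JSON file grow faster for the same recording duration.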
|
||||||
|
|
||||||
|
rpd_profile_server_enable.patch
|
||||||
|
|
||||||
|
```bash
|
||||||
|
diff --git a/python/sglang/srt/managers/scheduler.py b/python/sglang/srt/managers/scheduler.py
|
||||||
|
index 62d1ff9..9021c01 100644
|
||||||
|
--- a/python/sglang/srt/managers/scheduler.py
|
||||||
|
+++ b/python/sglang/srt/managers/scheduler.py
|
||||||
|
@@ -71,6 +71,8 @@ from sglang.srt.utils import (
|
||||||
|
suppress_other_loggers,
|
||||||
|
)
|
||||||
|
from sglang.utils import get_exception_traceback
|
||||||
|
+from rpdTracerControl import rpdTracerControl
|
||||||
|
+rpdTracerControl.skipCreate()
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
@@ -245,6 +247,7 @@ class Scheduler:
|
||||||
|
],
|
||||||
|
with_stack=True,
|
||||||
|
)
|
||||||
|
+ self.rpd = rpdTracerControl()
|
||||||
|
|
||||||
|
@torch.inference_mode()
|
||||||
|
def event_loop(self):
|
||||||
|
@@ -1027,15 +1030,24 @@ class Scheduler:
|
||||||
|
def start_profile(self) -> None:
|
||||||
|
if self.profiler is None:
|
||||||
|
raise RuntimeError("Profiler is not enabled.")
|
||||||
|
- self.profiler.start()
|
||||||
|
+ #self.profiler.start() #block pytorch profiler for rpd profiler enabling
|
||||||
|
+ if self.tp_rank == 0 or self.tp_rank == 1:
|
||||||
|
+ self.rpd.start()
|
||||||
|
+ self.rpd.rangePush("", "rpd profile range", "")
|
||||||
|
+ logger.info("rpd is enabled")
|
||||||
|
|
||||||
|
def stop_profile(self) -> None:
|
||||||
|
if self.profiler is None:
|
||||||
|
raise RuntimeError("Profiler is not enabled.")
|
||||||
|
- self.profiler.stop()
|
||||||
|
- self.profiler.export_chrome_trace(
|
||||||
|
- self.torch_profiler_trace_dir + "/" + str(time.time()) + ".trace.json.gz"
|
||||||
|
- )
|
||||||
|
+ #self.profiler.stop()
|
||||||
|
+ #self.profiler.export_chrome_trace(
|
||||||
|
+ # self.torch_profiler_trace_dir + "/" + str(time.time()) + ".trace.json.gz"
|
||||||
|
+ #)
|
||||||
|
+ if self.tp_rank ==0 or self.tp_rank ==1:
|
||||||
|
+ self.rpd.rangePop()
|
||||||
|
+ self.rpd.stop()
|
||||||
|
+ self.rpd.flush()
|
||||||
|
+ logger.info("rpd is done")
|
||||||
|
logger.info("Profiler is done")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Advanced Debugging with RPD Profiler
|
||||||
|
Sometimes we want the RPD profiler to capture more CPU and Python activity in order to debug harder issues (e.g. the root cause of load imbalance across GPU processes, or the root cause of bubbles). Only in such cases, apply the patch with "git apply rpd_profile_server_enable_wCPU_activities.patch", which modifies 3 files.
|
||||||
|
|
||||||
|
rpd_profile_server_enable_wCPU_activities.patch
|
||||||
|
|
||||||
|
```bash
|
||||||
|
diff --git a/python/sglang/srt/managers/scheduler.py b/python/sglang/srt/managers/scheduler.py
|
||||||
|
index 62d1ff9..2edb427 100644
|
||||||
|
--- a/python/sglang/srt/managers/scheduler.py
|
||||||
|
+++ b/python/sglang/srt/managers/scheduler.py
|
||||||
|
@@ -71,6 +71,8 @@ from sglang.srt.utils import (
|
||||||
|
suppress_other_loggers,
|
||||||
|
)
|
||||||
|
from sglang.utils import get_exception_traceback
|
||||||
|
+from rpdTracerControl import rpdTracerControl
|
||||||
|
+rpdTracerControl.skipCreate()
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
@@ -245,6 +247,7 @@ class Scheduler:
|
||||||
|
],
|
||||||
|
with_stack=True,
|
||||||
|
)
|
||||||
|
+ self.rpd = rpdTracerControl()
|
||||||
|
|
||||||
|
@torch.inference_mode()
|
||||||
|
def event_loop(self):
|
||||||
|
@@ -1027,15 +1030,26 @@ class Scheduler:
|
||||||
|
def start_profile(self) -> None:
|
||||||
|
if self.profiler is None:
|
||||||
|
raise RuntimeError("Profiler is not enabled.")
|
||||||
|
- self.profiler.start()
|
||||||
|
+ #self.profiler.start()
|
||||||
|
+ logger.info("torch profiler is disabled")
|
||||||
|
+ if self.tp_rank == 0 or self.tp_rank == 1:
|
||||||
|
+ self.rpd.setPythonTrace(True)
|
||||||
|
+ self.rpd.start()
|
||||||
|
+ self.rpd.rangePush("", "scheduler", "")
|
||||||
|
+ logger.info("rpd is enabled inside scheduler profiling")
|
||||||
|
|
||||||
|
def stop_profile(self) -> None:
|
||||||
|
if self.profiler is None:
|
||||||
|
raise RuntimeError("Profiler is not enabled.")
|
||||||
|
- self.profiler.stop()
|
||||||
|
- self.profiler.export_chrome_trace(
|
||||||
|
- self.torch_profiler_trace_dir + "/" + str(time.time()) + ".trace.json.gz"
|
||||||
|
- )
|
||||||
|
+ #self.profiler.stop()
|
||||||
|
+ #self.profiler.export_chrome_trace(
|
||||||
|
+ # self.torch_profiler_trace_dir + "/" + str(time.time()) + ".trace.json.gz"
|
||||||
|
+ #)
|
||||||
|
+ if self.tp_rank ==0 or self.tp_rank ==1:
|
||||||
|
+ self.rpd.rangePop()
|
||||||
|
+ self.rpd.stop()
|
||||||
|
+ self.rpd.flush()
|
||||||
|
+ logger.info("rpd is done inside scheduler")
|
||||||
|
logger.info("Profiler is done")
|
||||||
|
|
||||||
|
|
||||||
|
diff --git a/python/sglang/srt/managers/tokenizer_manager.py b/python/sglang/srt/managers/tokenizer_manager.py
|
||||||
|
index 2621ccd..181df85 100644
|
||||||
|
--- a/python/sglang/srt/managers/tokenizer_manager.py
|
||||||
|
+++ b/python/sglang/srt/managers/tokenizer_manager.py
|
||||||
|
@@ -58,6 +58,10 @@ from sglang.srt.sampling.sampling_params import SamplingParams
|
||||||
|
from sglang.srt.server_args import PortArgs, ServerArgs
|
||||||
|
from sglang.srt.utils import is_generation_model, is_multimodal_model
|
||||||
|
|
||||||
|
+from rpdTracerControl import rpdTracerControl
|
||||||
|
+rpdTracerControl.skipCreate()
|
||||||
|
+
|
||||||
|
+
|
||||||
|
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
@@ -514,10 +518,20 @@ class TokenizerManager:
|
||||||
|
self.send_to_scheduler.send_pyobj(req)
|
||||||
|
|
||||||
|
def start_profile(self):
|
||||||
|
+ rpd = rpdTracerControl()
|
||||||
|
+ rpd.setPythonTrace(True)
|
||||||
|
+ rpd.start()
|
||||||
|
+ rpd.rangePush("", "tokenizer_manager", "")
|
||||||
|
+ logger.info("tokenizer_manager rpd profiling started!")
|
||||||
|
req = ProfileReq.START_PROFILE
|
||||||
|
self.send_to_scheduler.send_pyobj(req)
|
||||||
|
|
||||||
|
def stop_profile(self):
|
||||||
|
+ rpd = rpdTracerControl()
|
||||||
|
+ rpd.rangePop()
|
||||||
|
+ rpd.stop()
|
||||||
|
+ rpd.flush()
|
||||||
|
+ logger.info("rpd profiling is done inside tokenizer_manager!")
|
||||||
|
req = ProfileReq.STOP_PROFILE
|
||||||
|
self.send_to_scheduler.send_pyobj(req)
|
||||||
|
|
||||||
|
diff --git a/python/sglang/srt/server.py b/python/sglang/srt/server.py
|
||||||
|
index 7111c93..2bd722c 100644
|
||||||
|
--- a/python/sglang/srt/server.py
|
||||||
|
+++ b/python/sglang/srt/server.py
|
||||||
|
@@ -30,6 +30,8 @@ import threading
|
||||||
|
import time
|
||||||
|
from http import HTTPStatus
|
||||||
|
from typing import Dict, List, Optional, Union
|
||||||
|
+from rpdTracerControl import rpdTracerControl
|
||||||
|
+rpdTracerControl.skipCreate()
|
||||||
|
|
||||||
|
# Fix a bug of Python threading
|
||||||
|
setattr(threading, "_register_atexit", lambda *args, **kwargs: None)
|
||||||
|
@@ -152,6 +154,11 @@ async def flush_cache():
|
||||||
|
@app.post("/start_profile")
|
||||||
|
async def start_profile():
|
||||||
|
"""Start profiling."""
|
||||||
|
+ rpd = rpdTracerControl()
|
||||||
|
+ rpd.setPythonTrace(True)
|
||||||
|
+ rpd.start()
|
||||||
|
+ rpd.rangePush("", "server rpd profile range", "")
|
||||||
|
+ logger.info("rpd profiling started in server.py!")
|
||||||
|
tokenizer_manager.start_profile()
|
||||||
|
return Response(
|
||||||
|
content="Start profiling.\n",
|
||||||
|
@@ -164,6 +171,11 @@ async def start_profile():
|
||||||
|
async def stop_profile():
|
||||||
|
"""Stop profiling."""
|
||||||
|
tokenizer_manager.stop_profile()
|
||||||
|
+ rpd = rpdTracerControl()
|
||||||
|
+ rpd.rangePop()
|
||||||
|
+ rpd.stop()
|
||||||
|
+ rpd.flush()
|
||||||
|
+ logger.info("rpd profiling is done in server.py!")
|
||||||
|
return Response(
|
||||||
|
content="Stop profiling. This will take some time.\n",
|
||||||
|
status_code=200,
|
||||||
|
```
|
||||||
|
|
||||||
|
4. As an example for grok1 profiling, create a dummy_grok1 directory with a config.json inside it (see content below), and copy this directory to the path given to "--model-path" if you want to use the example server.sh file provided.
|
||||||
|
```bash
|
||||||
|
cat ../dummy_grok1/config.json
|
||||||
|
{
|
||||||
|
"architectures": [
|
||||||
|
"Grok1ModelForCausalLM"
|
||||||
|
],
|
||||||
|
"embedding_multiplier_scale": 78.38367176906169,
|
||||||
|
"output_multiplier_scale": 0.5773502691896257,
|
||||||
|
"vocab_size": 131072,
|
||||||
|
"hidden_size": 6144,
|
||||||
|
"intermediate_size": 32768,
|
||||||
|
"max_position_embeddings": 8192,
|
||||||
|
"num_experts_per_tok": 2,
|
||||||
|
"num_local_experts": 8,
|
||||||
|
"num_attention_heads": 48,
|
||||||
|
"num_hidden_layers": 64,
|
||||||
|
"num_key_value_heads": 8,
|
||||||
|
"head_dim": 128,
|
||||||
|
"rms_norm_eps": 1e-05,
|
||||||
|
"rope_theta": 10000.0,
|
||||||
|
"model_type": "mixtral",
|
||||||
|
"torch_dtype": "bfloat16"
|
||||||
|
}
|
||||||
|
```
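If it is more convenient to generate the directory than to copy it, a minimal Python sketch could look like the following. The helper name `write_dummy_config` is hypothetical; the configuration values are copied from the config.json shown above.

```python
import json
from pathlib import Path

# Values copied from the dummy_grok1/config.json listing.
GROK1_DUMMY_CONFIG = {
    "architectures": ["Grok1ModelForCausalLM"],
    "embedding_multiplier_scale": 78.38367176906169,
    "output_multiplier_scale": 0.5773502691896257,
    "vocab_size": 131072,
    "hidden_size": 6144,
    "intermediate_size": 32768,
    "max_position_embeddings": 8192,
    "num_experts_per_tok": 2,
    "num_local_experts": 8,
    "num_attention_heads": 48,
    "num_hidden_layers": 64,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "rms_norm_eps": 1e-05,
    "rope_theta": 10000.0,
    "model_type": "mixtral",
    "torch_dtype": "bfloat16",
}


def write_dummy_config(model_dir):
    """Create `model_dir` if needed and write config.json into it."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.json"
    config_path.write_text(json.dumps(GROK1_DUMMY_CONFIG, indent=2) + "\n")
    return config_path
```

For example, `write_dummy_config("/sgl-workspace/sglang/dummy_grok1")` would match the "--model-path" used in the example server.sh.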
|
||||||
|
5. Launch the server with the RPD-enabled script ./server.sh in one terminal inside the Docker container.
|
||||||
|
|
||||||
|
#### Common Notes 2
|
||||||
|
- Remember to change model-path to the correct path.
- loadTracer.sh is required for RPD profiling.
- SGLANG_TORCH_PROFILER_DIR is used by the default torch profiler.
- Do not use loadTracer.sh if you are using the torch profiler; simply use python3 -m sglang.launch_server.
|
||||||
|
|
||||||
|
|
||||||
|
server.sh
|
||||||
|
|
||||||
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# export SGLANG_TORCH_PROFILER_DIR=/data/sglang/
|
||||||
|
export SGLANG_TORCH_PROFILER_DIR=/sgl-workspace/sglang/profile/
|
||||||
|
|
||||||
|
# Get the current timestamp
|
||||||
|
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
|
||||||
|
|
||||||
|
# Define the log file with a timestamp
|
||||||
|
LOGFILE="sglang_server_log_$TIMESTAMP.json"
|
||||||
|
|
||||||
|
# Run the Python command and save the output to the log file
|
||||||
|
loadTracer.sh python3 -m sglang.launch_server \
|
||||||
|
--model-path /sgl-workspace/sglang/dummy_grok1 \
|
||||||
|
--tokenizer-path Xenova/grok-1-tokenizer \
|
||||||
|
--load-format dummy \
|
||||||
|
--quantization fp8 \
|
||||||
|
--tp 8 \
|
||||||
|
--port 30000 \
|
||||||
|
--disable-radix-cache 2>&1 | tee "$LOGFILE"
|
||||||
|
```
|
||||||
|
6. Open another terminal in the same Docker container, and run the RPD-enabled ./client.sh once the server-side terminal shows the "The server is fired up and is ready to roll!" message.
|
||||||
|
|
||||||
|
#### Common Notes 3
|
||||||
|
- Use curl http://localhost:30000/start_profile and curl http://localhost:30000/stop_profile to control the start and end of profiling. See sglang/python/sglang/srt/managers/scheduler.py for more details.
- Do not use the RPD profiler together with the PyTorch profiler, to avoid interference.
- The rocmProfileData/tools/rpd2tracing.py script generates a JSON file from the RPD file.
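The start and stop requests can also be issued from Python, for example to wrap a workload in a profiling window. The sketch below uses only the standard library; the names `profile_url`, `send_profile_request`, and `profiling` are hypothetical. It defaults to GET to mirror the curl commands, but the patch context above shows the route registered with `app.post`, so `method="POST"` may be needed depending on the server version.

```python
import contextlib
import urllib.request


def profile_url(action, host="localhost", port=30000):
    """Build the URL for the start_profile or stop_profile endpoint."""
    if action not in ("start", "stop"):
        raise ValueError(f"unknown action: {action!r}")
    return f"http://{host}:{port}/{action}_profile"


def send_profile_request(action, method="GET", host="localhost", port=30000):
    """Send one profiling control request and return the response text."""
    data = b"" if method == "POST" else None  # empty body for POST
    request = urllib.request.Request(
        profile_url(action, host, port), data=data, method=method
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


@contextlib.contextmanager
def profiling(**kwargs):
    """Start profiling on entry and always stop it on exit."""
    send_profile_request("start", **kwargs)
    try:
        yield
    finally:
        send_profile_request("stop", **kwargs)
```

A workload would then run inside `with profiling(): ...`, which stops profiling even if the workload raises.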
|
||||||
|
|
||||||
|
client.sh
|
||||||
|
|
||||||
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Start profiling via API
|
||||||
|
curl http://localhost:30000/start_profile -H "Content-Type: application/json"
|
||||||
|
|
||||||
|
# Benchmark serving using sglang with random dataset and tokenizer
|
||||||
|
# Define the log file with a timestamp
|
||||||
|
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
|
||||||
|
LOGFILE="sglang_client_log_$TIMESTAMP.json"
|
||||||
|
|
||||||
|
# Run the benchmark with specified parameters and save logs
|
||||||
|
python3 -m sglang.bench_serving \
|
||||||
|
--backend sglang \
|
||||||
|
--tokenizer Xenova/grok-1-tokenizer \
|
||||||
|
--dataset-name random \
|
||||||
|
--random-input 1024 \
|
||||||
|
--random-output 1024 \
|
||||||
|
--num-prompts 120 \
|
||||||
|
--request-rate 8 \
|
||||||
|
--output-file online.jsonl 2>&1 | tee "$LOGFILE"
|
||||||
|
|
||||||
|
# Stop profiling via API
|
||||||
|
curl http://localhost:30000/stop_profile -H "Content-Type: application/json"
|
||||||
|
|
||||||
|
# Convert tracing file to csv & json
|
||||||
|
sqlite3 trace.rpd ".mode csv" ".header on" ".output trace.csv" "select * from top;" ".output stdout"
|
||||||
|
python3 ./rocmProfileData/tools/rpd2tracing.py trace.rpd trace.json
|
||||||
|
```
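The CSV export at the end of client.sh can also be done from Python, since the RPD file is read here as an SQLite database. The following is a minimal sketch using the standard `sqlite3` and `csv` modules; the function name `export_top_to_csv` is hypothetical, and `top` is the name queried by the sqlite3 command above.

```python
import csv
import sqlite3


def export_top_to_csv(rpd_path, csv_path, table="top"):
    """Write every row of `table` to a CSV file with a header row.

    Returns the number of data rows written.
    """
    conn = sqlite3.connect(rpd_path)
    try:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Calling `export_top_to_csv("trace.rpd", "trace.csv")` is intended to produce the same table as the sqlite3 command, although quoting details of the CSV output may differ.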
|
||||||
|
7. Follow the [Perfetto docs](https://perfetto.dev/docs/visualization/large-traces) to visualize large JSON files. Adjust the profiling parameters so that the trace.json file size is less than 9GB.
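Before opening the trace, its size can be checked against the limit with a few lines of Python. This is a sketch only: the helper name `fits_in_viewer` is hypothetical, and treating 1GB as 1024**3 bytes is an assumption.

```python
import os


def fits_in_viewer(trace_path, limit_gb=9.0):
    """Return True if the trace file is smaller than `limit_gb` gigabytes."""
    size_bytes = os.path.getsize(trace_path)
    # Assumption: 1 GB is counted as 1024**3 bytes.
    return size_bytes < limit_gb * 1024**3
```

If the check fails, shortening the profiled window (for example, fewer prompts in client.sh) is one way to shrink the trace.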
|
||||||
|
|
||||||
|
### Profiling the SGLang Inference System with the PyTorch Profiler
|
||||||
|
|
||||||
|
Follow these steps:
|
||||||
|
|
||||||
|
1. Apply the patch torch_profiler.patch. Note that you can modify "if self.tp_rank == 0" in the patch to record more ranks in the profile.
|
||||||
|
|
||||||
|
torch_profiler.patch
|
||||||
|
```bash
|
||||||
|
diff --git a/python/sglang/srt/managers/scheduler.py b/python/sglang/srt/managers/scheduler.py
|
||||||
|
index 62d1ff9..6ecd78c 100644
|
||||||
|
--- a/python/sglang/srt/managers/scheduler.py
|
||||||
|
+++ b/python/sglang/srt/managers/scheduler.py
|
||||||
|
@@ -240,7 +240,6 @@ class Scheduler:
|
||||||
|
)
|
||||||
|
self.profiler = torch.profiler.profile(
|
||||||
|
activities=[
|
||||||
|
- torch.profiler.ProfilerActivity.CPU,
|
||||||
|
torch.profiler.ProfilerActivity.CUDA,
|
||||||
|
],
|
||||||
|
with_stack=True,
|
||||||
|
@@ -1033,9 +1032,11 @@ class Scheduler:
|
||||||
|
if self.profiler is None:
|
||||||
|
raise RuntimeError("Profiler is not enabled.")
|
||||||
|
self.profiler.stop()
|
||||||
|
- self.profiler.export_chrome_trace(
|
||||||
|
- self.torch_profiler_trace_dir + "/" + str(time.time()) + ".trace.json.gz"
|
||||||
|
- )
|
||||||
|
+ if self.tp_rank == 0:
|
||||||
|
+ with open(f"stats_repro_{int(time.time())}.txt", "w") as f:
|
||||||
|
+ print(self.profiler.key_averages(group_by_input_shape=True).table(sort_by="cuda_time_total", row_limit=-1), file=f)
|
||||||
|
+ print("Profiling stats done.")
|
||||||
|
+
|
||||||
|
logger.info("Profiler is done")
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Create the model directory and copy it to the path given to "--model-path" if you want to use the server.sh file provided.
|
||||||
|
|
||||||
|
3. Modify the included server.sh by removing "loadTracer.sh" before the python command, then launch ./server.sh in one terminal inside the Docker container.
|
||||||
|
|
||||||
|
4. Proceed as in step 6 of the RPD profiling section, but first remove the last 2 lines of client.sh, which convert the RPD file into CSV and JSON files. Then run the modified client.sh for PyTorch profiling.
|
||||||
|
-------
|
||||||
|
- [Torch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html)
|
||||||