xserv

Files

Gahow Wang 5343391dbd review cleanups: pp+gpt-oss guard, sparse GEMV asserts, warnings

- --pp with gpt-oss now fails with a clear message instead of a
  cryptic missing-weight panic inside the Qwen3-only PP engine.
- Sparse GEMV wrappers assert K%16==0 (FP8) / K%32==0 (MXFP4) — the
  uint4-vectorized kernels would silently drop a tail otherwise.
- Document the topk_ids buffer holding i32 under an F32 dtype label
  (DType has no I32).
- Drop unused imports/locals and the cuBLASLt scale-mode constants
  orphaned by the strided-batched FP8 rework (e631a71).
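
The K-alignment asserts in the second bullet can be sketched as below. This is a minimal illustration, not the repo's actual wrapper: `WeightFormat`, `elems_per_vector_load`, and `check_k_alignment` are hypothetical names, and the sketch returns a `Result` where the commit describes an assert. The arithmetic follows from the commit text: a uint4 load is 16 bytes, so it covers 16 FP8 elements (1 byte each) or 32 MXFP4 elements (4 bits each), and a K that is not a multiple of that step leaves a tail the vectorized loop never reads.

```rust
/// Hypothetical weight formats for illustration; not the repo's own types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum WeightFormat {
    Fp8,
    Mxfp4,
}

impl WeightFormat {
    /// Number of weight elements covered by one 16-byte (uint4) vector load.
    fn elems_per_vector_load(self) -> usize {
        match self {
            WeightFormat::Fp8 => 16,   // 16 bytes / 1 byte per element
            WeightFormat::Mxfp4 => 32, // 16 bytes / 0.5 byte per element
        }
    }
}

/// Rejects a reduction dimension K that the vectorized kernel cannot cover
/// exactly, instead of letting the kernel skip the trailing elements.
fn check_k_alignment(k: usize, fmt: WeightFormat) -> Result<(), String> {
    let step = fmt.elems_per_vector_load();
    if k % step != 0 {
        return Err(format!(
            "sparse GEMV ({:?}): K={} must be a multiple of {}",
            fmt, k, step
        ));
    }
    Ok(())
}
```

For example, K=4112 passes the FP8 check (4112 = 257 * 16) but fails the MXFP4 check, since 4112 leaves a remainder of 16 when divided by 32.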

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-06-12 17:02:59 +08:00

activation

gpt-oss: drop debug syncs from forward; GPU broadcast bias-add

2026-06-12 17:02:59 +08:00

attention

kernels: fix NaN in flash-attention sinks on fully-masked window tiles

2026-06-02 16:09:43 +08:00

embedding

kernels/cuda: paged-attention kernel, dispatch, pinned host memory

2026-05-28 19:58:36 +08:00

gemm

kernels: fix uninitialized shared-memory read in M=1 decode GEMV