review cleanups: pp+gpt-oss guard, sparse GEMV asserts, warnings

- --pp with gpt-oss now fails with a clear message instead of a
  cryptic missing-weight panic inside the Qwen3-only PP engine.
- Sparse GEMV wrappers assert K%16==0 (FP8) / K%32==0 (MXFP4): one
  16-byte uint4 load covers 16 FP8 or 32 MXFP4 values, so without the
  check the uint4-vectorized kernels would silently drop the K tail.
- Document that the topk_ids buffer holds i32 values under an F32
  dtype label (DType has no I32 variant).
- Drop unused imports/locals and the cuBLASLt scale-mode constants
  orphaned by the strided-batched FP8 rework (e631a71).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
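The following host-side C++ sketch shows why the K-divisibility asserts matter. It is a stand-in and not the actual kernel. Plain floats replace the packed FP8/MXFP4 payload, and `vectorized_row_dot`, `checked_row_dot` and the width constants are hypothetical names. The loop only walks K in whole vector-width chunks, as a kernel that issues one uint4 load per chunk would. Any remainder is therefore never read.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Values covered by one 16-byte uint4 load:
// FP8 stores 1 byte per value, MXFP4 stores 4 bits per value.
constexpr int kFp8PerVec   = 16;
constexpr int kMxfp4PerVec = 32;

// Hypothetical CPU stand-in for one row of a vectorized GEMV.
// K is traversed only in whole chunks of elems_per_vec values.
float vectorized_row_dot(const std::vector<float>& w,
                         const std::vector<float>& x, int elems_per_vec) {
    const int K = static_cast<int>(x.size());
    float acc = 0.0f;
    for (int v = 0; v < K / elems_per_vec; ++v) {  // integer division: tail skipped
        for (int j = 0; j < elems_per_vec; ++j) {
            const int k = v * elems_per_vec + j;
            acc += w[k] * x[k];
        }
    }
    return acc;
}

// Hypothetical wrapper-side guard in the spirit of the commit: reject
// shapes that the vectorized loop cannot cover exactly.
float checked_row_dot(const std::vector<float>& w,
                      const std::vector<float>& x, int elems_per_vec) {
    if (static_cast<int>(x.size()) % elems_per_vec != 0) {
        throw std::invalid_argument("K must be a multiple of the vector width");
    }
    return vectorized_row_dot(w, x, elems_per_vec);
}
```

With K = 20 and all-ones inputs, `vectorized_row_dot(w, x, kFp8PerVec)` returns 16 instead of 20. `checked_row_dot` throws rather than returning that wrong sum. Rejecting the shape up front keeps the vectorized path free of extra branches. A scalar tail loop would be the alternative if such K values ever need to be supported.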
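The next sketch illustrates the i32-under-F32 convention. It assumes the buffer is simply a run of 4-byte elements whose dtype label says F32, and the helper names are hypothetical. Because `int32_t` and `float` are both 4 bytes wide, the ids fit bit-for-bit. Code that trusts the label and reads them as floats gets meaningless values, though: the bits of the integer 3 decode to a tiny denormal, not 3.0. The sketch uses `std::memcpy` for the reinterpretation because casting the pointer would break C++ strict-aliasing rules. CUDA device code typically uses intrinsics such as `__float_as_int` for the same job.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(int32_t), "ids must fit in F32-sized slots");

// Hypothetical: store int32 expert ids in a buffer typed as float.
// This is a raw bit copy with no numeric conversion.
std::vector<float> pack_ids_as_f32(const std::vector<int32_t>& ids) {
    std::vector<float> buf(ids.size());
    std::memcpy(buf.data(), ids.data(), ids.size() * sizeof(int32_t));
    return buf;
}

// Hypothetical: recover the ids by copying the same bits back out.
std::vector<int32_t> unpack_ids(const std::vector<float>& buf) {
    std::vector<int32_t> ids(buf.size());
    std::memcpy(ids.data(), buf.data(), buf.size() * sizeof(float));
    return ids;
}
```

The round trip `unpack_ids(pack_ids_as_f32(ids))` returns the original ids. Reading `buf[i]` directly as a float does not.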
@@ -93,11 +93,9 @@ __global__ void moe_replicate_bf16_kernel(
     int total = local_experts * num_tokens * hidden;
     if (idx >= total) return;
 
-    int expert = idx / (num_tokens * hidden);
     int remainder = idx % (num_tokens * hidden);
     // x_rep[expert, token, dim] = x[token, dim]
     x_rep[idx] = x[remainder];
-    (void)expert; // suppress unused warning
 }
 
 // ============================================================
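Below is a host-side C++ reference for the replicate kernel's indexing. It assumes, consistent with the kernel's own comment, that `x` is row-major `[num_tokens, hidden]` and `x_rep` is row-major `[local_experts, num_tokens, hidden]`. The name `moe_replicate_ref` is hypothetical, and each loop iteration plays the role of one CUDA thread. The source address depends only on `idx % (num_tokens * hidden)`. The expert coordinate decides which copy is written but never where it is read from, which is why the kernel does not need an `expert` local.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference for the replicate kernel.
// x:     [num_tokens][hidden], row-major
// x_rep: [local_experts][num_tokens][hidden], row-major
template <typename T>
std::vector<T> moe_replicate_ref(const std::vector<T>& x, int local_experts,
                                 int num_tokens, int hidden) {
    const int per_expert = num_tokens * hidden;
    const int total = local_experts * per_expert;
    std::vector<T> x_rep(total);
    for (int idx = 0; idx < total; ++idx) {  // one iteration per CUDA thread
        x_rep[idx] = x[idx % per_expert];    // x_rep[e, t, d] = x[t, d]
    }
    return x_rep;
}
```

For 2 tokens, 3 hidden dims and 3 local experts, the 18-element output is three back-to-back copies of `x`.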