xserv

Files

Gahow Wang 4c3f914459 kernels/cuda: paged-attention kernel, dispatch, pinned host memory

CUDA layer for the paged-KV + swap work:
- csrc: new paged_attention.cu plus updates across attention/gemm/norm/
  activation/embedding/reduce kernels and common.cuh.
- xserv-kernels: new dispatch module and kernel-binding updates.
- xserv-cuda: cudaMallocHost/FreeHost bindings + PinnedBuffer (host swap
  pool backing) and offset-aware D2H/H2D copies used to move KV blocks
  between the GPU pool and pinned host memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 19:58:36 +08:00

embedding.cu

kernels/cuda: paged-attention kernel, dispatch, pinned host memory

2026-05-28 19:58:36 +08:00

rope.cu

kernels/cuda: paged-attention kernel, dispatch, pinned host memory

2026-05-28 19:58:36 +08:00

transpose.cu

kernels/cuda: paged-attention kernel, dispatch, pinned host memory

2026-05-28 19:58:36 +08:00