xserv

Go to file

Gahow Wang 4c3f914459 kernels/cuda: paged-attention kernel, dispatch, pinned host memory

CUDA layer for the paged-KV + swap work:
- csrc: new paged_attention.cu plus updates across attention/gemm/norm/
  activation/embedding/reduce kernels and common.cuh.
- xserv-kernels: new dispatch module and kernel-binding updates.
- xserv-cuda: cudaMallocHost/FreeHost bindings + PinnedBuffer (host swap
  pool backing) and offset-aware D2H/H2D copies used to move KV blocks
  between the GPU pool and pinned host memory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 19:58:36 +08:00

crates

kernels/cuda: paged-attention kernel, dispatch, pinned host memory

2026-05-28 19:58:36 +08:00

csrc

kernels/cuda: paged-attention kernel, dispatch, pinned host memory

2026-05-28 19:58:36 +08:00

docs

docs: llama.cpp vs xserv benchmark results + summary

2026-05-28 15:06:21 +08:00

third_party

tools: add llama.cpp comparison baseline + standard benchmark suite