xserv

Gahow Wang 6035ffdc0b phase 5: naive multi-head attention

- Batched GEMM via cublasGemmStridedBatchedEx
- Causal mask CUDA kernel (F32 + BF16)
- Element-wise scale CUDA kernel (F32 + BF16)
- attention() composing: batched_matmul + scale + causal_mask + softmax
- Fixed to_device/contiguous infinite recursion (GPU contiguous via CPU round-trip)
- 5 attention tests passing (max_err < 3e-7 F32)
- Total: 61 tests passing across all crates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-21 21:17:23 +08:00
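
The composition named in the commit message (batched_matmul + scale + causal_mask + softmax) can be sketched as a CPU reference for a single head. This is not the repository's code: the function name `naive_causal_attention`, the row-major `[seq_len, head_dim]` layout, the `1/sqrt(head_dim)` scale factor, and the final multiplication by V are all assumptions made for illustration. A reference of this kind is one plausible way to obtain the expected values that a GPU result is compared against in a max-error test.

```rust
/// Hypothetical CPU reference for single-head causal attention.
/// `q`, `k`, `v` are row-major `[seq_len, head_dim]` slices (assumed layout).
/// Returns a row-major `[seq_len, head_dim]` output.
pub fn naive_causal_attention(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    seq_len: usize,
    head_dim: usize,
) -> Vec<f32> {
    assert!(seq_len > 0 && head_dim > 0);
    assert_eq!(q.len(), seq_len * head_dim);
    assert_eq!(k.len(), seq_len * head_dim);
    assert_eq!(v.len(), seq_len * head_dim);

    // Assumed scale factor: the conventional 1/sqrt(head_dim).
    let scale = 1.0 / (head_dim as f32).sqrt();

    // Matmul + scale: scores[i][j] = (q_i . k_j) * scale, shape [seq_len, seq_len].
    let mut scores = vec![0.0f32; seq_len * seq_len];
    for i in 0..seq_len {
        for j in 0..seq_len {
            let mut dot = 0.0f32;
            for d in 0..head_dim {
                dot += q[i * head_dim + d] * k[j * head_dim + d];
            }
            scores[i * seq_len + j] = dot * scale;
        }
    }

    // Causal mask: position i must not attend to any later position j > i.
    for i in 0..seq_len {
        for j in (i + 1)..seq_len {
            scores[i * seq_len + j] = f32::NEG_INFINITY;
        }
    }

    // Row-wise softmax, subtracting the row maximum for numerical stability.
    // The diagonal is never masked, so every row has a finite maximum.
    for row in scores.chunks_mut(seq_len) {
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0f32;
        for x in row.iter_mut() {
            *x = (*x - max).exp();
            sum += *x;
        }
        for x in row.iter_mut() {
            *x /= sum;
        }
    }

    // Second matmul: out_i = sum over j <= i of p[i][j] * v_j.
    let mut out = vec![0.0f32; seq_len * head_dim];
    for i in 0..seq_len {
        for j in 0..=i {
            let p = scores[i * seq_len + j];
            for d in 0..head_dim {
                out[i * head_dim + d] += p * v[j * head_dim + d];
            }
        }
    }
    out
}
```

With all-zero queries every unmasked score is equal, so row i of the output is the plain average of the first i+1 rows of V, which makes a convenient hand-checkable case. A multi-head, batched version would repeat the same computation independently for each (batch, head) pair, which is the access pattern a strided batched GEMM is designed for.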

crates    phase 5: naive multi-head attention    2026-05-21 21:17:23 +08:00
csrc      phase 5: naive multi-head attention    2026-05-21 21:17:23 +08:00
docs      phase 5: naive multi-head attention    2026-05-21 21:17:23 +08:00
tools     phase 0+1: project scaffold + xserv-cuda crate