Gahow Wang 6035ffdc0b phase 5: naive multi-head attention
- Batched GEMM via cublasGemmStridedBatchedEx
- Causal mask CUDA kernel (F32 + BF16)
- Element-wise scale CUDA kernel (F32 + BF16)
- attention() composing: batched_matmul + scale + causal_mask + softmax
- Fixed to_device/contiguous infinite recursion (GPU contiguous via CPU round-trip)
- 5 attention tests passing (max_err < 3e-7 F32)
- Total: 61 tests passing across all crates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 21:17:23 +08:00
2026-05-21 21:17:23 +08:00
2026-05-21 21:17:23 +08:00
Description
No description provided
2 MiB
Languages
Rust 67.5%
Python 15.1%
Cuda 13.5%
Shell 3.9%