gahow/xtrain
9e958cb0f974b16b8adaee1d8273562b7abd4961
xtrain/csrc/ops
Gahow Wang
d217f4fbd3
perf: spread flash bwd dK/dV atomics across all threads
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 23:27:33 +08:00
File                Last updated                Last commit
attention.cu        2026-06-16 00:44:15 +08:00  autograd: batch dim for ops (flatten linears, batched attention)
cast.cu             2026-06-16 14:14:39 +08:00  cuda: bf16 cuBLAS GemmEx (16BF in/out, fp32 accum) + cast kernels
elementwise.cu      2026-06-15 15:13:06 +08:00  tensor: add scale elementwise CUDA kernel + FFI
flash_attention.cu  2026-06-17 23:27:33 +08:00  perf: spread flash bwd dK/dV atomics across all threads
gemm.cu             2026-06-15 15:26:51 +08:00  gemm: tiled F32 forward + transpose + backward (dA/dB)
model.cu            2026-06-16 00:44:15 +08:00  autograd: batch dim for ops (flatten linears, batched attention)
nn.cu               2026-06-16 00:44:15 +08:00  autograd: batch dim for ops (flatten linears, batched attention)
optim.cu            2026-06-15 16:53:09 +08:00  perf: GPU AdamW + grad-norm