gahow/xtrain
9e958cb0f974b16b8adaee1d8273562b7abd4961
xtrain/csrc/ops
Gahow Wang d217f4fbd3 perf: spread flash bwd dK/dV atomics across all threads
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 23:27:33 +08:00
attention.cu
autograd: batch dim for ops (flatten linears, batched attention)
2026-06-16 00:44:15 +08:00
cast.cu
cuda: bf16 cuBLAS GemmEx (16BF in/out, fp32 accum) + cast kernels
2026-06-16 14:14:39 +08:00
elementwise.cu
tensor: add scale elementwise CUDA kernel + FFI
2026-06-15 15:13:06 +08:00
flash_attention.cu
perf: spread flash bwd dK/dV atomics across all threads
2026-06-17 23:27:33 +08:00
gemm.cu
gemm: tiled F32 forward + transpose + backward (dA/dB)
2026-06-15 15:26:51 +08:00
model.cu
autograd: batch dim for ops (flatten linears, batched attention)
2026-06-16 00:44:15 +08:00
nn.cu
autograd: batch dim for ops (flatten linears, batched attention)
2026-06-16 00:44:15 +08:00
optim.cu
perf: GPU AdamW + grad-norm
2026-06-15 16:53:09 +08:00