Files
xtrain/crates
Gahow Wang 5aef3742d6 ops: transformer op fwd/bwd CUDA kernels + Tensor wrappers
add/mul/add_bias(+sum_rows)/rms_norm/silu/rope/softmax/cross_entropy,
each with its analytic backward, in csrc/ops/nn.cu (inlined warp/block
reductions). FFI declarations + nn.cu in build.rs (no_cuda gated). Tensor
gains the matching thin wrappers; DType grows I32 for cross-entropy targets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:44:09 +08:00
..