xtrain

154 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
Gahow Wang	63dc05fd10	tensor: add scale elementwise CUDA kernel + FFI New csrc/ops/elementwise.cu (out[i]=in[i]*alpha), compiled by xtrain-cuda/build.rs and exposed via launch_scale_f32 FFI, gated behind not(no_cuda) like the existing vecadd smoke test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:13:06 +08:00
Gahow Wang	8557a289a2	docs: Phase T2 — tensor abstraction Design doc for the minimal tensor layer: DType/shape/Storage/Tensor, host↔device copy, and one elementwise kernel (scale) wired end-to-end. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:12:55 +08:00
Gahow Wang	c1b204296b	docs: backfill T1 build-chain T1 shipped without a design doc; capture the Rust↔CUDA build chain (build.rs+nvcc, no_cuda cfg pattern, RAII GpuBuffer, gitea↔dash5 flow). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:12:55 +08:00
Gahow Wang	92acf9f413	T1: scaffold repo + Rust/CUDA build chain (vecadd smoke test) Stand up the xtrain project skeleton: a Cargo workspace mirroring xserv's csrc/ + crates/ layout, with a single xtrain-cuda crate that wraps the CUDA Runtime over hand-written extern "C" FFI. build.rs compiles csrc/test/vecadd.cu via the cc crate targeting sm_120 (RTX 5090) and links cudart. A gated integration test runs the vector-add kernel on the GPU and asserts the result. When nvcc is absent (local GPU-less machine), build.rs skips CUDA compilation and sets a `no_cuda` cfg so host-side cargo check still works. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 14:42:43 +08:00
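
The `no_cuda` fallback described in the T1 commit can be sketched roughly as below. This is a minimal illustration, not the repository's actual build.rs: the helper names `nvcc_available` and `cargo_directives` are hypothetical, and the real script also compiles the .cu sources through the cc crate, which is omitted here. Only the `no_cuda` cfg name and the cudart link come from the commit message.

```rust
use std::process::Command;

/// Hypothetical helper: true if `nvcc --version` can be spawned and exits successfully.
fn nvcc_available() -> bool {
    Command::new("nvcc")
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Hypothetical helper: the directives a build script would print to stdout
/// for Cargo, depending on whether nvcc was found.
fn cargo_directives(has_nvcc: bool) -> Vec<String> {
    if has_nvcc {
        // GPU machine: link the CUDA runtime. Compiling the kernels themselves
        // (for example via the cc crate) would happen here as well.
        vec!["cargo:rustc-link-lib=cudart".to_string()]
    } else {
        // GPU-less machine: skip CUDA and set the `no_cuda` cfg so that
        // CUDA-dependent items can be gated with #[cfg(not(no_cuda))].
        vec!["cargo:rustc-cfg=no_cuda".to_string()]
    }
}
```

In a build script, `main` would print each string returned by `cargo_directives(nvcc_available())` with `println!`, and GPU-only tests would carry `#[cfg(not(no_cuda))]` so that a host-side `cargo check` still passes without nvcc.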
