xtrain/Cargo.toml at 9e958cb0f974b16b8adaee1d8273562b7abd4961

gahow/xtrain

Gahow Wang e27df50ca9 dist: nccl ffi + comm bootstrap

New crate xtrain-distributed (mirrors xserv-distributed): hand-written NCCL
FFI (GetUniqueId / CommInitRank / AllReduce / CommDestroy / Group{Start,End},
with ncclUniqueId passed by value per the NCCL ABI) and a safe DdpContext
wrapper. Rank 0 mints the UniqueId, every rank initializes its communicator
under a group, and all_reduce_average_grads runs an in-place AllReduce(sum) on
each parameter's .grad() device buffer, then scales it by 1/world (reusing T7's
scale_inplace kernel). AllReduce runs on the null stream, so it is ordered with
the model's kernels and needs no extra barrier.
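
The gradient averaging described above can be illustrated on the host, without NCCL or a GPU. This is only a sketch of the arithmetic (sum across ranks, then scale by 1/world). The function name `all_reduce_average` and the `Vec<f32>`-per-rank layout are assumptions made for illustration; they are not the crate's DdpContext API, which operates on device buffers.

```rust
/// Host-side stand-in for an in-place AllReduce(sum) followed by a 1/world scale.
/// `grads[r]` is a hypothetical view of rank r's gradient buffer for one parameter.
fn all_reduce_average(grads: &mut [Vec<f32>]) {
    let world = grads.len();
    if world == 0 {
        return;
    }
    let len = grads[0].len();
    assert!(
        grads.iter().all(|g| g.len() == len),
        "every rank must hold an equally sized buffer"
    );

    // AllReduce(sum): elementwise sum across all ranks.
    let mut sum = vec![0.0f32; len];
    for g in grads.iter() {
        for (s, x) in sum.iter_mut().zip(g) {
            *s += *x;
        }
    }

    // Scale by 1/world so the result is the mean gradient.
    let inv_world = 1.0 / world as f32;
    for s in sum.iter_mut() {
        *s *= inv_world;
    }

    // After the collective, every rank holds the same averaged buffer.
    for g in grads.iter_mut() {
        g.copy_from_slice(&sum);
    }
}
```

With two ranks holding [1.0, 2.0] and [3.0, 6.0], both buffers end up as [2.0, 4.0].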

build.rs follows the per-crate convention: without nvcc it sets the no_cuda cfg
(the crate compiles to empty, so cargo check passes host-side); with nvcc it
links -lnccl -lcudart, like xserv-distributed's build.rs.
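
The build.rs convention described above might look roughly like the sketch below. The real script is not shown here, so the details are assumptions: probing with `nvcc --version` and the helper names `nvcc_available`, `directives`, and `emit_directives` are all hypothetical. The `cargo::` directives themselves are standard Cargo build-script output.

```rust
use std::process::Command;

/// Returns true if `nvcc --version` can be spawned and exits successfully.
/// (Assumed detection method; the real build.rs may probe differently.)
fn nvcc_available() -> bool {
    Command::new("nvcc")
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Builds the Cargo directives for a given toolchain state.
fn directives(has_nvcc: bool) -> Vec<String> {
    // Declare the custom cfg so rustc's check-cfg lint accepts `cfg(no_cuda)`.
    let mut out = vec!["cargo::rustc-check-cfg=cfg(no_cuda)".to_string()];
    if has_nvcc {
        // CUDA toolchain present: link NCCL and the CUDA runtime.
        out.push("cargo::rustc-link-lib=nccl".to_string());
        out.push("cargo::rustc-link-lib=cudart".to_string());
    } else {
        // No CUDA toolchain: set `no_cuda` so the crate can compile to empty.
        out.push("cargo::rustc-cfg=no_cuda".to_string());
    }
    out
}

/// Prints the directives to stdout, where Cargo reads them.
fn emit_directives() {
    for d in directives(nvcc_available()) {
        println!("{d}");
    }
}
```

In a real build script, `fn main()` would simply call `emit_directives()`, and the crate root would gate its contents behind `#[cfg(not(no_cuda))]`.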

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 17:14:56 +08:00

21 lines

363 B

TOML

[workspace]
resolver = "2"
members = [
    "crates/xtrain-cuda",
    "crates/xtrain-tensor",
    "crates/xtrain-autodiff",
    "crates/xtrain-model",
    "crates/xtrain-optim",
    "crates/xtrain-train",
    "crates/xtrain-distributed",
]

[workspace.package]
version = "0.1.0"
edition = "2024"
license = "MIT"

[workspace.dependencies]
half = "2"
smallvec = "1"