Files
xserv/crates/xserv-distributed/Cargo.toml
Gahow Wang 453520d622 distributed: NCCL tensor-parallel primitives (TpContext + AllReduce)
New xserv-distributed crate: hand-written NCCL FFI, TpContext (one rank per
thread, bound to one GPU), and in-place BF16 AllReduce on the null stream so
it orders naturally with the model's kernels. 2-GPU AllReduce test included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 11:10:14 +08:00

9 lines
165 B
TOML

[package]
name = "xserv-distributed"
version.workspace = true
edition.workspace = true
[dependencies]
xserv-cuda = { path = "../xserv-cuda" }
half.workspace = true