xserv/Cargo.toml at 6309dc1181d059e4d46eaef0db1c71587a2526c4 - xserv - Local Gitea

Gahow Wang 453520d622 distributed: NCCL tensor-parallel primitives (TpContext + AllReduce)

New xserv-distributed crate: hand-written NCCL FFI, TpContext (one rank per
thread, bound to one GPU), and in-place BF16 AllReduce on the null stream so
it orders naturally with the model's kernels. 2-GPU AllReduce test included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-29 11:10:14 +08:00

9 lines

165 B

TOML

[package]
name = "xserv-distributed"
version.workspace = true
edition.workspace = true

[dependencies]
xserv-cuda = { path = "../xserv-cuda" }
half.workspace = true
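
The commit message describes one rank per thread and an in-place AllReduce. As a rough illustration of those semantics only, here is a CPU-only sketch using std threads. It does not use NCCL, CUDA, or BF16, and the names `all_reduce_sum_in_place` and `run_ranks` are hypothetical, not taken from the xserv-distributed crate. It assumes a sum reduction and uses f32 in place of BF16 so that it needs no external crates.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// CPU-only stand-in for a sum AllReduce (hypothetical helper, not the
/// crate's API): every rank ends up with the element-wise sum of all
/// ranks' buffers, written back into its own buffer.
fn all_reduce_sum_in_place(buf: &mut [f32], shared: &Mutex<Vec<f32>>, barrier: &Barrier) {
    {
        // Phase 1: add this rank's contribution to the shared accumulator.
        let mut acc = shared.lock().unwrap();
        for (a, x) in acc.iter_mut().zip(buf.iter()) {
            *a += *x;
        }
    } // lock released here so other ranks can contribute

    // Phase 2: wait until every rank has contributed.
    barrier.wait();

    // Phase 3: overwrite the local buffer with the reduced result.
    let acc = shared.lock().unwrap();
    buf.copy_from_slice(acc.as_slice());
}

/// Spawns one thread per rank (mirroring "one rank per thread") and
/// returns each rank's buffer after the reduction, indexed by rank.
fn run_ranks(world_size: usize, len: usize) -> Vec<Vec<f32>> {
    let shared = Arc::new(Mutex::new(vec![0.0f32; len]));
    let barrier = Arc::new(Barrier::new(world_size));

    let handles: Vec<_> = (0..world_size)
        .map(|rank| {
            let shared = Arc::clone(&shared);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Each rank starts with a buffer filled with (rank + 1).
                let mut buf = vec![(rank + 1) as f32; len];
                all_reduce_sum_in_place(&mut buf, &shared, &barrier);
                buf
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Two ranks, matching the 2-GPU test mentioned in the commit message.
    let results = run_ranks(2, 4);
    for (rank, buf) in results.iter().enumerate() {
        println!("rank {rank}: {buf:?}");
    }
}
```

In the real crate the reduction runs on the GPU through NCCL, and ordering comes from the CUDA stream rather than from a barrier. The barrier here only plays the role of "all ranks have contributed" for this CPU sketch.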