Commit Graph

2 Commits

SHA1 Message Date
531cd3fe08 style: format Rust workspace 2026-06-18 18:11:58 +08:00
453520d622 distributed: NCCL tensor-parallel primitives (TpContext + AllReduce) 2026-05-29 11:10:14 +08:00

New xserv-distributed crate: hand-written NCCL FFI, TpContext (one rank per
thread, bound to one GPU), and in-place BF16 AllReduce on the null stream so
it orders naturally with the model's kernels. 2-GPU AllReduce test included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
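The commit message states what the AllReduce computes but includes no code. The sketch below simulates those semantics on the CPU, under stated assumptions. It runs one rank per thread, and each rank's BF16 buffer is overwritten in place with the elementwise sum across all ranks. It does not call NCCL or CUDA, and it is not the xserv-distributed API. `SimComm`, `all_reduce_sum_inplace` and `run_all_reduce` are hypothetical names. BF16 is emulated as the upper 16 bits of an `f32`, and accumulating in `f32` is an assumption of this sketch, not a statement about how NCCL reduces.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Round an f32 to the nearest bf16 (ties to even); bf16 is the top 16 bits of an f32.
/// NaN inputs are not handled specially in this sketch.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    let bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(bias) >> 16) as u16
}

/// Widen bf16 bits back to an f32 (always exact).
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Hypothetical CPU stand-in for a communicator shared by all ranks.
/// One-shot: the accumulator is never reset, so each SimComm serves one collective.
struct SimComm {
    acc: Mutex<Vec<f32>>, // f32 accumulator (an assumption of this sketch)
    barrier: Barrier,
}

impl SimComm {
    fn new(world_size: usize, len: usize) -> Self {
        SimComm {
            acc: Mutex::new(vec![0.0; len]),
            barrier: Barrier::new(world_size),
        }
    }

    /// In-place sum AllReduce: on return, `buf` holds the elementwise sum over all ranks.
    fn all_reduce_sum_inplace(&self, buf: &mut [u16]) {
        {
            // Phase 1: add this rank's contribution to the shared accumulator.
            let mut acc = self.acc.lock().unwrap();
            for (a, &x) in acc.iter_mut().zip(buf.iter()) {
                *a += bf16_to_f32(x);
            }
        }
        // Wait until every rank has contributed before reading the result.
        self.barrier.wait();
        // Phase 2: overwrite this rank's buffer (in place) with the rounded sum.
        let acc = self.acc.lock().unwrap();
        for (x, &a) in buf.iter_mut().zip(acc.iter()) {
            *x = f32_to_bf16(a);
        }
    }
}

/// Spawn one thread per rank; rank r starts with every element equal to r + 1.
fn run_all_reduce(world_size: usize, len: usize) -> Vec<Vec<f32>> {
    let comm = Arc::new(SimComm::new(world_size, len));
    let handles: Vec<_> = (0..world_size)
        .map(|rank| {
            let comm = Arc::clone(&comm);
            thread::spawn(move || {
                let mut buf = vec![f32_to_bf16((rank + 1) as f32); len];
                comm.all_reduce_sum_inplace(&mut buf);
                buf.into_iter().map(bf16_to_f32).collect::<Vec<f32>>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Two ranks, mirroring the 2-GPU test: 1.0 + 2.0 = 3.0 on every element of both ranks.
    let out = run_all_reduce(2, 4);
    println!("{:?}", out); // prints [[3.0, 3.0, 3.0, 3.0], [3.0, 3.0, 3.0, 3.0]]
}
```

The barrier provides the ordering here. On a GPU, the commit instead relies on stream ordering: launching the collective on the same (null) stream as the model's kernels places it after the kernels that produce its input and before the kernels that consume its output.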