xtrain/crates/xtrain-distributed
Gahow Wang 4abb17383a test: process-per-GPU DDP correctness (ddp_proc.rs)
Self-launching test. In worker mode (XTRAIN_RANK set) it trains on a synthetic
corpus and dumps loss + params. In launcher mode it runs a single-GPU baseline
and a thread-per-GPU launch, spawns 2 worker processes, then asserts:
(a) process-per-GPU loss differs from single-GPU loss by <1e-3,
(b) params differ across ranks by <1e-6 (KI-5 ULP),
(c) process-per-GPU loss differs from thread-per-GPU loss by <1e-3.
Run with --test-threads=1 (distributed harness property).
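
The self-launching pattern described above can be sketched roughly as follows. This is a minimal illustration and not the contents of ddp_proc.rs: only the XTRAIN_RANK variable comes from the commit message, while `Role`, `role_from_rank`, `spawn_workers`, and `max_abs_diff` are hypothetical names. The sketch also assumes that an unset or unparsable rank means launcher mode, and that workers are started by re-executing the current test binary.

```rust
use std::env;
use std::io;
use std::process::{Child, Command};

/// Which half of the self-launching test this process runs (hypothetical type).
#[derive(Debug, PartialEq)]
enum Role {
    Launcher,
    Worker { rank: usize },
}

/// Pick the role from the value of XTRAIN_RANK (`None` when the variable is unset).
/// Assumption: a value that does not parse as an integer falls back to launcher mode.
fn role_from_rank(rank: Option<&str>) -> Role {
    match rank.and_then(|r| r.parse::<usize>().ok()) {
        Some(rank) => Role::Worker { rank },
        None => Role::Launcher,
    }
}

/// Re-execute the current binary once per rank, with XTRAIN_RANK set for each child.
/// The caller is expected to wait on the children and read back what they dumped.
fn spawn_workers(world_size: usize) -> io::Result<Vec<Child>> {
    let exe = env::current_exe()?;
    (0..world_size)
        .map(|rank| {
            Command::new(&exe)
                .env("XTRAIN_RANK", rank.to_string())
                .spawn()
        })
        .collect()
}

/// Largest element-wise absolute difference between two parameter dumps.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "parameter vectors differ in length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}
```

In such a sketch the test entry point would call `role_from_rank(env::var("XTRAIN_RANK").ok().as_deref())`, train and dump in the worker branch, and in the launcher branch compare the dumps with checks like `max_abs_diff(&rank0, &rank1) < 1e-6`. Because every worker re-enters the same binary, running the tests serially keeps concurrent tests from competing for the same GPUs.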

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 17:48:52 +08:00