xtrain/crates/xtrain-distributed
Gahow Wang abe5ceb913 test: grad-accum equivalence + accum=1 bit-identity + DDP+accum
- grad_accum.rs: gradients accumulated over N micro-batches of size B are
  bit-close to those of a single big batch of size N·B; accum_steps=1 is
  bit-identical (max|Δ|==0) to running without accumulation; the real
  train() loop with accumulation tracks a big-batch baseline over 20
  AdamW steps.
- ddp_correctness.rs: world=2 with accum=2 matches a single-GPU big batch
  of the same effective size (checked on loss, cross-rank, and
  vs-baseline).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 23:45:40 +08:00
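
The equivalence these tests check can be illustrated with a small standalone sketch. It is not code from xtrain: the linear model, the MSE loss, and every name below (`mse_grad`, `accumulated_grad`, `toy_data`) are hypothetical, and it assumes a mean-reduced loss with each micro-batch gradient scaled by 1/accum_steps. Under those assumptions, accumulating over N equal micro-batches should agree with one big batch up to floating-point rounding, and accum_steps=1 should agree exactly.

```rust
// Hypothetical toy setup: linear model with mean-squared-error loss.
// None of these names come from xtrain.

/// Gradient of mean((w·x - y)^2) with respect to `w` over one batch.
fn mse_grad(w: &[f64], xs: &[Vec<f64>], ys: &[f64]) -> Vec<f64> {
    let mut grad = vec![0.0; w.len()];
    for (x, &y) in xs.iter().zip(ys) {
        let pred: f64 = w.iter().zip(x).map(|(wi, xi)| wi * xi).sum();
        let err = pred - y;
        for (g, xi) in grad.iter_mut().zip(x) {
            *g += 2.0 * err * xi;
        }
    }
    // Mean reduction over the batch.
    let n = xs.len() as f64;
    grad.iter_mut().for_each(|g| *g /= n);
    grad
}

/// Sum per-micro-batch gradients, each scaled by 1/accum_steps
/// (assumed scaling convention; micro-batches must be equal-sized).
fn accumulated_grad(w: &[f64], xs: &[Vec<f64>], ys: &[f64], accum_steps: usize) -> Vec<f64> {
    assert!(!xs.is_empty() && accum_steps > 0 && xs.len() % accum_steps == 0);
    let micro = xs.len() / accum_steps;
    let scale = 1.0 / accum_steps as f64;
    let mut acc = vec![0.0; w.len()];
    for (cx, cy) in xs.chunks(micro).zip(ys.chunks(micro)) {
        let g = mse_grad(w, cx, cy);
        for (a, gi) in acc.iter_mut().zip(&g) {
            *a += gi * scale;
        }
    }
    acc
}

/// Largest element-wise absolute difference between two gradients.
fn max_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f64::max)
}

/// Deterministic toy data: 8 samples, 3 features.
fn toy_data() -> (Vec<f64>, Vec<Vec<f64>>, Vec<f64>) {
    let w = vec![0.5, -1.25, 2.0];
    let xs: Vec<Vec<f64>> = (0..8)
        .map(|i| (0..3).map(|j| ((i * 3 + j) as f64 * 0.37).sin()).collect())
        .collect();
    let ys: Vec<f64> = (0..8).map(|i| (i as f64 * 0.11).cos()).collect();
    (w, xs, ys)
}

fn main() {
    let (w, xs, ys) = toy_data();
    let big = mse_grad(&w, &xs, &ys);

    // 4 micro-batches of 2 samples vs. one batch of 8: close, not necessarily identical.
    let diff_n = max_abs_diff(&big, &accumulated_grad(&w, &xs, &ys, 4));
    // accum_steps = 1 goes through the same arithmetic, so the difference is exactly zero.
    let diff_1 = max_abs_diff(&big, &accumulated_grad(&w, &xs, &ys, 1));

    println!("accum=4 max|Δ| = {diff_n:e}");
    println!("accum=1 max|Δ| = {diff_1:e}");
    assert!(diff_n < 1e-12);
    assert_eq!(diff_1, 0.0);
}
```

The tolerance of 1e-12 is an arbitrary choice for this f64 toy example. A real test on lower-precision tensors would likely need a looser bound for the accum=N case, while the accum=1 case can still be compared for exact equality.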
..
2026-06-15 17:14:56 +08:00