Files
xtrain/docs
Gahow Wang d422c68704 docs: KI-5 — correct cross-rank divergence attribution (pre-existing flaky)
The ~1-ULP cross-rank param divergence is NOT caused by coalescing: the
original ungrouped all-reduce is itself run-to-run nondeterministic on
this box (6 reruns: cross-rank diff {0, 0, 5.96e-8, 5.96e-8, 1.19e-7,
1.19e-7}), so the T8 test's `max|p0-p1| == 0.0` assertion is flaky here
(passes ~1/3 of runs) independent of T11. Diffs are ≤1.19e-7 (a few ULP,
numerically benign; loss-match stays ~6e-7). Noted as a follow-up to
loosen the assertion to a tight tolerance; coalescing was reverted purely
because it gives ~0 scaling benefit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 09:42:13 +08:00
..