xtrain

Files

Gahow Wang d422c68704 docs: KI-5 — correct cross-rank divergence attribution (pre-existing flaky)

The ~1-ULP cross-rank param divergence is NOT caused by coalescing: the
original ungrouped all-reduce is itself run-to-run nondeterministic on
this box (6 reruns: cross-rank diff {0, 0, 5.96e-8, 5.96e-8, 1.19e-7,
1.19e-7}), so the T8 test's `max|p0-p1| == 0.0` assertion is flaky here
(passes ~1/3 of runs) independent of T11. Diffs are ≤1.19e-7 (a few ULP,
numerically benign; loss-match stays ~6e-7). Noted as a follow-up to
loosen the assertion to a tight tolerance; coalescing was reverted purely
because it gives ~0 scaling benefit.
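
A minimal sketch of the follow-up, assuming the T8 check compares two flat lists of float32 parameters from rank 0 and rank 1. The names `assert_ranks_close`, `ulp_distance` and the tolerance `ATOL = 2e-7` are hypothetical, not taken from the repository; the tolerance is chosen only so that it sits just above the largest diff observed above (1.19e-7).

```python
import struct

# Hypothetical tolerance: just above the largest observed cross-rank diff (1.19e-7).
ATOL = 2e-7


def _ordered_bits(x: float) -> int:
    """Round x to float32 and map its bit pattern to an int that sorts like the float."""
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats use sign-magnitude encoding; flip them onto the negative integers.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 values between a and b."""
    return abs(_ordered_bits(a) - _ordered_bits(b))


def assert_ranks_close(p0, p1, atol=ATOL):
    """Replace the exact `max|p0-p1| == 0.0` check with a tight absolute tolerance."""
    diff = max(abs(a - b) for a, b in zip(p0, p1))
    assert diff <= atol, f"cross-rank divergence {diff:.3e} exceeds {atol:.1e}"
    return diff


# Usage: a one-ULP difference near 1.0 passes, a real divergence would not.
rank0 = [1.0, 0.5, -0.25]
rank1 = [1.0 + 2**-23, 0.5, -0.25]
print(assert_ranks_close(rank0, rank1), ulp_distance(rank0[0], rank1[0]))
```

An absolute tolerance is the simplest option; a ULP-based bound such as `ulp_distance(a, b) <= 2` scales with parameter magnitude and may be the better fit if parameters span several orders of magnitude.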

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-16 09:42:13 +08:00

runs

docs: run v3 — TinyStories, dim512, val 1.30

2026-06-16 03:37:45 +08:00

00-build-chain.md

docs: backfill T1 build-chain

2026-06-15 15:12:55 +08:00

01-tensor.md

docs: Phase T2 — tensor abstraction

2026-06-15 15:12:55 +08:00

02-gemm-autodiff.md

docs: Phase T3 — GEMM fwd/bwd + finite-diff