Files
xtrain/crates
Gahow Wang 2f8118fda9 test: tighten AdamW parity (f32 reference, 10 steps, allclose tol)
The loss trajectory already matched torch.optim.AdamW (worst relerr ~2e-4),
but the float64 torch reference diverged per-weight from the f32 GPU training
after the model memorised the batch (flat region: weights underdetermined,
loss identical). Fixes: run the torch reference in float32 (match engine
precision), shorten to 10 steps (weights still well-determined), and compare
final params with an allclose-style rtol+atol metric (a pure relative metric is
misleading on near-zero weights).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 16:34:18 +08:00
..