xtrain

Files

Gahow Wang 2f8118fda9 test: tighten AdamW parity (f32 reference, 10 steps, allclose tol)

The loss trajectory already matched torch.optim.AdamW (worst relerr ~2e-4),
but the float64 torch reference diverged per-weight from the f32 GPU training
after the model memorised the batch (flat region: weights underdetermined,
loss identical). Fixes: run the torch reference in float32 (match engine
precision), shorten to 10 steps (weights still well-determined), and compare
final params with an allclose-style rtol+atol metric (a pure relative metric is
misleading on near-zero weights).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 16:34:18 +08:00
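
To illustrate why the commit message calls a pure relative metric misleading on near-zero weights, here is a minimal pure-Python sketch that compares it with an allclose-style rtol+atol check. The function names, the sample values, and the tolerances (rtol=1e-4, atol=1e-6) are assumptions for illustration only; they are not taken from the xtrain test.

```python
def worst_rel_err(got, ref):
    """Worst pure relative error: max |got - ref| / |ref|."""
    return max(abs(g - r) / abs(r) for g, r in zip(got, ref))


def worst_allclose_ratio(got, ref, rtol=1e-4, atol=1e-6):
    """Worst ratio of |got - ref| to the allowed band atol + rtol * |ref|.

    A value <= 1.0 means every element passes the allclose-style check.
    The tolerances are illustrative assumptions, not the values used in xtrain.
    """
    return max(abs(g - r) / (atol + rtol * abs(r)) for g, r in zip(got, ref))


# Hypothetical final parameters: two well-scaled weights and one near-zero weight.
ref = [0.5, -1.25, 1e-9]
got = [0.50001, -1.25002, 3e-9]

# The near-zero weight differs by only 2e-9 in absolute terms, yet it dominates
# the pure relative metric (about 200% error).
print("worst relative error:", worst_rel_err(got, ref))

# With an absolute floor, the same difference is negligible and the check passes.
print("worst allclose ratio:", worst_allclose_ratio(got, ref))
```

The atol term acts as a floor on the allowed error, so weights that are effectively zero no longer decide the outcome of the comparison.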

Name             Last commit                                                           Updated
xtrain-autodiff  ops: grad-check the T5 structural ops                                 2026-06-15 16:05:20 +08:00
xtrain-cuda      ops: embedding/reshape/transpose/split-merge-heads fwd+bwd            2026-06-15 16:05:09 +08:00
xtrain-model     model: silence torch parity warning (read loss before backward)       2026-06-15 16:09:30 +08:00
xtrain-optim     optim: hand-written AdamW (decoupled weight decay + bias correction)  2026-06-15 16:28:23 +08:00
xtrain-tensor    ops: embedding/reshape/transpose/split-merge-heads fwd+bwd            2026-06-15 16:05:09 +08:00
xtrain-train     test: tighten AdamW parity (f32 reference, 10 steps, allclose tol)    2026-06-15 16:34:18 +08:00
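
The xtrain-optim entry mentions a hand-written AdamW with decoupled weight decay and bias correction. As a rough sketch of what such an update looks like for a single scalar parameter, the following follows the update rule used by torch.optim.AdamW. The function name `adamw_step` and its signature are hypothetical, and the hyperparameter defaults are torch's defaults, not necessarily the ones xtrain uses; the actual xtrain implementation is not shown in this listing.

```python
import math


def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter (illustrative sketch).

    p: parameter, g: gradient, m/v: first/second moment estimates,
    t: 1-based step count. Returns the updated (p, m, v).
    """
    # Decoupled weight decay: shrink the parameter directly instead of
    # adding weight_decay * p to the gradient.
    p = p * (1.0 - lr * weight_decay)

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g

    # Bias correction compensates for m and v being initialised at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Adam step using the bias-corrected moments.
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


# Usage: ten steps on a hypothetical quadratic loss 0.5 * p**2 (gradient = p).
p, m, v = 1.0, 0.0, 0.0
for t in range(1, 11):
    p, m, v = adamw_step(p, g=p, m=m, v=v, t=t)
print(p)
```

Because the decay is applied to the parameter rather than folded into the gradient, it does not pass through the moment estimates, which is the difference from Adam with L2 regularisation.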