xtrain/Cargo.toml at f22429f5b89a0df69a627e1c0e3928196ff682de

gahow/xtrain

Gahow Wang f22429f5b8 optim: hand-written AdamW (decoupled weight decay + bias correction)

New xtrain-optim crate. AdamW with per-param m/v moments keyed by params()
index, global bias correction, and decoupled weight decay (matches
torch.optim.AdamW). Split into a pure-host step_host (flat f32 buffers,
unit-testable on a GPU-less host) and a step(&[Var]) wrapper that round-trips
each param value/grad through the GPU tensor (gated not(no_cuda)). Per-step lr
argument leaves room for an LR schedule.

Host unit test checks the update against an independent reference recurrence
over 20 steps and the pure-decay (g=0) boundary.
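
A minimal sketch of the host-side update described above, assuming flat f32
buffers for one parameter. The names AdamWConfig, Moments and adamw_step_host
are hypothetical and are not taken from the xtrain-optim source; only the
recurrence (bias-corrected moments plus decoupled weight decay, as in
torch.optim.AdamW) follows the commit message.

```rust
/// Hypothetical hyperparameter bundle; field names are illustrative only.
#[derive(Clone, Copy)]
pub struct AdamWConfig {
    pub beta1: f32,
    pub beta2: f32,
    pub eps: f32,
    pub weight_decay: f32,
}

/// Hypothetical per-parameter optimizer state: first and second moments.
pub struct Moments {
    pub m: Vec<f32>,
    pub v: Vec<f32>,
}

/// One AdamW step on host buffers.
/// `t` is the 1-based step count shared by all parameters; `lr` is passed
/// per step so a caller can drive it from a schedule.
pub fn adamw_step_host(
    param: &mut [f32],
    grad: &[f32],
    state: &mut Moments,
    cfg: AdamWConfig,
    lr: f32,
    t: u32,
) {
    assert!(t >= 1, "step count is 1-based");
    assert_eq!(param.len(), grad.len());
    assert_eq!(param.len(), state.m.len());
    assert_eq!(param.len(), state.v.len());

    // Bias-correction denominators, identical for every element at step t.
    let bc1 = 1.0 - cfg.beta1.powi(t as i32);
    let bc2 = 1.0 - cfg.beta2.powi(t as i32);

    for i in 0..param.len() {
        let g = grad[i];

        // Exponential moving averages of the gradient and its square.
        state.m[i] = cfg.beta1 * state.m[i] + (1.0 - cfg.beta1) * g;
        state.v[i] = cfg.beta2 * state.v[i] + (1.0 - cfg.beta2) * g * g;

        let m_hat = state.m[i] / bc1;
        let v_hat = state.v[i] / bc2;

        // Decoupled weight decay: shrink the parameter directly instead of
        // adding weight_decay * param to the gradient.
        param[i] *= 1.0 - lr * cfg.weight_decay;

        // Adam update using the bias-corrected moments.
        param[i] -= lr * m_hat / (v_hat.sqrt() + cfg.eps);
    }
}
```

With g=0 both moments stay at zero, so the Adam term vanishes and each step
only scales the parameter by (1 - lr * weight_decay). That is the pure-decay
boundary mentioned above.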

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 16:28:23 +08:00

19 lines

303 B

TOML

[workspace]
resolver = "2"
members = [
    "crates/xtrain-cuda",
    "crates/xtrain-tensor",
    "crates/xtrain-autodiff",
    "crates/xtrain-model",
    "crates/xtrain-optim",
]

[workspace.package]
version = "0.1.0"
edition = "2024"
license = "MIT"

[workspace.dependencies]
half = "2"
smallvec = "1"