optim: hand-written AdamW (decoupled weight decay + bias correction)

New xtrain-optim crate. AdamW with per-param m/v moments keyed by params() index, global bias correction, and decoupled weight decay (matches torch.optim.AdamW).

Split into a pure-host step_host (flat f32 buffers, unit-testable on a GPU-less host) and a step(&[Var]) wrapper that round-trips each param value/grad through the GPU tensor (gated not(no_cuda)). Per-step lr argument leaves room for an LR schedule.

Host unit test checks the update against an independent reference recurrence over 20 steps and the pure-decay (g=0) boundary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
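
For readers without the crate source at hand, here is a minimal sketch of the kind of host-side update the message describes. It is not the code from this commit: the names `AdamWConfig` and `adamw_step_host`, the argument order, and the 1-based step counter `t` are assumptions for illustration. The recurrence follows the decoupled-decay form documented for torch.optim.AdamW (decay applied directly to the parameter, then the bias-corrected Adam update).

```rust
/// Hypothetical hyperparameter bundle; field names are illustrative,
/// not taken from the xtrain-optim crate.
#[derive(Clone, Copy)]
struct AdamWConfig {
    beta1: f32,
    beta2: f32,
    eps: f32,
    weight_decay: f32,
}

/// One AdamW step over flat host buffers for a single parameter.
/// `t` is the global step count, starting at 1 (assumed convention).
/// `lr` is passed per step so a caller can drive it from a schedule.
fn adamw_step_host(
    p: &mut [f32],
    g: &[f32],
    m: &mut [f32],
    v: &mut [f32],
    t: u32,
    lr: f32,
    cfg: AdamWConfig,
) {
    assert!(t >= 1, "step count is 1-based");
    assert_eq!(p.len(), g.len());
    assert_eq!(p.len(), m.len());
    assert_eq!(p.len(), v.len());

    // Global bias-correction terms: shared by every element at step t.
    let bc1 = 1.0 - cfg.beta1.powi(t as i32);
    let bc2 = 1.0 - cfg.beta2.powi(t as i32);

    for i in 0..p.len() {
        // Decoupled weight decay: shrink the parameter directly,
        // independent of the gradient and of the moment estimates.
        p[i] *= 1.0 - lr * cfg.weight_decay;

        // Exponential moving averages of the gradient and its square.
        m[i] = cfg.beta1 * m[i] + (1.0 - cfg.beta1) * g[i];
        v[i] = cfg.beta2 * v[i] + (1.0 - cfg.beta2) * g[i] * g[i];

        // Bias-corrected estimates, then the Adam update.
        let m_hat = m[i] / bc1;
        let v_hat = v[i] / bc2;
        p[i] -= lr * m_hat / (v_hat.sqrt() + cfg.eps);
    }
}
```

With a zero gradient the moments stay at zero, so each step reduces to `p *= 1 - lr * weight_decay`. This is the pure-decay boundary the host test is described as checking.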
2026-06-15 16:28:23 +08:00
parent 8565565647
commit f22429f5b8
6 changed files with 301 additions and 0 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -112,6 +112,15 @@ dependencies = [
 "xtrain-tensor",
 ]

+[[package]]
+name = "xtrain-optim"
+version = "0.1.0"
+dependencies = [
+ "xtrain-autodiff",
+ "xtrain-cuda",
+ "xtrain-tensor",
+]
+
 [[package]]
 name = "xtrain-tensor"
 version = "0.1.0"