model: tiny RoPE+RMSNorm+SwiGLU transformer + overfit test

New crate xtrain-model: a from-scratch decoder built entirely from the
autodiff op set.
- Config (tiny: dim=32, 2 layers, 2 heads, head_dim=16, ffn=64).
- TinyTransformer: embedding -> N x {pre-RMSNorm -> multi-head causal
  attention (RoPE, additive causal mask, per-head SDPA) -> residual;
  pre-RMSNorm -> SwiGLU MLP -> residual} -> final RMSNorm -> LM head.
  x@W weight convention (engine GEMM is plain A@B); dim=n_heads*head_dim.
- params()/zero_grad-able leaves for the optimizer; param_to_host export.
- overfit test: char-level bring-up (embedded text -> vocab -> shifted
  targets), minimal hand-written GD (p -= lr*grad) memorises one fixed
  batch -> loss ~0 + greedy argmax matches targets. End-to-end fwd+bwd
  correctness signal. Gated #![cfg(not(no_cuda))].
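
For reference, the pieces named above can be sketched on the CPU in plain Rust. This is only an illustration under stated assumptions, not the xtrain-model code: every function name here is hypothetical, the RoPE variant rotates adjacent pairs (2i, 2i+1) with a caller-supplied base, and SwiGLU is shown only as the gating step on already-projected activations.

```rust
// Hypothetical CPU reference helpers; none of these names come from xtrain-model.

/// RMSNorm over one row: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(&v, &w)| v * inv * w).collect()
}

/// SwiGLU gating: silu(gate) * up, where silu(g) = g * sigmoid(g).
fn swiglu(gate: &[f32], up: &[f32]) -> Vec<f32> {
    gate.iter()
        .zip(up)
        .map(|(&g, &u)| g / (1.0 + (-g).exp()) * u)
        .collect()
}

/// Additive causal mask, row-major [seq, seq]: 0 where key <= query, -inf elsewhere.
fn causal_mask(seq: usize) -> Vec<f32> {
    let mut mask = vec![0.0f32; seq * seq];
    for q in 0..seq {
        for k in (q + 1)..seq {
            mask[q * seq + k] = f32::NEG_INFINITY;
        }
    }
    mask
}

/// RoPE on one head vector at position `pos` (assumed adjacent-pair layout).
fn rope(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    for i in 0..d / 2 {
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (s, c) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * c - b * s;
        x[2 * i + 1] = a * s + b * c;
    }
}

/// Next-token setup: inputs are tokens[..n-1], targets are tokens[1..].
fn shifted_targets(tokens: &[usize]) -> (Vec<usize>, Vec<usize>) {
    let n = tokens.len();
    (tokens[..n - 1].to_vec(), tokens[1..].to_vec())
}

/// Plain gradient descent, p -= lr * grad, applied elementwise.
fn sgd_step(params: &mut [f32], grads: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grads) {
        *p -= lr * g;
    }
}
```

A CPU reference like this can serve as an oracle when comparing individual ops against the engine's output; the overfit test itself relies on the loss and the greedy argmax as its end-to-end signal.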

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Gahow Wang

2026-06-15 16:05:20 +08:00

parent 0acfa5df11

commit e3912c2380

8 changed files with 466 additions and 0 deletions

Cargo.lock generated

@@ -103,6 +103,15 @@ dependencies = [
  "cc",
 ]
 
+[[package]]
+name = "xtrain-model"
+version = "0.1.0"
+dependencies = [
+ "xtrain-autodiff",
+ "xtrain-cuda",
+ "xtrain-tensor",
+]
+
 [[package]]
 name = "xtrain-tensor"
 version = "0.1.0"
