New bin export_safetensors: load an xtrain checkpoint, map every param to its
HF Qwen3 tensor name, transpose 2D projection weights [in,out]->[out,in]
(1D norms + [vocab,dim] embed/lm_head kept), cast to BF16 (xserv's qwen3
forward is BF16-only), and write config.json + model.safetensors + a copy of
the gpt2 tokenizer.json. Sized exactly like bin/train.rs. safetensors 0.5 to
match xserv. GPU body gated behind not(no_cuda).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Training loop (train_loop.rs): sample batch_size sequences, forward loss +
backward (tape SUMs grads), clip_grad_norm with ×1/batch averaging, AdamW step
with scheduled lr, zero_grad; logs loss/lr/gnorm/tok-s and checkpoints
periodically; returns the loss trace.
Checkpoint (checkpoint.rs): flat little-endian dump of params() in order
(magic/version/count + per-param ndim/dims/f32 data); load_into validates and
overwrites a matching model's params via set_value (exact f32 round-trip).
Sampler (sample.rs): autoregressive greedy / temperature generation — re-runs
forward on the growing prefix (model is single-sequence, RoPE pos=row).
bin/train.rs: end-to-end entry — load tokenizer+corpus, train a tiny 4-layer
model for a bounded budget, checkpoint, print samples. no_cuda stub keeps it
buildable on a GPU-less host.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New xtrain-train crate scaffold. Data pipeline reuses xserv's from-scratch
GPT-2/Qwen BPE via a path-dep (../../../xserv/crates/xserv-tokenizer, resolves
on both ~/projects and dash5 /opt/wjh/projects): Corpus::load tokenizes the
corpus into one id stream and samples fixed-length (input, target) next-token
windows (LCG-seeded, reproducible). Trims a range-downloaded file to whole
stories (<|endoftext|> boundaries).
Also the host-only training math: LrSchedule (linear warmup + cosine decay)
and global L2 grad-norm + clip scale, each with a local unit test.
Corpus: data/tinystories-valid-3mb.txt — first ~3MB of TinyStories-valid
(fetched on dash5 via hf-mirror.com; HF direct unreachable). Substitution
noted: a real TinyStories subset, not the full set.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>