xtrain

Files

Gahow Wang e44e50ef78 data: full TinyStories + tokenized-id cache, val loss, CLI arch

- Corpus::load_cached: tokenize the (large) corpus ONCE, cache the id stream to
  <corpus>.u16.bin (gpt2 vocab 50257 < 65536 → exact u16), read cache on reruns.
- Corpus::split_tail: hold out a tail slice as a validation corpus.
- train(): take an optional valid corpus + eval_every/eval_batches; periodic
  deterministic val-loss eval that checkpoints the BEST val model; returns
  TrainResult{train_losses, evals, best_val}. T6 fixed-cadence path preserved.
- bin/train + bin/export_safetensors: read architecture (--heads/--head-dim/
  --layers/--ffn) + opt knobs (--steps/--batch/--seq/--max-lr/--val-tokens/
  --eval-every) from CLI flags; defaults reproduce the v0-baseline tiny config.
- gitignore the multi-GB corpus + *.u16.bin caches + *.ckpt (dash5-only).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-15 18:34:48 +08:00

xtrain-autodiff

ops: grad-check the T5 structural ops

2026-06-15 16:05:20 +08:00

xtrain-cuda

perf: streams / drop per-op sync

2026-06-15 16:56:17 +08:00

xtrain-distributed

dist: lengthen scaling bench so NCCL init amortizes

2026-06-15 17:18:23 +08:00

xtrain-model

train: parameterize model size (scaling ladder)