xtrain

Files

Gahow Wang cb64604496 post-train: M1 fix — enlarge arith key space + saturation guard

The default operand ranges (max_add=99, max_mul=12) gave only ~20k unique
problems, so 'gen_arith_task --n 20000 --eval 500' (a) made train dedup
pathologically slow near saturation and (b) made the disjoint-eval loop never
terminate. A background run stalled after ~10k train rows with no eval files.

Fix (root cause, not a workaround):
- enlarge default ranges to max_add=999, max_mul=99 (~2.01M key space) so 20k+
  requests are a tiny fraction and dedup stays trivial;
- add unique_space() + a generator guard that errors clearly when n+eval exceeds
  80% of the key space, instead of looping forever.

Verified: cargo test 10/10; full 20000/500 gen now 0.2s, all 3 files, 0
train/eval leakage; guard panics on an oversized (--max-add 99) request.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-29 23:28:25 +08:00

xtrain-autodiff

sft: assistant-only SFT (ignore-index CE) + chat-prompt greedy eval

2026-06-29 16:19:02 +08:00

xtrain-cuda

gqa: real grouped-query attention (repeat_kv op + both SDPA paths + wiring + tests)

2026-06-18 01:37:37 +08:00

xtrain-distributed

sft: assistant-only SFT (ignore-index CE) + chat-prompt greedy eval

2026-06-29 16:19:02 +08:00

xtrain-model

sft: assistant-only SFT (ignore-index CE) + chat-prompt greedy eval