xtrain

Gahow Wang 1574e21d89 post-train: M1 — verifiable-arith eval scorer + SFT format-baseline result

eval_arith: load ckpt, greedy-generate per held-out prompt, parse \boxed{}
via the shared task checker, report format(boxed) + correctness pass-rates.
Reused as the verifiable-eval harness for M3 (DPO) / M4 (GRPO).
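
A minimal sketch of the scoring step described above, assuming the generations
are already available as strings. The names extract_boxed and
score_generations are hypothetical and are not the repository's shared task
checker; comparing answers by exact string match is also an assumption.

```python
from typing import Optional, Sequence


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper: braces are matched by depth so nested groups survive.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # opening brace was never closed


def score_generations(generations: Sequence[str], targets: Sequence[object]) -> dict:
    """Report the two pass-rates: format (boxed answer present) and correctness."""
    if not generations or len(generations) != len(targets):
        raise ValueError("need equal-length, non-empty generations and targets")
    boxed = 0
    correct = 0
    for gen, tgt in zip(generations, targets):
        answer = extract_boxed(gen)
        if answer is None:
            continue  # counts against both format and correctness
        boxed += 1
        # Assumption: exact string match after stripping whitespace.
        if answer == str(tgt).strip():
            correct += 1
    n = len(generations)
    return {"format_rate": boxed / n, "correct_rate": correct / n}


if __name__ == "__main__":
    gens = ["The answer is \\boxed{4}", "\\boxed{5}", "no box 7", "\\boxed{ 12 }"]
    print(score_generations(gens, [4, 6, 7, 12]))
```

Keeping the two rates separate is what lets a format gain be reported
independently of a correctness gain.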

M1 result (100 held-out prompts, v12 1.05B base): SFT moves answer-format
adherence from 0% to 100%, while arithmetic correctness is 8%. This is the
intended split: SFT buys the format; correctness is the verifiable-reward
job of M3/M4. Logged in the docs/18 implementation log and as a Phase-3 row
in docs/evolution.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-30 11:13:19 +08:00

xtrain-autodiff

sft: assistant-only SFT (ignore-index CE) + chat-prompt greedy eval

2026-06-29 16:19:02 +08:00

xtrain-cuda

gqa: real grouped-query attention (repeat_kv op + both SDPA paths + wiring + tests)

2026-06-18 01:37:37 +08:00

xtrain-distributed

sft: assistant-only SFT (ignore-index CE) + chat-prompt greedy eval

2026-06-29 16:19:02 +08:00

xtrain-model

sft: assistant-only SFT (ignore-index CE) + chat-prompt greedy eval