Enable assistant-only supervised fine-tuning and a fixed chat-prompt eval path
used by the v12 SFT runs:
- cross_entropy now ignores negative targets (the -100 ignore-index) and
normalizes by the number of valid rows instead of all rows; the CUDA
forward/backward kernels skip rows with t < 0 (ops.rs, nn.cu).
- Corpus gains an optional labels field and load_sft_tsv_cached: each
two-column TSV row is formatted as 'User: .. \nAssistant:' + answer +
<|endoftext|>, with prompt tokens masked to -100 and the answer plus EOS
supervised. Labels are cached as i32 alongside the u16 token cache.
sample() retries windows whose targets are fully masked, and eval uses
target_window so the masking also applies to validation loss (data.rs,
train_loop.rs).
- train + train_ddp: --sft-tsv selects the TSV loader, --init-ckpt continues
training from a base checkpoint.
- greedy_sample: --prompts-file/--prompt/--temperature for fixed chat-prompt
generation eval.
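
A minimal CPU sketch of the masked loss from the first bullet, in Rust.
It is not the ops.rs or nn.cu code: the name masked_cross_entropy, the
flat row-major logits layout, and returning zero loss for a fully masked
batch are assumptions. Rows with a negative target add nothing to the
loss or gradient, and the mean is taken over valid rows only:

```rust
/// Mean cross-entropy over rows with a non-negative target.
/// `logits` is row-major with `targets.len()` rows of `vocab` entries.
/// Rows whose target is negative (e.g. -100) are skipped in both the
/// forward and backward pass, and the mean is over valid rows only.
/// Returns (loss, gradient w.r.t. logits).
fn masked_cross_entropy(logits: &[f32], targets: &[i32], vocab: usize) -> (f32, Vec<f32>) {
    assert_eq!(logits.len(), targets.len() * vocab, "logits shape mismatch");
    let mut grad = vec![0.0f32; logits.len()];
    let valid = targets.iter().filter(|&&t| t >= 0).count();
    if valid == 0 {
        // Assumption: a fully masked batch gives zero loss and zero gradient.
        return (0.0, grad);
    }
    let scale = 1.0 / valid as f32;
    let mut loss = 0.0f32;
    for (row, &t) in targets.iter().enumerate() {
        if t < 0 {
            continue; // ignored row: no loss, gradient stays zero
        }
        let t = t as usize;
        let z = &logits[row * vocab..(row + 1) * vocab];
        // Numerically stable log-sum-exp.
        let max = z.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let sum_exp: f32 = z.iter().map(|&v| (v - max).exp()).sum();
        loss += max + sum_exp.ln() - z[t];
        // d(loss)/d(z_j) = (softmax_j - onehot_j) / valid
        let g = &mut grad[row * vocab..(row + 1) * vocab];
        for (j, gj) in g.iter_mut().enumerate() {
            let p = (z[j] - max).exp() / sum_exp;
            let y = if j == t { 1.0 } else { 0.0 };
            *gj = (p - y) * scale;
        }
    }
    (loss * scale, grad)
}
```

With logits [0, 0] and target 1 in the only valid row, the loss is ln 2
whether or not a second, masked row is present; dividing by all rows
would halve it.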
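
The SFT row formatting and label masking can be sketched as follows.
The byte-level toy_encode tokenizer, the EOS id 50256 (the GPT-2
<|endoftext|> id), the exact whitespace of the template, and
format_sft_row itself are hypothetical stand-ins for whatever data.rs
does; the sketch only shows the shape of the token/label pair, with
labels aligned one-to-one with tokens:

```rust
/// Hypothetical <|endoftext|> id (GPT-2 value); the real id comes from the tokenizer.
const EOS: u16 = 50256;
/// Ignore-index for positions that must not contribute to the loss.
const IGNORE: i32 = -100;

/// Toy byte-level tokenizer standing in for the real one (assumption).
fn toy_encode(s: &str) -> Vec<u16> {
    s.bytes().map(u16::from).collect()
}

/// Formats one `prompt<TAB>answer` row as the chat template + answer + EOS.
/// Prompt positions get IGNORE; answer and EOS positions keep their token id.
/// Returns None if the row has no tab separator.
fn format_sft_row(line: &str) -> Option<(Vec<u16>, Vec<i32>)> {
    let (user, answer) = line.split_once('\t')?;
    // Assumed template; the exact whitespace in data.rs may differ.
    let prompt = toy_encode(&format!("User: {user}\nAssistant:"));
    let mut supervised = toy_encode(answer);
    supervised.push(EOS);

    let mut tokens = prompt.clone();
    tokens.extend_from_slice(&supervised);
    let mut labels = vec![IGNORE; prompt.len()];
    labels.extend(supervised.iter().map(|&t| i32::from(t)));
    Some((tokens, labels))
}
```

In this sketch the prompt and answer are encoded separately, so the mask
boundary falls exactly between them even if a subword tokenizer would
otherwise merge tokens across the join.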
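
The retry in sample() might look like the sketch below. It assumes
training targets are the labels shifted by one position
(labels[start + 1..start + 1 + len]) and that sampling gives up after
max_tries; sample_window, XorShift and the give-up behavior are
hypothetical, not the data.rs API:

```rust
/// Tiny xorshift64 PRNG so the sketch needs no crates; the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns a window start whose shifted targets
/// `labels[start + 1..start + 1 + len]` contain at least one supervised
/// (non-negative) label, retrying up to `max_tries` random starts.
/// Returns None if the corpus is too short or every try is fully masked.
fn sample_window(labels: &[i32], len: usize, rng: &mut XorShift, max_tries: usize) -> Option<usize> {
    if labels.len() <= len {
        return None; // no room for len inputs plus one shifted target
    }
    let n_starts = (labels.len() - len) as u64; // valid starts: 0..n_starts
    for _ in 0..max_tries {
        let start = (rng.next_u64() % n_starts) as usize;
        let targets = &labels[start + 1..start + 1 + len];
        if targets.iter().any(|&l| l >= 0) {
            return Some(start);
        }
    }
    None
}
```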
Test fixtures are updated for the new Corpus.labels field; dropout.rs
picks up incidental rustfmt changes. Not rebuilt locally (no CUDA
toolchain on this checkout), so correctness rests on the documented v12
base + SFT runs on the GPU box.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>