docs: T9 verification results (xserv == xtrain, dash5)

Capture the closed-loop run: train (loss 10.84->3.59) -> export (47 tensors,
BF16) -> xserv dump-logits + greedy. Top-1 + top-11 token order identical,
logits within ~1e-2 (BF16-vs-f32 drift), greedy generation token-for-token
identical across two prompts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:37:46 +08:00
parent e246c3bec2
commit 8981cf7982


@@ -133,6 +133,44 @@ crates/xtrain-train/tests/adamw_parity{.py,_dump.rs}  # same as above
```
Criterion: **greedy token sequences match** (BF16 inference vs. f32 training: logits top-1 agree; distributions within BF16 tolerance).
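
To make "BF16 tolerance" concrete, here is a minimal sketch that rounds an f32 value onto the BF16 grid (round-to-nearest-even on the upper 16 bits) and reports the grid spacing at a given magnitude. The helper names `to_bf16` and `bf16_spacing` are hypothetical and are not part of xtrain or xserv; NaN and infinity handling is deliberately omitted.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest BF16 value (hypothetical helper)."""
    # Reinterpret the f32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, then drop the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_spacing(x: float) -> float:
    """Distance between adjacent BF16 values near x (8 significand bits)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 8)


if __name__ == "__main__":
    for value in (11.7711, 10.4724, 6.5125):
        print(value, "->", to_bf16(value), "spacing", bf16_spacing(value))
```

For logits between 8 and 16, adjacent BF16 values are 0.0625 apart, so a single rounding can move a value by up to about 0.03. Differences on the order of 1e-2 at this magnitude are therefore at the scale of the format's resolution.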
## Verification results
## Verification results (dash5 live run, captured)
> To be backfilled after the dash5 live run: push origin main → dash5 pull → run ②③④ → capture
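
**Training** (CUDA_VISIBLE_DEVICES=0, 1200 steps, gpt2 vocab, dim 32 / 4 layers / 2 heads / ffn 64, ~5M params, 8700 tok/s):
`loss 10.84 → 3.59`; greedy sampling produces coherent English (no training or sampling regression after QK-norm was added).

**Export**: `export: 47 tensors` (embed + 4×11 + final_norm + lm_head); writes config.json (see above) +
model.safetensors (BF16, 6.5 MB) + tokenizer.json.
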
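
The export figures can be sanity-checked with a few lines of arithmetic. This is only a sketch: the tensor breakdown comes from the export line above, while the vocabulary size of 50,257 (the standard GPT-2 vocabulary, assumed unpadded), the use of 1 MB = 10^6 bytes, and the reading that `embed` and `lm_head` are stored as two separate `vocab × dim` tensors are assumptions.

```python
# Tensor count, following the breakdown in the export line:
# embed + 4 layers x 11 tensors + final_norm + lm_head.
N_LAYERS = 4
TENSORS_PER_LAYER = 11
tensor_count = 1 + N_LAYERS * TENSORS_PER_LAYER + 1 + 1

# Rough size of the two largest tensors in BF16.
VOCAB = 50257        # assumption: standard GPT-2 vocabulary, no padding
DIM = 32             # from the training configuration
BYTES_PER_BF16 = 2
embed_and_head_bytes = 2 * VOCAB * DIM * BYTES_PER_BF16
embed_and_head_mb = embed_and_head_bytes / 1e6  # assumption: MB = 10**6 bytes

if __name__ == "__main__":
    print("tensors:", tensor_count)
    print("embed + lm_head: %.2f MB" % embed_and_head_mb)
```

Under these assumptions the embedding and output head alone account for most of the reported file size, with the per-layer tensors contributing the remainder.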
**① Logits numerical comparison** (same prompt `"Once upon a time"`, token ids `[7454, 2402, 257, 640]`):

| rank | xtrain (f32) | xserv (BF16) | token |
|------|--------------|--------------|-------|
| 0 | 11.7711 | 11.7500 | `,` |
| 1 | 10.4724 | 10.5000 | ` there` |
| 2 | 6.6288 | 6.6562 | ` upon` |
| 3 | 6.5125 | 6.5000 | ` to` |
| … | … | … | … |
| 10 | 5.3614 | 5.3438 | ` she` |
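
Top-1 matches (`,`, id 11) and the top-11 token ordering is identical. Absolute logit differences are ~1e-2 (~0.2-0.9% relative),
which is exactly the rounding drift expected from **BF16 inference vs. f32 training**; there is no structural error.
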
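
The check described above can be sketched in a few lines. The snippet uses only the five rows shown in the table (the elided ranks are not filled in), and the function names are hypothetical rather than taken from the xtrain or xserv tooling.

```python
# (token, xtrain f32 logit, xserv BF16 logit), copied from the rows shown above.
rows = [
    (",", 11.7711, 11.7500),
    (" there", 10.4724, 10.5000),
    (" upon", 6.6288, 6.6562),
    (" to", 6.5125, 6.5000),
    (" she", 5.3614, 5.3438),
]


def rank_order(table, column):
    """Tokens sorted by the logit in the given column, highest first."""
    return [row[0] for row in sorted(table, key=lambda r: r[column], reverse=True)]


def drift(table):
    """Per-token (absolute, relative) difference between the two sides."""
    return [(tok, abs(a - b), abs(a - b) / abs(a)) for tok, a, b in table]


same_order = rank_order(rows, 1) == rank_order(rows, 2)
max_abs = max(d[1] for d in drift(rows))
max_rel = max(d[2] for d in drift(rows))

if __name__ == "__main__":
    print("same ordering:", same_order)
    print("max abs diff: %.4f, max rel diff: %.2f%%" % (max_abs, 100 * max_rel))
```

Comparing orderings and bounded differences, rather than exact equality, is the appropriate test here because the two sides compute in different precisions.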
**② Greedy generation matches token for token** (xserv `xserv-cli --max-tokens 40` vs. xtrain `sample.rs generate temp=0`):
```
prompt "Once upon a time":
xtrain: Once upon a time, there was a little girl named Lily. Timmy loved to play
with her mommy. One day, Timmy's mommy's mommy's mommy. "I'm sorry, I
xserv: Once upon a time, there was a little girl named Lily. Timmy loved to play
with her mommy. One day, Timmy's mommy's mommy's mommy. "I'm sorry, I ← token-for-token identical
prompt "One day":
both: One day, Timmy's mommy's mommy's mommy. "I'm sorry, I can't be careful and
be careful. I'm sorry, I can't have a good time. ← token-for-token identical
```
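
**Conclusion: the loop closes.** Weights trained by xtrain, once exported, are loaded and served by xserv; greedy generation is **token-for-token identical** to xtrain's own,
and the logits agree within BF16 tolerance. This wraps up the entire P0→P6 learning chain.
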
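> xtrain sampling reruns the full forward pass at every step (no KV cache), while xserv uses KV-cache prefill + decode. Both take
> the greedy argmax over the same logits, so the sequences match. BF16 drift did not flip any argmax within 40 steps.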
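
The equivalence of the two decoding strategies can be illustrated with a toy model. The sketch below is not xtrain or xserv code: it is a hypothetical single-layer, single-head attention model with random weights, written in pure Python, that compares greedy decoding with a full recompute per step against greedy decoding with cached keys and values.

```python
import math
import random

VOCAB, DIM = 16, 8
_rng = random.Random(0)  # fixed seed so the toy weights are reproducible


def rand_matrix(rows, cols):
    return [[_rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


EMB = rand_matrix(VOCAB, DIM)
WQ, WK, WV = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
WOUT = rand_matrix(DIM, VOCAB)


def vecmat(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def attend(q, keys, values):
    # Softmax attention of one query over all cached positions.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(DIM)]


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def greedy_full(prompt, n_new):
    """Recompute keys and values for the whole sequence at every step."""
    tokens = list(prompt)
    for _ in range(n_new):
        xs = [EMB[t] for t in tokens]
        keys = [vecmat(x, WK) for x in xs]
        values = [vecmat(x, WV) for x in xs]
        q = vecmat(xs[-1], WQ)
        tokens.append(argmax(vecmat(attend(q, keys, values), WOUT)))
    return tokens


def greedy_cached(prompt, n_new):
    """Prefill the cache once, then append one key/value pair per new token."""
    tokens = list(prompt)
    keys = [vecmat(EMB[t], WK) for t in tokens]
    values = [vecmat(EMB[t], WV) for t in tokens]
    for _ in range(n_new):
        q = vecmat(EMB[tokens[-1]], WQ)
        nxt = argmax(vecmat(attend(q, keys, values), WOUT))
        tokens.append(nxt)
        keys.append(vecmat(EMB[nxt], WK))
        values.append(vecmat(EMB[nxt], WV))
    return tokens


if __name__ == "__main__":
    print(greedy_full([1, 2, 3], 12) == greedy_cached([1, 2, 3], 12))
```

In this toy both paths perform identical arithmetic, so the sequences agree exactly. In the real pair the precisions differ (f32 vs. BF16), so agreement holds only as long as the drift never flips an argmax, which is what the 40-step runs above confirm.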