docs: T9 verification results (xserv == xtrain, dash5)

Capture the closed-loop run: train (loss 10.84->3.59) -> export (47 tensors, BF16) -> xserv dump-logits + greedy. Top-1 + top-11 token order identical, logits within ~1e-2 (BF16-vs-f32 drift), greedy generation token-for-token identical across two prompts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:37:46 +08:00
parent e246c3bec2
commit 8981cf7982
1 changed files with 40 additions and 2 deletions
--- a/docs/08-export-xserv.md
+++ b/docs/08-export-xserv.md
@@ -133,6 +133,44 @@ crates/xtrain-train/tests/adamw_parity{.py,_dump.rs}# same as above
   ```
   Criterion: **the greedy token sequences match** (BF16 inference vs f32 training: top-1 logits agree, and the distribution is within BF16 tolerance).

-## Verification results
+## Verification results (actual run on dash5, captured)

-> To be backfilled after an actual run on dash5 (push origin main → dash5 pull → run ②③④ → capture).
+**Training** (CUDA_VISIBLE_DEVICES=0, 1200 steps, gpt2 vocab, dim 32 / 4 layers / 2 heads / ffn 64, ~5M params, 8700 tok/s):
+`loss 10.84 → 3.59`; greedy sampling produces coherent English (no training or sampling regression after adding QK-norm).
+
+**Export**: `export: 47 tensors` (embed + 4×11 + final_norm + lm_head); writes config.json (see above) +
+model.safetensors (BF16, 6.5 MB) + tokenizer.json.
+
+**① Numerical logits cross-check** (same prompt `"Once upon a time"`, token ids `[7454, 2402, 257, 640]`):
+
+| rank | xtrain (f32) | xserv (BF16) | token |
+|------|--------------|--------------|-------|
+| 0 | 11.7711 | 11.7500 | `,` |
+| 1 | 10.4724 | 10.5000 | ` there` |
+| 2 | 6.6288 | 6.6562 | ` upon` |
+| 3 | 6.5125 | 6.5000 | ` to` |
+| … | … | … | … |
+| 10 | 5.3614 | 5.3438 | ` she` |
+
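+Top-1 matches (`,`, id 11); the top-11 token ordering is identical; the absolute logit difference is ~1e-2 (~0.2–0.9% relative),
+which is exactly the rounding drift expected from **BF16 inference vs f32 training**, with no structural error.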
+
+**② Greedy generation matches token for token** (xserv `xserv-cli --max-tokens 40` vs xtrain `sample.rs generate temp=0`):
+
+```
+prompt "Once upon a time":
+  xtrain: Once upon a time, there was a little girl named Lily. Timmy loved to play
+          with her mommy. One day, Timmy's mommy's mommy's mommy. "I'm sorry, I
+  xserv:  Once upon a time, there was a little girl named Lily. Timmy loved to play
+          with her mommy. One day, Timmy's mommy's mommy's mommy. "I'm sorry, I   ← token-for-token identical
+
+prompt "One day":
+  both:   One day, Timmy's mommy's mommy's mommy. "I'm sorry, I can't be careful and
+          be careful. I'm sorry, I can't have a good time.                         ← token-for-token identical
+```
+
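+**Conclusion: the loop closes.** Weights trained by xtrain, once exported and then loaded and served by xserv, produce greedy generation that is **token-for-token identical** to xtrain's own,
+and the logits agree within BF16 tolerance. This closes out the whole P0→P6 learning chain.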
+
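+> Note: xtrain sampling reruns the full forward pass at every step (no KV cache), while xserv uses KV-cache prefill+decode; both take a
+> greedy argmax over logits that are mathematically the same, so the sequences match. BF16 drift caused no argmax flip within 40 steps.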
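
The logits cross-check in ① can be sketched roughly as below. This is a minimal illustration only and is not part of the commit: `compare_logits` is a hypothetical helper, not something from xtrain or xserv, and the inputs are just the five rows printed in the table (ranks 0–3 and 10), since the elided rows are not reproduced here. It checks the ranking, the largest absolute and relative differences, and whether the top-1 margin is wide enough that drift of this size cannot flip the argmax.

```python
# Hypothetical helper for illustration; not part of xtrain or xserv.
def compare_logits(ref, test, k):
    """Return (top-k order matches, max absolute diff, max relative diff)."""
    def top_k(values):
        # Indices of the k largest values, highest first.
        return sorted(range(len(values)), key=lambda i: -values[i])[:k]

    order_match = top_k(ref) == top_k(test)
    max_abs = max(abs(a - b) for a, b in zip(ref, test))
    max_rel = max(abs(a - b) / abs(a) for a, b in zip(ref, test))
    return order_match, max_abs, max_rel


# Only the rows shown in the table (ranks 0, 1, 2, 3, 10).
xtrain_f32 = [11.7711, 10.4724, 6.6288, 6.5125, 5.3614]
xserv_bf16 = [11.7500, 10.5000, 6.6562, 6.5000, 5.3438]

order_match, max_abs, max_rel = compare_logits(xtrain_f32, xserv_bf16, k=5)

# If every logit moves by at most max_abs, the argmax cannot change
# as long as the gap between rank 0 and rank 1 exceeds 2 * max_abs.
top1_margin = xtrain_f32[0] - xtrain_f32[1]
argmax_safe = top1_margin > 2 * max_abs

print(order_match, round(max_abs, 4), round(max_rel, 4), argmax_safe)
```

On these five rows the largest absolute difference is about 0.028, while the top-1 margin is about 1.3, which fits the observation that greedy decoding did not diverge at this step. The bound says nothing about the elided rows or about later decoding steps, where margins may be smaller.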