docs: T9 verification results (xserv == xtrain, dash5)
Capture the closed-loop run: train (loss 10.84->3.59) -> export (47 tensors, BF16) -> xserv dump-logits + greedy. Top-1 + top-11 token order identical, logits within ~1e-2 (BF16-vs-f32 drift), greedy generation token-for-token identical across two prompts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -133,6 +133,44 @@ crates/xtrain-train/tests/adamw_parity{.py,_dump.rs}# same as above
```
Criterion: **greedy token sequences match** (BF16 inference vs f32 training; the logits top-1 agrees, and the distribution is within BF16 tolerance).

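To make "BF16 tolerance" concrete, the following is a minimal sketch of f32 to BF16 rounding using only the standard library. The helper names `f32_to_bf16` and `bf16_ulp` are hypothetical and are not part of xtrain or xserv; the sketch assumes round-to-nearest-even and finite inputs, which may differ from what the actual export path does.

```python
import math
import struct

def f32_to_bf16(x: float) -> float:
    """Round x to BF16 (round-to-nearest-even) and return it as a Python float.

    Assumes a finite input; NaN/Inf handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # f32 bit pattern
    # Bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_ulp(x: float) -> float:
    """Spacing between neighbouring BF16 values around a nonzero, normal x."""
    _, exp = math.frexp(abs(x))   # abs(x) = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 8)       # BF16 keeps 8 significant bits

print(f32_to_bf16(11.7711), bf16_ulp(11.7711))  # 11.75 0.0625
```

For logits in the range 8 to 16 the BF16 grid spacing is 0.0625, so rounding the output alone can move a value by up to about 0.03. That is the order of magnitude to expect when comparing BF16 logits against f32 ones.
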
## Verification results
## Verification results (actual dash5 run, captured)

> To be backfilled after the actual dash5 run (push origin main → dash5 pull → run ②③④ → capture).

**Training** (CUDA_VISIBLE_DEVICES=0, 1200 steps, gpt2 vocab, dim 32 / 4 layers / 2 heads / ffn 64, ~5M params, 8700 tok/s):
`loss 10.84 → 3.59`; greedy sampling produces coherent English (no training or sampling regression after adding QK-norm).

**Export**: `export: 47 tensors` (embed + 4×11 + final_norm + lm_head), writing config.json (see above) + model.safetensors (BF16, 6.5 MB) + tokenizer.json.

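The tensor count and dtype of an exported file can be checked independently of the exporter. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function names are hypothetical, and the per-layer count of 11 is taken from the breakdown above.

```python
import json
import struct

def parse_safetensors_header(raw: bytes) -> dict:
    """Return the tensor table from the raw bytes of a .safetensors file."""
    (header_len,) = struct.unpack("<Q", raw[:8])   # little-endian u64 length
    header = json.loads(raw[8:8 + header_len])
    header.pop("__metadata__", None)               # optional entry, not a tensor
    return header

def summarize(header: dict):
    """Return (number of tensors, sorted list of distinct dtypes)."""
    dtypes = sorted({entry["dtype"] for entry in header.values()})
    return len(header), dtypes

def expected_tensor_count(n_layers: int, per_layer: int = 11) -> int:
    """embed + n_layers * per_layer + final_norm + lm_head."""
    return 1 + n_layers * per_layer + 1 + 1
```

Reading `model.safetensors` in binary mode and passing its bytes through `parse_safetensors_header` and `summarize` should give a count equal to `expected_tensor_count(4)`, which is 47, and a single `BF16` dtype, assuming every tensor was exported in BF16.
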
**① Logits numerical comparison** (same prompt `"Once upon a time"`, token ids `[7454, 2402, 257, 640]`):

| rank | xtrain (f32) | xserv (BF16) | token |
|------|--------------|--------------|-------|
| 0 | 11.7711 | 11.7500 | `,` |
| 1 | 10.4724 | 10.5000 | ` there` |
| 2 | 6.6288 | 6.6562 | ` upon` |
| 3 | 6.5125 | 6.5000 | ` to` |
| … | … | … | … |
| 10 | 5.3614 | 5.3438 | ` she` |

Top-1 matches (`,`, id 11); the top-11 token ordering is identical; absolute logit differences are ~1e-2 (~0.2–0.9% relative),
which is exactly the rounding drift expected from **BF16 inference vs f32 training**, with no structural error.

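The comparison above can be expressed as a small check. This is an illustrative sketch rather than the actual dump-logits tooling: the helper names are hypothetical, and the data covers only the five rows visible in the table (ranks 4 to 9 are elided in the source and are not filled in here).

```python
# Visible rows of the table: token -> logit.
xtrain = {",": 11.7711, " there": 10.4724, " upon": 6.6288, " to": 6.5125, " she": 5.3614}
xserv = {",": 11.7500, " there": 10.5000, " upon": 6.6562, " to": 6.5000, " she": 5.3438}

def ranking(logits: dict) -> list:
    """Tokens ordered by descending logit."""
    return sorted(logits, key=logits.get, reverse=True)

def max_abs_diff(ref: dict, other: dict) -> float:
    return max(abs(ref[t] - other[t]) for t in ref)

def max_rel_diff(ref: dict, other: dict) -> float:
    return max(abs(ref[t] - other[t]) / abs(ref[t]) for t in ref)

same_order = ranking(xtrain) == ranking(xserv)
print(same_order, max_abs_diff(xtrain, xserv), max_rel_diff(xtrain, xserv))
```

Checking the ordering separately from the magnitudes matters because greedy decoding depends only on the ordering, while the magnitudes indicate how much headroom remains before an ordering could change.
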
**② Greedy generation matches token for token** (xserv `xserv-cli --max-tokens 40` vs xtrain `sample.rs generate temp=0`):

```
prompt "Once upon a time":
  xtrain: Once upon a time, there was a little girl named Lily. Timmy loved to play
          with her mommy. One day, Timmy's mommy's mommy's mommy. "I'm sorry, I
  xserv:  Once upon a time, there was a little girl named Lily. Timmy loved to play
          with her mommy. One day, Timmy's mommy's mommy's mommy. "I'm sorry, I   ← token-for-token identical

prompt "One day":
  both:   One day, Timmy's mommy's mommy's mommy. "I'm sorry, I can't be careful and
          be careful. I'm sorry, I can't have a good time.   ← token-for-token identical
```

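**Conclusion: the closed loop holds.** Weights trained by xtrain, once exported and then loaded and served by xserv, yield greedy generations that match xtrain's own output **token for token**,
with logits agreeing within BF16 tolerance. This closes out the whole P0→P6 learning chain.
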
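> Note: xtrain sampling reruns the full forward pass at every step (no KV cache), while xserv uses KV-cache prefill + decode; both take
> the greedy argmax over mathematically the same logits, so the sequences match. BF16 drift did not flip any argmax within 40 steps.
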
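One way to reason about whether drift could flip a greedy choice is to compare the top-1 margin against the drift. The sketch below uses hypothetical helper names and gives a sufficient condition for a single decoding step only: if every logit moves by at most `max_drift`, the argmax cannot change while the gap between the two largest logits exceeds twice that amount. It does not prove stability over a whole sequence.

```python
def top1_margin(logits: list) -> float:
    """Gap between the largest and second-largest logit."""
    ordered = sorted(logits, reverse=True)
    return ordered[0] - ordered[1]

def argmax_cannot_flip(logits: list, max_drift: float) -> bool:
    """True if a per-logit perturbation of at most max_drift cannot change the argmax."""
    return top1_margin(logits) > 2 * max_drift

# f32 logits for the visible rows of the "Once upon a time" comparison.
step_logits = [11.7711, 10.4724, 6.6288, 6.5125, 5.3614]
print(top1_margin(step_logits), argmax_cannot_flip(step_logits, 0.03))
```

For this step the margin is about 1.3 against a drift of a few hundredths, so the top-1 token is far from flipping. Steps where two candidates are nearly tied are the ones where BF16 and f32 could diverge.
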