xtrain/docs
Gahow Wang 30db62d8f2 docs: Phase T12 — bf16 mixed precision design
docs/11-bf16-mixed-precision.md: the AMP split (bf16 linears +
activations, fp32 master / norms / softmax / RoPE / CE, no loss
scaling), the cast-op bridge, module layout, and the dual
verification gate (fp32 unchanged + bf16 looser-tol + convergence +
mem/throughput). Memory/throughput before->after to be filled from
the dash5 bench.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:15:02 +08:00