Files
xtrain/docs
Gahow Wang 65a2264227 docs: Phase T14 — fused flash-attention design
Design doc for the hand-written single fused flash-attention kernel:
online softmax tiled over KV, NEVER materializing the [bh,S,S] score
matrix; flash-style backward (recompute scores from saved logsumexp +
D=ΣdO·O, dQ/dK/dV). Opt-in --flash; composed T10 path stays default.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 23:10:16 +08:00
..