Files
xtrain/docs
Gahow Wang 0f76c0fdb0 docs: M2b — batched decode results (token-identical + ~1.7x rollout, device-cache next)
Implementation log (docs/18) + Phase-3 row (evolution.md): rope_pos primitive +
gate, the batched engine (decode_attention/repeat_kv reused), the token-
identical batch gate, and the measured ~1.7x rollout-inclusive step speedup +
memory stabilization. Closes the M2 decode engine (M2a single-seq + M2b
batched); names the device-side cache as the remaining lever.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 17:20:01 +08:00
..