Files
xtrain/docs
Gahow Wang b39e6e7110 docs: M2a — KV-cache decode engine results (token-identical + length-dependent speedup)
Implementation log (docs/18) + Phase-3 row (evolution.md): the two decode
primitives and their gates, the engine design (host-cache baseline), the
token-identical centerpiece gate, and the measured throughput baseline showing
the cache win is sequence-length-dependent (~1.0x@32, ~1.9x@128, naive OOM@256).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 12:01:10 +08:00
..