xtrain/docs
Gahow Wang 41d46208a6 docs: M2c — device KV cache + the bottleneck-shift finding
Adds the implementation log (docs/18) and the Phase-3 row in evolution.md. Covers
the cat_seq device cache, the gates that still hold (token-identical), and the
profile-first finding: single-seq decode improves by ~10%, but the GRPO step is
unchanged because, after M2b batching, the long pole shifted to the per-sample
logp/PG forwards. Names ragged batched prefill as the next decode lever.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 17:39:10 +08:00