Implementation log (docs/18) + Phase-3 row (evolution.md): cat_seq device cache,
gates hold (token-identical), and the profile-first finding — ~10% single-seq
decode but no GRPO-step change because the long pole shifted to the per-sample
logp/PG forwards after M2b batching. Names ragged batched prefill as the next
decode lever.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>