Implementation log (docs/18) + Phase-3 row (evolution.md): the two decode
primitives and their gates, the engine design (host-cache baseline), the
token-identical centerpiece gate, and the measured throughput baseline showing
the cache win is sequence-length-dependent (~1.0x@32, ~1.9x@128, naive OOM@256).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>