e9e313f9c51486e778fdf2fcca7a04a19b5bdc0c
Investigation confirms vLLM Mooncake connector DOES correctly register externally-received KV blocks in the prefix cache. No bug exists.

Evidence from vLLM logs (per-instance):
  inst_1: prefix_cache=14.7%, external_cache=72.1%  <- high external hit
  inst_4: prefix_cache=52.4%, external_cache=59.0%

The 0.5% aggregate APC from /metrics was a measurement artifact: inst_0 received 718M query tokens (cold-start prefills) with 0% hit, diluting the aggregate. D-instances have 20-72% external cache hit.

The /metrics endpoint's prefix_cache_hits_total counter does not include external hits. The vLLM log's "External prefix cache hit rate" is the correct metric for Mooncake-transferred KV reuse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
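The dilution effect described above can be illustrated with a small sketch. It shows how a token-weighted aggregate hit rate collapses when one high-volume instance has no hits. The `InstanceStats` class and `aggregate_hit_rate` helper are hypothetical names, not part of vLLM. Only inst_0's volume (718M query tokens, 0% hit) and the per-instance hit rates come from this commit message; the query-token volumes for inst_1 and inst_4 are assumed purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    name: str
    query_tokens: int  # tokens looked up in the local prefix cache
    hit_tokens: int    # tokens served from the local prefix cache

    @property
    def hit_rate(self) -> float:
        # Per-instance hit rate; 0.0 when the instance saw no queries.
        return self.hit_tokens / self.query_tokens if self.query_tokens else 0.0


def aggregate_hit_rate(stats):
    # Token-weighted aggregate: total hits over total queries,
    # which is what summing counters across instances yields.
    total_queries = sum(s.query_tokens for s in stats)
    total_hits = sum(s.hit_tokens for s in stats)
    return total_hits / total_queries if total_queries else 0.0


stats = [
    # From the commit message: 718M query tokens, 0% hit.
    InstanceStats("inst_0", 718_000_000, 0),
    # ASSUMED volumes; only the 14.7% and 52.4% rates are from the logs.
    InstanceStats("inst_1", 10_000_000, 1_470_000),
    InstanceStats("inst_4", 5_000_000, 2_620_000),
]

for s in stats:
    print(f"{s.name}: {s.hit_rate:.1%}")
print(f"aggregate: {aggregate_hit_rate(stats):.1%}")
```

With any volumes of this shape, the aggregate stays well under 1% even though individual instances report double-digit hit rates, which is why per-instance figures are the more informative view here.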