agentic-kvc

Gahow Wang e9e313f9c5 P2P cache analysis: external KV correctly registered in prefix cache

Investigation confirms vLLM Mooncake connector DOES correctly register
externally-received KV blocks in the prefix cache. No bug exists.

Evidence from vLLM logs (per-instance):
  inst_1: prefix_cache=14.7%, external_cache=72.1%  <- high external hit
  inst_4: prefix_cache=52.4%, external_cache=59.0%

The 0.5% aggregate APC (automatic prefix caching) hit rate from
/metrics was a measurement artifact: inst_0 received 718M query tokens
(cold-start prefills) with a 0% hit rate, diluting the aggregate.
D-instances have a 20-72% external cache hit rate.
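
The dilution effect can be illustrated with a minimal sketch. It assumes the
aggregate is computed as total hit tokens over total query tokens; the helper
names and all counts below are hypothetical toy values, not measurements from
this run.

```python
# Hypothetical per-instance counters: (query_tokens, hit_tokens).
# Toy values chosen only to show the effect; not taken from the logs.
counts = {
    "cold_instance": (1000, 0),   # many cold-start prefill tokens, no hits
    "warm_instance_a": (10, 5),   # 50% hit rate
    "warm_instance_b": (10, 7),   # 70% hit rate
}


def aggregate_hit_rate(per_instance):
    """Token-weighted rate: sum of hits divided by sum of queries."""
    total_queries = sum(q for q, _ in per_instance.values())
    total_hits = sum(h for _, h in per_instance.values())
    return total_hits / total_queries if total_queries else 0.0


def mean_hit_rate(per_instance):
    """Unweighted mean of per-instance rates (instances with no queries skipped)."""
    rates = [h / q for q, h in per_instance.values() if q]
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    print(f"token-weighted aggregate: {aggregate_hit_rate(counts):.1%}")
    print(f"per-instance mean:        {mean_hit_rate(counts):.1%}")
```

In this toy case the single cold instance dominates the denominator, so the
token-weighted aggregate falls far below the per-instance rates even though
two of the three instances reuse most of their cache.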

The /metrics endpoint's prefix_cache_hits_total counter does not include
external hits. The vLLM log's "External prefix cache hit rate" is the
correct metric for Mooncake-transferred KV reuse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-22 13:25:34 +08:00

analysis

Update report: adaptive v2 confirms no KV transfer helps single-machine

2026-05-22 10:15:08 +08:00

patches

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer

Balanced session-sticky routing + agentic workload pattern analysis

2026-05-22 01:50:27 +08:00

scripts

P2P cache analysis: external KV correctly registered in prefix cache