agentic-kvc

gahow/agentic-kvc

Fork 0

Commit Graph

Author	SHA1	Message	Date
Gahow Wang	fec50fa45d	Layerwise KV transfer on Mooncake: PoC + microbench (worktree exploration) Implements per-layer KV push during prefill (write mode) on vLLM's MooncakeConnector, env-gated by MOONCAKE_LAYERWISE=1. 2-instance microbench (mb7) shows correctness (KV lands, cached==prompt) and that the transfer is hidden behind prefill compute: critical-path overhead drops from O(KV size) (123/202/529ms for 8k/16k/32k) to a flat ~58ms (2-9x), with no prefill slowdown, on idle instances. Caveats: idle-only, chunked-prefill disabled, single concurrent transfer — see DESIGN.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 15:34:43 +08:00

Author

SHA1

Message

Date

Gahow Wang

fec50fa45d

Layerwise KV transfer on Mooncake: PoC + microbench (worktree exploration)

Implements per-layer KV push during prefill (write mode) on vLLM's
MooncakeConnector, env-gated by MOONCAKE_LAYERWISE=1. 2-instance microbench
(mb7) shows correctness (KV lands, cached==prompt) and that the transfer is
hidden behind prefill compute: critical-path overhead drops from O(KV size)
(123/202/529ms for 8k/16k/32k) to a flat ~58ms (2-9x), with no prefill
slowdown, on idle instances. Caveats: idle-only, chunked-prefill disabled,
single concurrent transfer — see DESIGN.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 15:34:43 +08:00

1 Commits