gahow/agentic-kvc

Gahow Wang dc8e6dd5a8 v2 exp(a): add remote KV-store (RDMA) tier

Extends the hit-latency microbench with a fourth tier: a remote
global-KV-store hit over RDMA, the Mooncake-Store mechanism. Setup: two
kv_both MooncakeConnector instances (run_rdma.sh). For each prefix length,
instance B serves the request by pulling instance A's cached prefix over RDMA
(do_remote_prefill, via microbench/fresh_setup/mb2_kv_transfer.py) instead of
recomputing it. The timed pull is the remote-hit latency.
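A minimal sketch of how a repeated-timing harness for one tier could look, assuming the measurement is the median (p50) wall-clock time over a fixed number of repetitions. The names `p50_latency` and `fake_remote_pull` are hypothetical and stand in for whatever the microbench actually calls; nothing here is taken from mb2_kv_transfer.py.

```python
import statistics
import time


def p50_latency(timed_op, reps=11):
    """Run timed_op `reps` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(reps):
        start = time.perf_counter()
        timed_op()  # hypothetical: one request served via a remote prefix pull
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_remote_pull():
    """Placeholder for the real pull; sleeps briefly so the sketch is runnable."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(f"remote-hit p50: {p50_latency(fake_remote_pull):.4f}s")
```

An odd repetition count keeps the median equal to one observed sample rather than an average of two.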

Result (TTFT p50, 11 reps): strict tier ordering
GPU(HBM) < CPU(local DRAM) < remote-RDMA-store << miss, gaps growing with
context. At 64k: GPU 0.11s, CPU 0.27s, RDMA 0.97s, miss 15.2s -> miss/RDMA
15.8x, RDMA/CPU 3.6x, CPU/GPU 2.4x. So a global RDMA store is a real win
over recompute (the blog's 46x) but pays the NIC tax (~5-7 GB/s effective)
and sits a tier below local CPU and two below GPU -- reinforcing
GPU-hit-first. README + figure updated to four tiers.
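The tier ratios can be rechecked from the quoted 64k latencies with a short sketch like the one below. It uses only the rounded p50 values from the result above, so its output can differ in the last digit from the ratios stated there, which were presumably computed from unrounded measurements. The function name `tier_ratios` is illustrative.

```python
# TTFT p50 at 64k context, in seconds, copied from the result above (rounded).
ttft_p50_s = {"GPU": 0.11, "CPU": 0.27, "RDMA": 0.97, "miss": 15.2}


def tier_ratios(latency):
    """Return the slowdown of each tier relative to the next-faster tier."""
    order = sorted(latency, key=latency.get)  # fastest tier first
    return {
        f"{slow}/{fast}": latency[slow] / latency[fast]
        for fast, slow in zip(order, order[1:])
    }


ratios = tier_ratios(ttft_p50_s)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```

Sorting by latency before pairing adjacent tiers also confirms the strict ordering: if any tier were out of order, the ratio labels would not come out as CPU/GPU, RDMA/CPU, miss/RDMA.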

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 12:48:37 +08:00

Workload characterization C1-C3 on full production trace

2026-05-29 18:19:39 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Workload characterization C1-C3 on full production trace

2026-05-29 18:19:39 +08:00

PD-disagg crossover: regular synthetic trace + goodput sweep + figure

2026-05-29 18:19:23 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

Replayer: closed-loop inter-turn think-time mode

2026-05-29 18:19:12 +08:00

Add leastwork_kappa decode-aware ablation (net-negative, documented)

2026-05-29 17:07:23 +08:00

unified_v2.1: relax gates + add unified_kv_both isolation control

2026-05-26 10:40:57 +08:00

third_party/vllm

Gate evict_sent_blocks behind VLLM_EVICT_SENT_BLOCKS

2026-05-29 18:18:59 +08:00

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

v2 exp(a): add remote KV-store (RDMA) tier

2026-05-30 12:48:37 +08:00

.gitignore

600s-truncated trace + LPWL 5-policy results

2026-05-29 16:08:35 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

MEETING.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

PAPER_OUTLINE.md

§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

2026-05-27 16:51:38 +08:00

pyproject.toml

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

RESULTS_SUMMARY.md

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00