agentic-kvc

Gahow Wang 8e0c6e78b0 Add comprehensive research findings document

Synthesizes all experiments into a paper-ready analysis:
- Agentic workload characteristics vs chatbot/API
- Why PD-Sep, LMetric, elastic RDMA, chunk-size tuning don't work
- Why cache-aware session-sticky routing IS the key optimization
  (-60% TTFT, +24pp APC vs round-robin)
- System-level insights: prefill-decode interference threshold,
  Mooncake limitations, effective request weight after cache
- GPU balance → HEAVY TTFT -10.5% (demonstrated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-23 07:16:31 +08:00

analysis

Add comprehensive research findings document

2026-05-23 07:16:31 +08:00

patches

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

scripts

Chunk-size ablation + comprehensive synthesis