agentic-kvc

Files

Gahow Wang 8e0c6e78b0 Add comprehensive research findings document

Synthesizes all experiments into a paper-ready analysis:
- Agentic workload characteristics vs chatbot/API
- Why PD-Sep, LMetric, elastic RDMA, and chunk-size tuning don't work
- Why cache-aware session-sticky routing IS the key optimization
  (-60% TTFT, +24pp APC vs round-robin)
- System-level insights: prefill-decode interference threshold,
  Mooncake limitations, effective request weight after cache
- GPU balance → HEAVY TTFT -10.5% (demonstrated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-23 07:16:31 +08:00

adaptive_prefill_offload_design.md

Design doc: Adaptive Prefill Offload

2026-05-22 00:44:22 +08:00

elastic_hypotheses.md

16-session contention: TPOT +45% from prefill-decode interference

2026-05-23 05:51:47 +08:00

elastic_offload_design.md

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

kv_lifecycle_design.md

KV cache lifecycle design + eviction loss analysis