agentic-kvc

Go to file

Gahow Wang 098d86385a Add elastic hypotheses tracking doc with H1-H6 analysis

Tracks all hypotheses tested during elastic PD disaggregation research:
- H1 (kv_both overhead): REJECTED — zero overhead at idle
- H2 (PS cold prefill): REJECTED — PS slower than cached C
- H3 (C_s+flexD): PARTIALLY VALIDATED — E2E -9% but HEAVY p90 +117%
- H4 (cache-aware offload): TODO — only offload high-cache-hit HEAVY
- H5 (RDMA overhead): TODO — Mooncake lacks layerwise transfer
- H6 (session migration): TODO — verify D's APC after migration

Key insight: offload decision should be cache-aware (new_tokens),
not size-based (total_input). 80k request with 90% cache = 8k prefill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-23 01:17:12 +08:00

analysis

Add elastic hypotheses tracking doc with H1-H6 analysis

2026-05-23 01:17:12 +08:00

patches

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

replayer

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

scripts

Invalidate prior A/B results + add proper experiment harness