agentic-kvc

Files

Gahow Wang ef9e0102ec Connector tax: trace-replay confirms +45% kv_both penalty is gone; DR-fix adds 22% more

Re-runs the elastic_migration_v2 trace (w600 r0.0015 st30, 1214 reqs,
274 sessions, 8×TP1 vLLM + cache_aware_proxy) with three configs:
- plain unified
- unified + Mooncake kv_both
- unified + Mooncake kv_both + DR-fix (env-gated removal of the O(|cache|) hash sync)

p90 latency, in config order (plain → kv_both → kv_both + DR-fix):
TTFT p90: 11.97 s → 9.74 s (−18.6% vs plain) → 7.58 s (−36.6% vs plain)
E2E p90:  23.48 s → 21.25 s (−9.5% vs plain) → 17.93 s (−23.6% vs plain)
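
The percentages above are plain relative changes. A minimal sketch of that arithmetic, assuming the rounded p90 values quoted here (the dictionary keys and helper names are illustrative and are not taken from analyze_trace_replay.py; recomputing from rounded inputs can differ from the reported figures in the last digit):

```python
def pct_delta(baseline: float, value: float) -> float:
    """Signed percent change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# p90 latencies in seconds, copied from the summary above.
# The key names are illustrative labels, not identifiers from the repo.
TTFT_P90 = {"plain": 11.97, "kv_both": 9.74, "kv_both_drfix": 7.58}
E2E_P90 = {"plain": 23.48, "kv_both": 21.25, "kv_both_drfix": 17.93}


def deltas_vs(baseline_key: str, table: dict) -> dict:
    """Percent change of every other config relative to `baseline_key`."""
    base = table[baseline_key]
    return {k: pct_delta(base, v) for k, v in table.items() if k != baseline_key}


if __name__ == "__main__":
    for name, table in (("TTFT p90", TTFT_P90), ("E2E p90", E2E_P90)):
        for cfg, delta in deltas_vs("plain", table).items():
            print(f"{name} {cfg} vs plain: {delta:+.1f}%")
    # Incremental effect of DR-fix on top of kv_both.
    step = pct_delta(TTFT_P90["kv_both"], TTFT_P90["kv_both_drfix"])
    print(f"TTFT p90 DR-fix vs kv_both: {step:+.1f}%")
```

Changing the baseline key to "kv_both" isolates the incremental effect of DR-fix, which is where the 22% figure comes from.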

Two findings:
1. The "+45% kv_both penalty" claim from elastic_migration_v2 is OBSOLETE
   on the current codebase: kv_both is now *faster* than plain at p90.
   The likely causes are e3a1d70 (RDMA-READ → bootstrap PUSH refactor) and
   connector-mode delay_free_blocks, which extends cross-turn prefix
   cache hits on a trace with 93% intra-session reuse.
2. DR-fix removes another 22% from TTFT p90 (9.74 s → 7.58 s, relative
   to kv_both) by skipping the O(|cache|) hash sync in
   build_connector_meta. The cache sweep with DR-fix shows the slope
   dropping from +94.5 to +2.3 μs/1k blocks.

Adds:
- run_trace_replay_drfix.sh: A/B/C harness (env CT_DR_FIX gates patch)
- analyze_trace_replay.py: TTFT/TPOT/E2E delta analysis
- REPORT_TRACE_REPLAY.md: summary + reproduction
- results/20260526_1627_drfix/: cache-sweep with DR-fix
- results/trace_replay_20260526_1652/: full trace-replay A/B/C
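
For readers unfamiliar with env-gated patches, the sketch below shows one common way to read such a switch. Only the variable name CT_DR_FIX comes from this commit; the accepted values, the default, and the helper name are assumptions and may not match the actual patch:

```python
import os


def dr_fix_enabled(env=None) -> bool:
    """Return True when the CT_DR_FIX switch is set to a truthy value.

    Assumption: "1", "true", "yes" and "on" (case-insensitive) enable the
    patch and anything else, including an unset variable, leaves it off.
    """
    env = os.environ if env is None else env
    return env.get("CT_DR_FIX", "0").strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    state = "enabled" if dr_fix_enabled() else "disabled"
    print(f"DR-fix {state}")
```

Reading the switch through a single helper keeps the A/B/C configs identical except for one environment variable, so the harness can toggle the patch without touching code.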

Implication for the EAR paper: the kv_both substrate is no longer the
bottleneck blocking session migration. The four prior migration reverts
were dominated by transfer overhead that has now been characterized
and (partially) removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-27 09:13:50 +08:00

connector_tax

Connector tax: trace-replay confirms +45% kv_both penalty is gone; DR-fix adds 22% more

2026-05-27 09:13:50 +08:00

interference

Microbench 1 plots: prefill-decode interference heatmap + lines

2026-05-26 14:21:30 +08:00

lifecycle

PD-sep server-side profiling: vLLM patches + per-request breakdown

2026-05-26 13:59:09 +08:00

patches

PD-sep server-side profiling: vLLM patches + per-request breakdown