agentic-kvc

Files

Gahow Wang c8ec73c548 Connector tax: high-concurrency confirms +7-9% tax, resolves trace-replay gap

High-concurrency test (512 input, 64 output, rates 4-32 req/s):
  Rate=8:  plain TTFT p90=94ms, mooncake_both=102ms → +9% tax
  Rate=16: plain TTFT p90=144ms, mooncake_both=156ms → +8% tax
  Rate=32: both saturated at ~6.1s → no distinguishable difference

Low-concurrency back-to-back retest (4096 input, 256 output):
  mooncake_both_v2 vs plain_v2: tax is ≈0% (within noise)
  because scheduler's 1.4ms/step is hidden behind model forward.

Decomposition of trace-replay's +45%:
  +7-9% from build_connector_meta per-step cost (this microbench)
  +20-30% from multi-instance coupling amplification (not measurable here)
  remainder from large-cache O(|cache|) scaling (Phase B follow-up)

Also: bench_loop.py now emits mean/p50/p90/p99 for all three metrics.

2026-05-26 21:00:25 +08:00

connector_tax

Connector tax: high-concurrency confirms +7-9% tax, resolves trace-replay gap

2026-05-26 21:00:25 +08:00

interference

Microbench 1 plots: prefill-decode interference heatmap + lines

2026-05-26 14:21:30 +08:00

lifecycle

PD-sep server-side profiling: vLLM patches + per-request breakdown

2026-05-26 13:59:09 +08:00

patches

PD-sep server-side profiling: vLLM patches + per-request breakdown