agentic-kvc

Gahow Wang d76eb02637 Elastic migration v2 section: PD-sep on agentic workload is net negative

New analysis/characterization/elastic_migration_v2/ packages the
unified_v2 + unified_kv_both experiments into a self-contained
results section that the paper can cite as the "we tried selective
PD-sep migration" case study. The section finds three independent
reasons PD-sep doesn't help on agentic w600:

1. Mooncake kv_both substrate alone (no PD-sep ever firing) imposes
   TTFT p90 +45%, TPOT p90 +25%, hotspot index +19% vs plain
   unified. Per-step KVConnectorMetadata maintenance and block
   reservation semantics dominate even when no transfer is pending.
2. PD-sep gate fires on only 0.16-0.41% of requests across two
   gate-tightness configurations. 76-88% of requests are rejected by
   new_local < threshold because 93% intra-session reuse on agentic
   traces leaves a small uncached tail; 19% are rejected by
   chosen_no_active_decode (snapshot-time gate). Even relaxed
   thresholds can't grow the trigger rate past 0.5%.
3. When PD-sep fires, the calibrated cost model
   (0.3s + bytes / 2.7 GB/s) underpredicts by 10-20x. The 5 triggered
   requests in v2.1 saw realized TTFT of 12-45s vs model-predicted
   migrate cost of 0.7-2.2s, consistent with the E2 audit's finding
   that D-side block pre-reservation and missing layerwise
   pipelining dominate the decode_sent -> first_token clock.
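
For reference, the linear cost model quoted in item 3 can be sketched
as below. This is a minimal illustration, not the code used in the
experiments: the function and constant names are hypothetical, and it
assumes "GB" means 1e9 bytes. The message does not pair individual
predicted and realized values, so the sketch only inverts the model to
show what KV sizes the 0.7-2.2s predictions would imply.

```python
# Sketch of the migrate-cost model "0.3s + bytes / 2.7 GB/s".
# Assumptions (not from the commit): GB = 1e9 bytes; all names are hypothetical.

FIXED_OVERHEAD_S = 0.3          # constant term of the calibrated model
BANDWIDTH_BYTES_PER_S = 2.7e9   # 2.7 GB/s, taking GB as 1e9 bytes


def predicted_migrate_cost_s(kv_bytes: float) -> float:
    """Model-predicted migrate cost in seconds for a KV payload of kv_bytes."""
    return FIXED_OVERHEAD_S + kv_bytes / BANDWIDTH_BYTES_PER_S


def implied_kv_bytes(predicted_cost_s: float) -> float:
    """Invert the model: KV payload size that yields a given predicted cost."""
    return (predicted_cost_s - FIXED_OVERHEAD_S) * BANDWIDTH_BYTES_PER_S


def underprediction_factor(realized_s: float, predicted_s: float) -> float:
    """How many times larger the realized latency is than the prediction."""
    return realized_s / predicted_s


if __name__ == "__main__":
    for cost_s in (0.7, 2.2):
        gb = implied_kv_bytes(cost_s) / 1e9
        print(f"predicted {cost_s:.1f}s implies about {gb:.2f} GB of KV")
```

Under these assumptions the 0.7-2.2s predictions correspond to roughly
1.1-5.1 GB of KV per migrated request, so the constant 0.3s term is a
minority of every prediction and the gap to realized TTFT has to come
from costs the model does not represent.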

Three-way comparison (unified vs unified_kv_both vs unified_v2):
v2 vs the kv_both control is roughly net-zero (-10% hotspot,
-14% TPOT p90, +3% TTFT p90, +9% TTFT p99). v2 vs plain unified is
strictly worse by 27-49% across latency percentiles because the
kv_both substrate tax is unavoidable when the policy is enabled.
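
As a sanity check on how the pairwise deltas relate, relative changes
compose multiplicatively. The sketch below assumes the kv_both control
in item 1 is the same run used in the three-way comparison and that
each percentage is relative to its own baseline; compose_deltas is a
hypothetical helper, and only TTFT p90 is worked through because both
of its pairwise deltas are stated above.

```python
# Compose chained relative deltas: (1 + d1) * (1 + d2) * ... - 1.
# Assumption: each delta is expressed relative to the previous policy's value.

def compose_deltas(*relative_deltas: float) -> float:
    """Return the overall relative change after applying each delta in turn."""
    ratio = 1.0
    for delta in relative_deltas:
        ratio *= 1.0 + delta
    return ratio - 1.0


# TTFT p90: kv_both vs unified is +45%, v2 vs kv_both is +3%.
ttft_p90_v2_vs_unified = compose_deltas(0.45, 0.03)

if __name__ == "__main__":
    print(f"TTFT p90, v2 vs unified: {ttft_p90_v2_vs_unified:+.1%}")
```

The composed TTFT p90 delta is about +49%, which lands at the upper end
of the 27-49% range quoted for v2 vs plain unified.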

Contents:
- README.md: the four results sections, the three-way comparison
  table, an explicit "what this claims for the paper" list, and a
  cross-reference index to the earlier characterization documents.
- data/: b3_policy_comparison.json + per-policy breakdown.json
  + per-policy hotspot_index.json for the four policies in scope.
- figures/: 4 PNGs rendered by render_figures.py:
  * fig_kv_both_overhead.png   — 4-metric bar chart with delta
    annotations showing kv_both alone costs +45% TTFT p90.
  * fig_v2_trigger_funnel.png  — per-reason request count for the
    two gate configurations on log scale.
  * fig_v2_predicted_vs_actual.png  — scatter of model-predicted
    migrate cost vs realized TTFT for the 5 triggered requests,
    with y=x, 10x, and 20x reference lines.
  * fig_three_way_hotspot.png  — per-worker TTFT p90 grouped bars
    across the three policies.

The section is intentionally self-contained: it lists what the
experiment validates (cost model picks correct candidates;
shadow-drift fix is necessary; same-worker interference is real)
alongside what it disproves (per-request PD-sep on agentic via
Mooncake is not a net win in the current implementation).

Refs: E1/E2 subagent audits, B2 microbench, unified_v2 commits
19f69a9 / 4b833d3 / 95c8ef8.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-26 13:28:37 +08:00