d76eb02637c9ea78225d6b20193c58592efdecfa
New analysis/characterization/elastic_migration_v2/ packages the
unified_v2 + unified_kv_both experiments into a self-contained
results section that the paper can cite as the "we tried selective
PD-sep migration" case study. The section finds three independent
reasons PD-sep doesn't help on agentic w600:
1. Mooncake kv_both substrate alone (no PD-sep ever firing) imposes
TTFT p90 +45%, TPOT p90 +25%, hotspot index +19% vs plain
unified. Per-step KVConnectorMetadata maintenance and block
reservation semantics dominate even when no transfer is pending.
2. PD-sep gate fires only 0.16-0.41% of requests across two
gate-tightness configurations. 88-76% are killed by
new_local < threshold because 93% intra-session reuse on agentic
traces leaves a small uncached tail; 19% are killed by
chosen_no_active_decode (snapshot-time gate). Even relaxed
thresholds can't grow trigger rate past 0.5%.
3. When PD-sep fires, the calibrated cost model
(0.3s + bytes / 2.7 GB/s) is wrong by 10-20x. 5 triggered
requests in v2.1 saw realized TTFT 12-45s vs model-predicted
migrate cost 0.7-2.2s, consistent with the E2 audit's finding
that D-side block pre-reservation and missing layerwise
pipelining dominate the decode_sent -> first_token clock.
Three-way comparison (unified vs unified_kv_both vs unified_v2):
v2 vs the kv_both control is roughly net-zero (-10% hotspot,
-14% TPOT p90, +3% TTFT p90, +9% TTFT p99). v2 vs plain unified is
strictly worse by 27-49% across latency percentiles because the
kv_both substrate tax is unavoidable when the policy is enabled.
Contents:
- README.md: the four results sections, the three-way comparison
table, an explicit "what this claims for the paper" list, and a
cross-reference index to the earlier characterization documents.
- data/: b3_policy_comparison.json + per-policy breakdown.json
+ per-policy hotspot_index.json for the four policies in scope.
- figures/: 4 PNGs rendered by render_figures.py:
* fig_kv_both_overhead.png — 4-metric bar chart with delta
annotations showing kv_both alone costs +45% TTFT p90.
* fig_v2_trigger_funnel.png — per-reason request count for the
two gate configurations on log scale.
* fig_v2_predicted_vs_actual.png — scatter of model-predicted
migrate cost vs realized TTFT for the 5 triggered requests,
with y=x, 10x, and 20x reference lines.
* fig_three_way_hotspot.png — per-worker TTFT p90 grouped bars
across the three policies.
The section is intentionally self-contained: it lists what the
experiment validates (cost model picks correct candidates;
shadow-drift fix is necessary; same-worker interference is real)
alongside what it disproves (per-request PD-sep on agentic via
Mooncake is not a net win in current implementation).
Refs: E1/E2 subagent audits, B2 microbench, unified_v2 commits
19f69a9 / 4b833d3 / 95c8ef8.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%