Add NIXL substrate isolation control + attribution decomposition

Adds unified_nixl_both to elastic_migration_v2: same picker as
unified_kv_both (never triggers PD-sep), but launches vLLM with
NixlConnector instead of MooncakeConnector. Compared against plain
unified and unified_kv_both (Mooncake) we can now attribute the
substrate overhead between "v1 connector framework irreducible
cost" (proxied by the leaner NIXL) and "Mooncake implementation
extra" (Mooncake - NIXL).

Result (vs plain unified, both substrates never PD-sep):

   metric          plain    NIXL          Mooncake
   TTFT p90        7.35s    +37.9%        +45.3%      (NIXL: +7pp better)
   TPOT p90        17.1ms   +15.5%        +24.5%      (NIXL: +9pp better)
   E2E p90         18.03s   +17.4%        +27.0%      (NIXL: +10pp better)
   hotspot         3.667    +0.2%         +19.0%      (NIXL: keeps it flat)
   APC             79.4%    -0.3pp        -1.1pp
   interference    -        5.58          8.57         (NIXL: ~35% lower)

The cleanest signal is hotspot: NIXL preserves plain-unified's
distribution (3.674 vs 3.667), while Mooncake's per-scheduler-step
O(|cache|) `set(self._block_pool.cache.keys())` diff against
_known_hash_keys (mooncake_connector.py:432-456) inflates routing
imbalance by 19%. The hash sync runs unconditionally even when no
direct_read consumer is present.
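
The per-step cost described above can be illustrated with a minimal sketch. This is not the Mooncake code: `FullDiffSync` and `GatedSync` are hypothetical names, and only the `set(...)`-diff pattern and the "no consumer present" condition come from the commit message. The first class rebuilds and diffs the key set on every step; the second shows one possible way to skip that work when nothing consumes the result.

```python
from typing import Hashable, Iterable


class FullDiffSync:
    """Rebuilds the key set on every step, so cost grows with cache size."""

    def __init__(self) -> None:
        self.known: set = set()

    def step(self, cache_keys: Iterable[Hashable]) -> tuple:
        current = set(cache_keys)        # O(|cache|) copy on every step
        added = current - self.known     # O(|cache|) set difference
        removed = self.known - current   # O(|cache|) set difference
        self.known = current
        return added, removed


class GatedSync(FullDiffSync):
    """Hypothetical variant: skip the diff when no consumer needs it."""

    def step(self, cache_keys: Iterable[Hashable], has_consumer: bool = False) -> tuple:
        if not has_consumer:
            return set(), set()          # constant-time path, nothing to publish
        return super().step(cache_keys)  # first real call catches up on all keys


# Tiny demonstration with five cached block hashes.
cache = {f"hash{i}": object() for i in range(5)}
full = FullDiffSync()
first_added, first_removed = full.step(cache.keys())

del cache["hash0"]
cache["hash9"] = object()
second_added, second_removed = full.step(cache.keys())

gated = GatedSync()
idle_added, idle_removed = gated.step(cache.keys(), has_consumer=False)
```

Whether gating of this kind is safe for Mooncake's `direct_read` path is not something this commit establishes; the sketch only shows where the O(|cache|) work sits.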

Attribution: NIXL-plain ~= v1 framework irreducible cost (kv_buffer
GPU memory, per-step SchedulerOutput.kv_connector_metadata
round-trip, altered kv_cache_manager block-lifecycle). Mooncake-NIXL
~= Mooncake-specific overhead (the hash-sync loop and stricter
delay_free semantics).

Practical implication: NIXL is meaningfully better than Mooncake on
this stack, but even NIXL imposes 16-38% across metrics — too
expensive for selective-PD-sep on agentic workloads where the
trigger rate is < 0.5%.

Launch fixes required for NIXL multi-instance:
- VLLM_NIXL_SIDE_CHANNEL_PORT must be unique per instance (default
  5600; we use 5600+i). Without this, 7 of 8 instances silently hang
  in `zmq.error.ZMQError: Address already in use` and the launcher
  trap kills all of them at health-check timeout.
- Health-check timeout raised from 180s to 360s; NIXL initialization
  (UCX agent + memory registration) is ~100-150s per instance under
  8-way concurrent load, vs Mooncake's ~30-60s.

New figure: fig_connector_substrate_attribution.png stacks plain /
framework / Mooncake-extra / v2-branch overhead per metric.
Existing figures (fig_kv_both_overhead, fig_three_way_hotspot)
updated to include NIXL as a fourth bar.

README updated with 4-way table, Result 1 reframed as "the cost is
mostly framework, not Mooncake — but Mooncake adds the hotspot
penalty", and the substrate-vs-PD-sep tradeoff math.

Refs: nixl_connector.py:700 handshake listener bind, factory.py
register_connector for the NixlConnector entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 16:02:12 +08:00
parent 645b067dd4
commit dc6d24d1ca
8 changed files with 235 additions and 83 deletions


@@ -9,82 +9,127 @@ Model: Qwen3-Coder-30B-A3B-Instruct, 8 × TP1 on H20
This section explores whether the **B2-confirmed same-worker
prefill-decode interference** can be relieved by selectively
migrating prefill to a different worker for the requests where the
interference cost would dominate the transfer cost. We implement two
flavors of the policy (strict gates, then relaxed gates) and a clean
isolation control (`unified_kv_both`: same picker as `unified`, but
the vLLMs are launched in `kv_role=kv_both` so the Mooncake
substrate is on but never triggers).
interference cost would dominate the transfer cost. We implement
two flavors of the routing policy (strict gates, then relaxed
gates) and **two isolation controls** that use the unified picker
but launch vLLMs in `kv_role=kv_both` so the connector substrate
is on but never PD-seps:
Three findings:
- `unified_kv_both`: with **MooncakeConnector**
- `unified_nixl_both`: with **NixlConnector** (NVIDIA's official
v1 connector; isolates connector implementation from policy)
1. **`kv_role=kv_both` alone imposes a heavy always-on tax**: TTFT
p90 +45%, TPOT p90 +25%, hotspot index +19% vs plain `unified`,
with no PD-sep ever firing.
2. **PD-sep almost never triggers on a real agentic workload**:
Four findings:
1. **`kv_role=kv_both` imposes a substantial always-on tax even
when no PD-sep ever fires**: with Mooncake it's TTFT p90 +45%,
TPOT p90 +25%, hotspot +19%; with NIXL it's TTFT p90 +38%,
TPOT p90 +16%, hotspot +0.2%.
2. **Most of the substrate latency cost (roughly 63-84% of
Mooncake's p90 overhead) is generic v1-connector framework
overhead** (proxied by NIXL since it's the leanest
implementation): KV buffer GPU memory cut from the model's
working budget, `SchedulerOutput.kv_connector_metadata`
round-trip, and altered `kv_cache_manager` block-lifecycle
semantics. **NIXL is meaningfully better than Mooncake** but
still imposes a 16-38% tax vs no connector.
3. **PD-sep almost never triggers on a real agentic workload**:
0.16% with strict gates, 0.41% with relaxed gates. Agentic
workloads have 93% intra-session reuse, so most requests land on
workers that already hold cache — the uncached tail is too small
to be worth migrating.
3. **When PD-sep does fire, the cost model is wrong by ~10-20×**:
workloads have 93% intra-session reuse, so most requests land
on workers that already hold cache — the uncached tail is too
small to be worth migrating.
4. **When PD-sep does fire, the cost model is wrong by ~10-20×**:
the calibrated `0.3s + bytes / 2.7 GB/s` predicts 1-2 s migrate
cost; observed TTFT on triggered requests is 12-45 s. The same
D-side block-reservation pressure and absence of layerwise
pipelining that the E2 audit flagged still dominate.
cost; observed TTFT on triggered requests is 12-45 s.
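
The calibrated cost model quoted in the last finding, `0.3s + bytes / 2.7 GB/s`, can be written out as a small sketch. The function names are hypothetical, and treating "GB" as 10^9 bytes is an assumption (the README does not say GB vs GiB). The inverse helper shows which transfer sizes correspond to the 1-2 s range the model predicts.

```python
GB = 1e9  # assumption: decimal gigabytes; the source does not specify GB vs GiB

FIXED_S = 0.3             # fixed setup term from the calibrated model
BANDWIDTH_BPS = 2.7 * GB  # calibrated transfer bandwidth, in bytes per second


def predicted_migrate_s(nbytes: float) -> float:
    """Predicted migration cost in seconds for a KV transfer of nbytes."""
    return FIXED_S + nbytes / BANDWIDTH_BPS


def bytes_for_cost(target_s: float) -> float:
    """Inverse of the model: transfer size whose predicted cost is target_s."""
    return max(0.0, target_s - FIXED_S) * BANDWIDTH_BPS


# Transfer sizes that the model maps to 1 s and 2 s respectively.
low_bytes = bytes_for_cost(1.0)   # about 1.89e9 bytes
high_bytes = bytes_for_cost(2.0)  # about 4.59e9 bytes
```

Under this model the predicted cost is linear in bytes, so an observed 12-45 s TTFT on the same requests cannot be explained by retuning the two constants alone.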
The net latency of `unified_v2` is **not better than plain
`unified`**. Improving agentic PD-sep requires fixing the underlying
Mooncake transfer mechanism (E2 patches 6.1 lazy block reservation
and 6.3 layerwise pipelining), not the routing decision.
`unified`** under either Mooncake or NIXL substrate. Improving
agentic PD-sep requires (a) using the leaner connector (NIXL >
Mooncake by 5-19 pp across metrics), and (b) fixing the underlying
transfer mechanism (E2 patches 6.1 lazy block reservation and 6.3
layerwise pipelining), not just the routing decision.
## Substrate
We compare three policies on identical traces:
We compare four policies on identical traces:
| policy | picker | vLLM launch mode | what's it for |
|---|---|---|---|
| `unified` | hybrid affinity + LMetric | plain (no Mooncake) | the headline baseline |
| `unified_kv_both` | same as `unified` | `kv_role=kv_both` + bootstrap | isolation control: how much does kv_both *alone* cost? |
| `unified_v2` | unified + selective PD-sep | `kv_role=kv_both` + bootstrap | the actual experiment |
| `unified` | hybrid affinity + LMetric | plain (no connector) | the headline baseline |
| `unified_kv_both` | same as `unified` | `MooncakeConnector` + `kv_both` | substrate control: Mooncake cost without PD-sep |
| `unified_nixl_both` | same as `unified` | `NixlConnector` + `kv_both` | substrate control: NIXL cost without PD-sep, attributes overhead to "framework vs Mooncake" |
| `unified_v2` | unified + selective PD-sep | `MooncakeConnector` + `kv_both` + bootstrap | the actual experiment |
All three use the same trace, the same 8-instance topology, the same
All four use the same trace, the same 8-instance topology, the same
shadow-drift-corrected proxy (`scripts/cache_aware_proxy.py` post-fix
`95c8ef8`). Plain `unified` was rerun on the patched proxy
(`b3_sweep_20260525_095043/unified`) under the same conditions.
## Result 1 — kv_both is expensive by itself
NIXL required two launch fixes beyond Mooncake:
- `VLLM_NIXL_SIDE_CHANNEL_PORT` must be unique per instance
(default 5600 → 5600..5607); otherwise instances 2..8 silently
hang in `zmq.error.ZMQError: Address already in use`.
- Health-check timeout had to be raised from 180 s to 360 s
because NIXL initialization (UCX agent + memory registration)
takes ~100-150 s per instance under 8-way concurrent launch.
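
A minimal sketch of the two launch fixes, assuming a Python launcher. Only `VLLM_NIXL_SIDE_CHANNEL_PORT`, the 5600 base port, the 8-instance count, and the 360 s timeout come from this README; `instance_env`, `wait_healthy`, and the health URL passed to it are hypothetical, and the real launcher may be structured differently.

```python
import os
import time
import urllib.error
import urllib.request

NIXL_BASE_PORT = 5600   # default side-channel port named in the README
NUM_INSTANCES = 8       # 8 x TP1 topology
HEALTH_TIMEOUT_S = 360  # raised from 180 s to cover slow NIXL initialization


def instance_env(index: int, base_env=None) -> dict:
    """Environment for instance `index`, with its own NIXL side-channel port."""
    env = dict(os.environ if base_env is None else base_env)
    env["VLLM_NIXL_SIDE_CHANNEL_PORT"] = str(NIXL_BASE_PORT + index)
    return env


def wait_healthy(url: str, timeout_s: float = HEALTH_TIMEOUT_S, poll_s: float = 2.0) -> bool:
    """Poll `url` until it answers HTTP 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep polling until the deadline
        time.sleep(poll_s)
    return False


envs = [instance_env(i, base_env={}) for i in range(NUM_INSTANCES)]
ports = [int(e["VLLM_NIXL_SIDE_CHANNEL_PORT"]) for e in envs]
```

Each `envs[i]` would be passed as the `env` of the process that starts instance `i`; checking `len(set(ports)) == NUM_INSTANCES` before launch is a cheap guard against the bind collision described above.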
## Result 1 — kv_both is expensive by itself, and only partly Mooncake's fault
![](figures/fig_kv_both_overhead.png)
Switching the vLLM launch from plain to `kv_role=kv_both` without
ever triggering PD-sep already costs:
ever triggering PD-sep imposes a substrate tax. We compare the two
connectors available in vendored vLLM:
| metric | plain `unified` | `unified_kv_both` | Δ |
|---|---:|---:|---|
| TTFT p50 | 0.50 s | 0.50 s | +0% |
| TTFT p90 | 7.35 s | 10.67 s | **+45%** |
| TTFT p99 | 42.34 s | 45.19 s | +7% |
| TPOT p90 | 17.1 ms | 21.3 ms | **+25%** |
| E2E p90 | 18.03 s | 22.89 s | **+27%** |
| APC | 79.4% | 78.3% | -1.1 pp |
| hotspot index | 3.667 | **4.363** | **+19%** |
| metric | plain `unified` | `unified_nixl_both` | `unified_kv_both` (Mooncake) |
|---|---:|---:|---:|
| TTFT p50 | 0.50 s | 0.51 s (+1%) | 0.50 s (+0%) |
| **TTFT p90** | 7.35 s | **10.13 s (+38%)** | **10.67 s (+45%)** |
| TTFT p99 | 42.34 s | 44.58 s (+5%) | 45.19 s (+7%) |
| TPOT p90 | 17.1 ms | **19.8 ms (+16%)** | **21.3 ms (+25%)** |
| E2E p90 | 18.03 s | **21.18 s (+17%)** | **22.89 s (+27%)** |
| APC | 79.4% | 79.1% (-0.3 pp) | 78.3% (-1.1 pp) |
| **hotspot index** | 3.667 | **3.674 (+0.2%)** | **4.363 (+19%)** |
| interference index | n/a | 5.58 | 8.57 |
Two contributing factors:
![](figures/fig_connector_substrate_attribution.png)
1. **The Mooncake `MooncakeConnector` runs even when no transfer is
pending.** Every scheduler step it walks `set(cache.keys())`
against `_known_hash_keys` (E2 audit §6.5) and updates the
`KVConnectorMetadata`. This is O(|cache|) per step on every
engine, even when no producer/consumer relationship is active.
2. **Block reservation semantics differ** under kv_both. The
scheduler treats blocks as candidates for export-to-others, so
the prefix cache LRU pressure is slightly different (we lose 1
pp APC).
Reading the table from left to right gives a clean attribution:
Practical implication: **you don't enable kv_both for free**. If a
deployment wants the option to do PD-sep selectively, the 45% TTFT
p90 tax applies even on requests that stay local. This needs to be a
recoverable cost before any selective-PD-sep policy is worth
shipping.
- **NIXL - plain** = the **v1-connector framework's irreducible cost**
(TTFT p90 +38%, TPOT p90 +16%, E2E p90 +17%). This is the cost
*any* v1 KV connector imposes:
- the 1 GB `kv_buffer_size` carved from `gpu-memory-utilization`,
reducing the KV cache budget;
- per-step `SchedulerOutput.kv_connector_metadata` serialization
and round-trip through the connector worker;
- altered block-lifecycle semantics in `kv_cache_manager`
(`delay_free_blocks=True` is the default once any connector is
loaded, slowing LRU eviction).
- **Mooncake - NIXL** = the **Mooncake-implementation-specific extra**
(TTFT p90 +7 pp, TPOT p90 +9 pp, E2E p90 +10 pp, hotspot +19 pp).
This is the cost Mooncake's design choices add on top of the
generic framework:
- per-scheduler-step `set(self._block_pool.cache.keys())` diff
against `_known_hash_keys` (`mooncake_connector.py:432-456`)
walks O(|cache|) on every step on every engine, costing ~4 M
set operations per second on a 200 k-block cache;
- the hash sync runs even when no `direct_read` consumer is
present, so the cost is paid unconditionally;
- block-lifecycle is further constrained because Mooncake
requires `delay_free` until the explicit `finished_sending`
arrives, vs NIXL which can release blocks earlier.
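
The two-term attribution above is plain subtraction, and can be reproduced from the table as a sketch. The `Attribution` and `attribute` names are hypothetical. The inputs are the rounded table values, so the percentages can differ by a few tenths from figures computed on the unrounded JSON.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    framework_pct: float      # (NIXL - plain) / plain, in percent
    mooncake_extra_pp: float  # (Mooncake - NIXL) / plain, in percentage points
    framework_share: float    # fraction of Mooncake's total overhead NIXL also pays


def attribute(plain: float, nixl: float, mooncake: float) -> Attribution:
    framework_pct = (nixl - plain) / plain * 100
    mooncake_pct = (mooncake - plain) / plain * 100
    share = (nixl - plain) / (mooncake - plain)
    return Attribution(framework_pct, mooncake_pct - framework_pct, share)


# (plain unified, unified_nixl_both, unified_kv_both) from the table above.
ROWS = {
    "TTFT p90 (s)": (7.35, 10.13, 10.67),
    "TPOT p90 (ms)": (17.1, 19.8, 21.3),
    "E2E p90 (s)": (18.03, 21.18, 22.89),
    "hotspot index": (3.667, 3.674, 4.363),
}

result = {name: attribute(*values) for name, values in ROWS.items()}

for name, a in result.items():
    print(f"{name:14s} framework {a.framework_pct:+5.1f}%  "
          f"Mooncake extra {a.mooncake_extra_pp:+5.1f} pp  "
          f"framework share {a.framework_share:4.0%}")
```

The `framework_share` column makes the asymmetry explicit: on the three latency metrics most of the overhead is already present with NIXL, while on the hotspot index almost none of it is.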
The **most striking gap is hotspot**: Mooncake's per-step hash
sync runs on the scheduler's GIL and disrupts the timeliness of
routing decisions, amplifying load imbalance by 19%. NIXL has no
equivalent global-state maintenance and preserves the plain-unified
hotspot to within 0.2%.
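
For reference, the NIXL hotspot value of 3.674 is numerically consistent with "worst worker's TTFT p90 divided by the median worker's TTFT p90". That definition is an inference from the per-worker numbers recorded in this commit, not something the README states, so the sketch below should be read as a consistency check rather than the metric's official definition.

```python
from statistics import median

# Per-worker TTFT p90 (seconds) for unified_nixl_both, copied from the
# per-worker JSON added in this commit.
per_worker_ttft_p90_s = {
    "http://127.0.0.1:8000": 7.380765471985799,
    "http://127.0.0.1:8001": 14.109222683508415,
    "http://127.0.0.1:8002": 3.001173847797329,
    "http://127.0.0.1:8003": 14.087287129514152,
    "http://127.0.0.1:8004": 14.151121024426537,
    "http://127.0.0.1:8005": 6.165523712011057,
    "http://127.0.0.1:8006": 6.314287615299688,
    "http://127.0.0.1:8007": 39.43635586597957,
}


def hotspot_index(per_worker: dict) -> float:
    """Assumed definition: max over workers divided by median over workers."""
    values = list(per_worker.values())
    return max(values) / median(values)


index = hotspot_index(per_worker_ttft_p90_s)
print(f"hotspot index = {index:.3f}")
```

Under this reading, a single slow worker (`:8007` here) drives the index, which is why it reacts so strongly to anything that delays routing decisions.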
Practical implication: **you don't enable any v1 KV connector for
free**, but if you have to enable one, NIXL is meaningfully cheaper
than Mooncake. Even NIXL's 38% TTFT p90 tax is large enough that
PD-sep needs to recover it on a non-trivial fraction of requests
before being worth it.
## Result 2 — PD-sep rarely fires on a real agentic trace
@@ -153,24 +198,24 @@ The first-token clock for the 49 k request is **21× the model's
prediction**. This is not a small mis-tuning — it's a structurally
different model.
## Result 4 — three-way comparison
## Result 4 — four-way comparison
![](figures/fig_three_way_hotspot.png)
The full table:
| metric | unified (plain) | unified_kv_both | unified_v2 (relaxed) |
|---|---:|---:|---:|
| n_ok | 1214 | 1214 | 1214 |
| TTFT p50 | 0.50 s | 0.50 s | 0.49 s |
| TTFT p90 | 7.35 s | 10.67 s | 10.98 s |
| TTFT p99 | 42.34 s | 45.19 s | 49.45 s |
| TPOT p90 | 17.1 ms | 21.3 ms | 18.4 ms |
| E2E p90 | 18.03 s | 22.89 s | 22.53 s |
| APC | 79.4% | 78.3% | 77.6% |
| interference index | n/a (no engine_state) | 8.57 | 8.46 |
| hotspot index | 3.667 | 4.363 | 3.910 |
| n_slow | 189 | 198 | 198 |
| metric | unified (plain) | unified_nixl_both | unified_kv_both (Mooncake) | unified_v2 (relaxed) |
|---|---:|---:|---:|---:|
| n_ok | 1214 | 1214 | 1214 | 1214 |
| TTFT p50 | 0.50 s | 0.51 s | 0.50 s | 0.49 s |
| TTFT p90 | 7.35 s | 10.13 s | 10.67 s | 10.98 s |
| TTFT p99 | 42.34 s | 44.58 s | 45.19 s | 49.45 s |
| TPOT p90 | 17.1 ms | 19.8 ms | 21.3 ms | 18.4 ms |
| E2E p90 | 18.03 s | 21.18 s | 22.89 s | 22.53 s |
| APC | 79.4% | 79.1% | 78.3% | 77.6% |
| interference index | n/a | 5.58 | 8.57 | 8.46 |
| hotspot index | 3.667 | 3.674 | 4.363 | 3.910 |
| n_slow | 189 | 192 | 198 | 198 |
### v2 vs the kv_both control (the right comparison)


@@ -155,6 +155,32 @@
"unknown": 49
}
},
{
"policy": "unified_nixl_both",
"n_ok": 1214,
"n_total": 1214,
"ttft_p50_s": 0.5138550130068325,
"ttft_p90_s": 10.127110345300755,
"ttft_p99_s": 44.5789094621703,
"tpot_p50_s": 0.008423213202440761,
"tpot_p90_s": 0.019759515867947428,
"tpot_p99_s": 0.1079433335279151,
"e2e_p50_s": 1.866590676479973,
"e2e_p90_s": 21.179128799570027,
"e2e_p99_s": 96.01196486203865,
"apc_ratio": 0.791441828164218,
"interference_index": 5.580715970433481,
"hotspot_index_ttft_p90": 3.673957447190547,
"reuse_intra_frac": 0.930632797070364,
"reuse_cross_frac": 0.05718149217603143,
"n_slow": 192,
"failure_counts": {
"cache_miss_large_append": 21,
"hot_worker_queue": 75,
"same_worker_prefill_overlap": 72,
"unknown": 24
}
},
{
"policy": "unified_v2",
"n_ok": 1214,

File diff suppressed because one or more lines are too long


@@ -0,0 +1,24 @@
{
"hotspot_index_ttft_p90": 3.673957447190547,
"per_worker_latency_p90_s": {
"http://127.0.0.1:8000": 21.5702620673168,
"http://127.0.0.1:8001": 21.44246501957532,
"http://127.0.0.1:8002": 7.497513776784784,
"http://127.0.0.1:8003": 18.975387462502113,
"http://127.0.0.1:8004": 27.733961877820548,
"http://127.0.0.1:8005": 14.178356938017535,
"http://127.0.0.1:8006": 25.44877168269595,
"http://127.0.0.1:8007": 54.500166546402035
},
"per_worker_ttft_p90_s": {
"http://127.0.0.1:8000": 7.380765471985799,
"http://127.0.0.1:8001": 14.109222683508415,
"http://127.0.0.1:8002": 3.001173847797329,
"http://127.0.0.1:8003": 14.087287129514152,
"http://127.0.0.1:8004": 14.151121024426537,
"http://127.0.0.1:8005": 6.165523712011057,
"http://127.0.0.1:8006": 6.314287615299688,
"http://127.0.0.1:8007": 39.43635586597957
},
"status": "supported"
}

Binary file not shown (new image; size after: 83 KiB).

Binary file not shown (image; size before: 70 KiB, after: 86 KiB).

Binary file not shown (image; size before: 54 KiB, after: 57 KiB).


@@ -34,37 +34,39 @@ def _load(name: str):
POLICY_COLORS = {
"unified": "#2ca02c",
"unified_kv_both": "#9467bd",
"unified_v2": "#d62728",
"unified_v2_strict": "#ff7f0e",
"unified": "#2ca02c",
"unified_kv_both": "#9467bd",
"unified_nixl_both": "#1f77b4",
"unified_v2": "#d62728",
"unified_v2_strict": "#ff7f0e",
}
def fig_kv_both_overhead():
comp = _load("b3_policy_comparison.json")
by = {r["policy"]: r for r in comp["rows"]}
pols = ["unified", "unified_kv_both", "unified_v2"]
pols = ["unified", "unified_kv_both", "unified_nixl_both", "unified_v2"]
metrics = [
("TTFT p90 (s)", lambda r: r["ttft_p90_s"]),
("TPOT p90 (ms)", lambda r: r["tpot_p90_s"] * 1000),
("E2E p90 (s)", lambda r: r["e2e_p90_s"]),
("hotspot index", lambda r: r["hotspot_index_ttft_p90"]),
]
fig, axes = plt.subplots(1, 4, figsize=(14, 4))
fig, axes = plt.subplots(1, 4, figsize=(15, 4.2))
for ax, (label, fn) in zip(axes, metrics):
vals = [fn(by[p]) for p in pols]
bars = ax.bar(pols, vals,
labels_short = [p.replace("unified_", "") for p in pols]
labels_short[0] = "plain"
bars = ax.bar(labels_short, vals,
color=[POLICY_COLORS[p] for p in pols],
edgecolor="black", linewidth=0.5)
ax.set_title(label)
ax.tick_params(axis="x", rotation=20, labelsize=9)
ax.tick_params(axis="x", rotation=15, labelsize=9)
for b, v in zip(bars, vals):
ax.text(b.get_x() + b.get_width() / 2, v,
f"{v:.2f}" if v < 100 else f"{v:.0f}",
ha="center", va="bottom", fontsize=9)
ax.grid(alpha=0.3, axis="y")
# delta annotation
baseline = vals[0]
for i, v in enumerate(vals):
if i == 0:
@@ -74,8 +76,8 @@ def fig_kv_both_overhead():
fontsize=10, fontweight="bold",
color="darkred" if pct > 0 else "darkgreen")
fig.suptitle(
"kv_both adds ~45% to TTFT p90 even without PD-sep firing.\n"
"v2's PD-sep barely recovers the gap (and overshoots TTFT p99)."
"Mooncake substrate adds 19-45% across metrics; NIXL is 5-19pp better but\n"
"still 16-38% above plain. v2's 5 PD-sep events don't recover the substrate tax."
)
fig.tight_layout()
fig.savefig(OUT / "fig_kv_both_overhead.png", dpi=120)
@@ -203,27 +205,29 @@ def fig_v2_predicted_vs_actual():
def fig_three_way_hotspot():
pols = ["unified", "unified_kv_both", "unified_v2"]
pols = ["unified", "unified_kv_both", "unified_nixl_both", "unified_v2"]
per_worker = {p: _load(f"per_worker_{p}.json") for p in pols}
workers = sorted(per_worker["unified"]["per_worker_ttft_p90_s"].keys())
x = range(len(workers))
width = 0.27
fig, ax = plt.subplots(figsize=(11, 5))
n = len(pols)
width = 0.85 / n
fig, ax = plt.subplots(figsize=(12, 5))
for i, p in enumerate(pols):
d = per_worker[p]["per_worker_ttft_p90_s"]
vals = [d[w] for w in workers]
offset = (i - 1) * width
offset = (i - (n - 1) / 2) * width
label = p.replace("unified_", "") if p != "unified" else "plain"
ax.bar([j + offset for j in x], vals, width,
label=f"{p} (hotspot={per_worker[p]['hotspot_index_ttft_p90']:.2f})",
label=f"{label} (hotspot={per_worker[p]['hotspot_index_ttft_p90']:.2f})",
color=POLICY_COLORS[p], edgecolor="black", linewidth=0.4)
short = [w.replace("http://127.0.0.1:", ":") for w in workers]
ax.set_xticks(list(x))
ax.set_xticklabels(short, rotation=0, fontsize=9)
ax.set_ylabel("worker TTFT p90 (s)")
ax.set_title(
"Per-worker TTFT p90 distribution. kv_both alone makes the hot worker hotter\n"
"(unified→kv_both: 37.7s→43.5s peak); v2's 5 PD-sep triggers nudge it back."
"Per-worker TTFT p90 distribution across substrates. Mooncake (kv_both)\n"
"amplifies the hot worker (hotspot 4.36); NIXL keeps it close to plain (3.67)."
)
ax.legend(loc="upper left", fontsize=9)
ax.grid(alpha=0.3, axis="y")
@@ -232,12 +236,64 @@ def fig_three_way_hotspot():
plt.close(fig)
def fig_connector_substrate_attribution():
    """Decomposes overhead into v1-framework cost (shared by all connectors,
    proxied by NIXL since it's the leanest) and Mooncake-specific cost."""
    comp = _load("b3_policy_comparison.json")
    by = {r["policy"]: r for r in comp["rows"]}
    metrics = [
        ("TTFT p90 (s)", "ttft_p90_s", False),
        ("TPOT p90 (ms)", "tpot_p90_s", True),
        ("E2E p90 (s)", "e2e_p90_s", False),
        ("hotspot index", "hotspot_index_ttft_p90", False),
    ]
    fig, axes = plt.subplots(1, 4, figsize=(15, 4))
    for ax, (label, key, scale_ms) in zip(axes, metrics):
        plain = by["unified"][key] * (1000 if scale_ms else 1)
        nixl = by["unified_nixl_both"][key] * (1000 if scale_ms else 1)
        moon = by["unified_kv_both"][key] * (1000 if scale_ms else 1)
        v2 = by["unified_v2"][key] * (1000 if scale_ms else 1)
        framework_cost = nixl - plain  # what NIXL adds = v1 framework cost
        mooncake_extra = moon - nixl  # extra on top from Mooncake
        v2_branch_extra = v2 - moon  # extra from PD-sep branch (Mooncake + 5 events)
        bottom = 0
        ax.bar(["overhead"], [plain], color="#cccccc",
               edgecolor="black", linewidth=0.4,
               label=f"plain unified ({plain:.2f})")
        bottom += plain
        ax.bar(["overhead"], [framework_cost], bottom=[bottom],
               color="#1f77b4", edgecolor="black", linewidth=0.4,
               label=f"v1 framework (+{framework_cost:.2f})")
        bottom += framework_cost
        ax.bar(["overhead"], [mooncake_extra], bottom=[bottom],
               color="#9467bd", edgecolor="black", linewidth=0.4,
               label=f"Mooncake extra (+{mooncake_extra:.2f})")
        bottom += mooncake_extra
        ax.bar(["overhead"], [v2_branch_extra], bottom=[bottom],
               color="#d62728", edgecolor="black", linewidth=0.4,
               label=f"v2 PD-sep branch ({v2_branch_extra:+.2f})")
        ax.set_title(label)
        ax.legend(fontsize=8, loc="upper right")
        ax.grid(alpha=0.3, axis="y")
        ax.tick_params(axis="x", labelbottom=False)
    fig.suptitle(
        "Attribution: plain unified vs NIXL substrate vs Mooncake substrate vs v2.\n"
        "Blue: cost shared by any v1 connector. Purple: cost specific to Mooncake."
    )
    fig.tight_layout()
    fig.savefig(OUT / "fig_connector_substrate_attribution.png", dpi=120)
    plt.close(fig)
def main():
fig_kv_both_overhead()
fig_v2_trigger_funnel()
fig_v2_predicted_vs_actual()
fig_three_way_hotspot()
print(f"wrote 4 figures to {OUT}")
fig_connector_substrate_attribution()
print(f"wrote 5 figures to {OUT}")
if __name__ == "__main__":