agentic-kvc/microbench/connector_tax/tools/noop_connector.py
Gahow Wang 297fed6e73 Microbench 3 (connector_tax): infrastructure for KV connector substrate tax
Validates the elastic_migration_v2 finding that kv_role=kv_both raises
TTFT p90 by 45% even when PD-sep never fires. Replicates it under a
single-instance, synthetic, open-loop workload to separate mechanism
cost from 8-instance feedback amplification.
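
One way to express the "TTFT p90 +45%" style of figure is as a relative increase of the connector run's p90 over the plain run's p90. The sketch below assumes a nearest-rank percentile and uses made-up sample values; the helper names `p90` and `relative_tax` are hypothetical and are not taken from the benchmark's analysis scripts.

```python
def p90(samples):
    """Nearest-rank 90th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    # Integer ceiling of 0.9 * n, giving a 1-based nearest rank.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def relative_tax(baseline_ttft, connector_ttft):
    """Fractional p90 increase of connector_ttft over baseline_ttft."""
    base = p90(baseline_ttft)
    return (p90(connector_ttft) - base) / base


# Illustrative numbers only, not measurements from this benchmark.
plain = [100.0] * 10
with_connector = [145.0] * 10
tax = relative_tax(plain, with_connector)
print(f"TTFT p90 tax: {tax:+.0%}")
```

The same ratio applied to the noop_connector run would give the framework-only share of the tax, under the assumption that both runs use the same rate and request shape.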

Configurations (8):
  plain, noop_connector, mooncake_{producer,consumer,both},
  nixl_both, lmcache_only, multi_mooncake_lmcache.

Pre-flight verification gates risky configs (kv_consumer needs dummy
bootstrap, multi-connector composition, NoOp custom class loading).

Workload: two-phase sweep
  Phase A: rate {0.5..32} req/s × shape (4096, 256), saturation criteria
  Phase B: ref_safe rate × cartesian (input ∈ {512,4k,32k}, output ∈ {64,256,1024})
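
The Phase B grid can be enumerated as a plain cartesian product. This is a minimal sketch, assuming "4k" and "32k" mean 4096 and 32768 tokens (consistent with the 4096-token Phase A shape, but not stated in the source); `phase_b_shapes` is a hypothetical helper name.

```python
from itertools import product

# Assumption: "4k" and "32k" mean 4096 and 32768 tokens.
INPUT_LENS = [512, 4096, 32768]
OUTPUT_LENS = [64, 256, 1024]


def phase_b_shapes():
    """Return every (input_len, output_len) pair in the Phase B grid."""
    return list(product(INPUT_LENS, OUTPUT_LENS))


shapes = phase_b_shapes()
for input_len, output_len in shapes:
    print(f"input={input_len:>5} output={output_len:>4}")
```

Under that assumption the grid has nine shapes per configuration, all run at the single ref_safe rate.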

Step-timing patch enriches vLLM's existing AGENTIC_STEP_LOG_PATH emit
with step_duration_us and build_meta_us — directly measures per-step
substrate cost, not just user-visible TTFT/TPOT.
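
The patch itself is not shown here, so the following is only a sketch of how the two fields relate: `build_meta_us` is timed inside the step, and `step_duration_us` is timed around the whole step. `timed_us`, `fake_build_meta`, and `fake_step` are hypothetical stand-ins, not vLLM functions.

```python
import json
import time


def timed_us(fn, *args, **kwargs):
    """Call fn and return (result, elapsed microseconds)."""
    start = time.perf_counter_ns()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter_ns() - start) / 1_000


def fake_build_meta():
    # Hypothetical stand-in for the connector's metadata build.
    return {}


def fake_step():
    # Hypothetical stand-in for one scheduler step; it times the
    # metadata build that runs inside the step and returns that time.
    _, inner_us = timed_us(fake_build_meta)
    return inner_us


build_meta_us, step_duration_us = timed_us(fake_step)
record = {
    "step_duration_us": step_duration_us,
    "build_meta_us": build_meta_us,
}
line = json.dumps(record)
print(line)
```

Because the inner timer is nested in the outer one, `build_meta_us` can never exceed `step_duration_us`, which makes the pair usable as a share-of-step measurement.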

run_all.sh runs as a 5-stage barrier:
  0 pre-flight + apply patch
  1 Phase A all configs
  2 pick ref_safe / ref_load
  3 Phase B all configs
  4 revert patch + analyze + plot
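
The barrier ordering can be sketched as a flat job plan in which every config finishes a stage before the next stage starts. This is an illustration in Python, not the contents of run_all.sh; `plan` and the per-config flags are assumptions read off the stage list above.

```python
CONFIGS = [
    "plain", "noop_connector",
    "mooncake_producer", "mooncake_consumer", "mooncake_both",
    "nixl_both", "lmcache_only", "multi_mooncake_lmcache",
]

# (stage number, label, runs once per config?)
STAGES = [
    (0, "pre-flight + apply patch", False),
    (1, "Phase A", True),
    (2, "pick ref_safe / ref_load", False),
    (3, "Phase B", True),
    (4, "revert patch + analyze + plot", False),
]


def plan(stages, configs):
    """Expand stages into an ordered list of (stage, label, config) jobs."""
    jobs = []
    for number, label, per_config in stages:
        targets = configs if per_config else [None]
        jobs.extend((number, label, cfg) for cfg in targets)
    return jobs


jobs = plan(STAGES, CONFIGS)
for number, label, cfg in jobs:
    print(number, label, cfg or "-")
```

Stage 2 sits between the two sweeps because Phase B needs the ref_safe rate chosen from the Phase A results of all configs.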

Outputs aggregate.{json,csv}, MANIFEST.tsv, and 5 figures.
Estimated runtime: 4-5.5 hours on idle dash0 H20.
2026-05-26 17:27:41 +08:00

"""Pure no-op KV connector for measuring vLLM v1 framework overhead.

This connector implements every abstract hook of KVConnectorBase_V1 with
the cheapest possible no-op return. Loaded via:

    --kv-transfer-config '{
      "kv_connector_module_path":
        "microbench.connector_tax.tools.noop_connector:NoOpConnector",
      "kv_role": "kv_both"
    }'

It does:
  - no I/O
  - no per-step cache key walk
  - no per-layer save/load
  - no metadata serialization beyond an empty dataclass

So `tax(NoOpConnector) ≈ pure vLLM v1 framework overhead`.
"""

from typing import TYPE_CHECKING, Any

from vllm.distributed.kv_transfer.kv_connector.v1.base import (
    KVConnectorBase_V1,
    KVConnectorMetadata,
)

if TYPE_CHECKING:
    import torch
    from vllm.attention.backends.abstract import AttentionMetadata
    from vllm.forward_context import ForwardContext
    from vllm.v1.core.kv_cache_manager import KVCacheBlocks
    from vllm.v1.core.sched.output import SchedulerOutput
    from vllm.v1.request import Request


class NoOpConnector(KVConnectorBase_V1):
    """Empty connector: every hook is a no-op.

    Used as a control to isolate vLLM v1 framework dispatch cost
    (build_connector_meta walking SchedulerOutput, mixin hooks, etc.)
    from any specific connector implementation work (RDMA setup,
    per-layer save, hash table walks).
    """

    # ---- scheduler-side abstract methods ------------------------------

    def get_num_new_matched_tokens(
        self,
        request: "Request",
        num_computed_tokens: int,
    ) -> tuple[int | None, bool]:
        # Never advertises any external cache hits.
        return 0, False

    def update_state_after_alloc(
        self,
        request: "Request",
        blocks: "KVCacheBlocks",
        num_external_tokens: int,
    ) -> None:
        return None

    def build_connector_meta(
        self,
        scheduler_output: "SchedulerOutput",
    ) -> KVConnectorMetadata:
        return KVConnectorMetadata()

    # ---- worker-side abstract methods ---------------------------------

    def start_load_kv(
        self,
        forward_context: "ForwardContext",
        **kwargs: Any,
    ) -> None:
        return None

    def wait_for_layer_load(self, layer_name: str) -> None:
        return None

    def save_kv_layer(
        self,
        layer_name: str,
        kv_layer: "torch.Tensor",
        attn_metadata: "AttentionMetadata",
        **kwargs: Any,
    ) -> None:
        return None

    def wait_for_save(self) -> None:
        return None
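
Usage sketch: the `--kv-transfer-config` value from the module docstring can be generated rather than hand-quoted. This only reproduces the two keys the docstring shows; whether the installed vLLM version accepts them in exactly this form is an assumption to check against its KV transfer configuration documentation.

```python
import json
import shlex

# Keys and values copied from the module docstring above.
config = {
    "kv_connector_module_path":
        "microbench.connector_tax.tools.noop_connector:NoOpConnector",
    "kv_role": "kv_both",
}

# Build the flag as an argv pair, then render it shell-quoted.
flag = ["--kv-transfer-config", json.dumps(config)]
print(shlex.join(flag))
```

Passing `flag` as list elements to `subprocess.run` avoids shell quoting entirely; `shlex.join` is only needed when the command is pasted into a script.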