agentic-kvc

Go to file

Gahow Wang 151bf33541 Add unified_nixl_both policy: NIXL connector isolation control

Adds a NIXL-backed counterpart to unified_kv_both so we can attribute
the kv_both substrate overhead measured in the elastic_migration_v2
section to either Mooncake-specific code or a generic v1-connector
cost shared by all connectors.

- scripts/cache_aware_proxy.py: register --policy unified_nixl_both.
  Picker is identical to unified (and unified_kv_both); routing
  decisions never go through the PD-sep branch. Differs only at the
  vLLM launch layer.
- scripts/b3_isolated_policy.sh: new KV_CONNECTOR env var
  (Mooncake|Nixl), auto-set based on POLICY. NIXL launch path uses
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
  with no VLLM_MOONCAKE_BOOTSTRAP_PORT (NIXL uses UCX side-channels).
- Health-check timeout: 90 iterations * 2s -> 180 iterations * 2s
  (180s -> 360s). Empirically NIXL needs ~100-150s per instance to
  initialize the UCX agent and register KV cache memory; 8
  concurrent NIXL launches frequently overshoot the previous 180s
  budget. Mooncake is unaffected (still finishes well inside the new
  budget). The 8-vLLM unified_nixl_both first launch tripped the
  old timeout despite 7/8 instances reaching startup-complete.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-26 14:57:54 +08:00