151bf335414d2f7d4d345c6914527c3e84db5c8c
Adds a NIXL-backed counterpart to unified_kv_both so we can attribute
the kv_both substrate overhead measured in the elastic_migration_v2
section to either Mooncake-specific code or a generic v1-connector
cost shared by all connectors.
- scripts/cache_aware_proxy.py: register --policy unified_nixl_both.
Picker is identical to unified (and unified_kv_both); routing
decisions never go through the PD-sep branch. The policy differs only at
the vLLM launch layer.
- scripts/b3_isolated_policy.sh: new KV_CONNECTOR env var
(Mooncake|Nixl), auto-set based on POLICY. NIXL launch path uses
--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
with no VLLM_MOONCAKE_BOOTSTRAP_PORT (NIXL uses UCX side-channels).
- Health-check timeout: 90 iterations * 2s -> 180 iterations * 2s
(180s -> 360s). Empirically NIXL needs ~100-150s per instance to
initialize the UCX agent and register KV cache memory; 8
concurrent NIXL launches frequently overshoot the previous 180s
budget. Mooncake is unaffected (still finishes well inside the new
budget). The first 8-vLLM unified_nixl_both launch tripped the
old timeout even though 7/8 instances had reached startup-complete.
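The launch-layer changes above can be sketched roughly as follows. This is an illustration only, not the contents of scripts/b3_isolated_policy.sh (which is shell): the helper names connector_for_policy, nixl_kv_transfer_config, and wait_healthy are hypothetical, and the fallback to Mooncake for every policy other than unified_nixl_both is an assumption. The only details taken from this commit are the Nixl pairing, the JSON payload, and the 2s x 180 polling budget.

```python
import json
import time
from typing import Callable


def connector_for_policy(policy: str) -> str:
    """Pick the KV_CONNECTOR value from POLICY.

    Only unified_nixl_both -> Nixl is stated in the commit; defaulting
    every other policy to Mooncake is an assumption for this sketch.
    """
    return "Nixl" if policy == "unified_nixl_both" else "Mooncake"


def nixl_kv_transfer_config() -> str:
    """Build the compact JSON string passed to --kv-transfer-config for NIXL."""
    config = {"kv_connector": "NixlConnector", "kv_role": "kv_both"}
    return json.dumps(config, separators=(",", ":"))


def wait_healthy(
    probe: Callable[[], bool],
    iterations: int = 180,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() up to `iterations` times, sleeping between attempts.

    With the defaults the total budget is 180 * 2s = 360s; the previous
    budget corresponds to iterations=90 (180s).
    """
    for _ in range(iterations):
        if probe():
            return True
        sleep(interval_s)
    return False
```

Here probe stands in for whatever per-instance readiness check the script performs. Passing sleep as a parameter keeps the loop testable without actually waiting out the budget.
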
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>