scripts/b3_isolated_policy.sh:
Recognize unified_v3 as a policy that requires kv_both; respect an
explicit KV_CONNECTOR=Nixl override (so unified_v2 / unified_v3 /
unified_kv_both can run against either the Mooncake or the Nixl
back-end). When Nixl is selected, skip the bootstrap-ports plumbing:
Nixl uses its own UCX side-channel, and the proxy forwards
kv_transfer_params from the src response body instead of pre-baking
engine_id/bootstrap_addr.
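The two handshake forms described above can be sketched roughly as
follows. This is an illustration only, not the proxy's actual code: the
function name build_decode_kv_params, the prebaked argument, and the
error handling are assumptions; only the "mooncake" / "nixl" split and
the kv_transfer_params key come from this commit.

```python
from typing import Any, Dict, Optional


def build_decode_kv_params(
    connector_type: str,
    src_response: Dict[str, Any],
    prebaked: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the kv_transfer_params to attach to the decode request.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if connector_type == "mooncake":
        # Mooncake form: parameters (e.g. engine_id/bootstrap_addr) are
        # built by the proxy before the source request is sent.
        if prebaked is None:
            raise ValueError("mooncake requires pre-baked kv_transfer_params")
        return dict(prebaked)
    if connector_type == "nixl":
        # Nixl form: take whatever the source engine returned in its
        # response body and forward it unchanged.
        params = src_response.get("kv_transfer_params")
        if params is None:
            raise ValueError("src response carries no kv_transfer_params")
        return dict(params)
    raise ValueError(f"unknown connector_type: {connector_type!r}")
```

Returning a copy in both branches keeps the caller from mutating the
source response or the pre-baked template by accident.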
scripts/cache_aware_proxy.py:
- New unified_v3 policy (~250 lines): prefill stays on the
session-affinity host (preserving intra-session prefix-cache reuse),
and decode is migrated to a lower-load target when the affinity host
is busy with concurrent decodes. KV transfer flows
prefill_host → decode_target, the opposite direction from unified_v2.
Knobs: v3_min_new_tokens, v3_min_prefill_decode_busy,
v3_target_load_ratio, v3_min_load_gap, v3_rotate_affinity,
v3_prefer_cache_target. cache_miss_audit found that rotation hurts
cross-turn locality (9.5% hit rate with rotation vs ~80% without), so
the default is v3_rotate_affinity=False.
- New connector_type setting ("mooncake" | "nixl") that selects the
form of the PD-sep handshake: mooncake uses pre-baked
kv_transfer_params, nixl forwards them from the src response body.
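As a rough illustration of the unified_v3 decode-migration rule in the
first bullet, the sketch below shows one way the knobs could combine.
The knob names are from this commit; the default values (other than
v3_rotate_affinity=False), the exact semantics assigned to each knob,
and the names V3Knobs and pick_decode_target are assumptions, and
v3_rotate_affinity / v3_prefer_cache_target are left out of the logic.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class V3Knobs:
    # Values below are placeholders, not the proxy's real defaults.
    v3_min_new_tokens: int = 256          # assumed: skip migration for short requests
    v3_min_prefill_decode_busy: int = 2   # assumed: min concurrent decodes on affinity host
    v3_target_load_ratio: float = 0.5     # assumed: target load <= ratio * affinity load
    v3_min_load_gap: int = 2              # assumed: min absolute load difference


def pick_decode_target(
    affinity_host: str,
    active_decodes: Dict[str, int],
    new_tokens: int,
    knobs: V3Knobs,
) -> str:
    """Choose where decode runs; prefill always stays on affinity_host."""
    busy = active_decodes.get(affinity_host, 0)
    # Keep decode local when the request is small or the host is not busy.
    if new_tokens < knobs.v3_min_new_tokens:
        return affinity_host
    if busy < knobs.v3_min_prefill_decode_busy:
        return affinity_host
    others = {h: n for h, n in active_decodes.items() if h != affinity_host}
    if not others:
        return affinity_host
    # Least-loaded candidate; host name breaks ties deterministically.
    target = min(others, key=lambda h: (others[h], h))
    load = others[target]
    # Migrate only if the target is clearly less loaded on both measures.
    if load <= busy * knobs.v3_target_load_ratio and busy - load >= knobs.v3_min_load_gap:
        return target
    return affinity_host
```

When the returned host differs from affinity_host, KV would be
transferred prefill_host → decode_target as described above.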
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>