agentic-kvc/scripts
Gahow Wang f739f7d461 Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy
scripts/b3_isolated_policy.sh:
  Recognize unified_v3 as a kv_both-requiring policy and respect an explicit
  KV_CONNECTOR=Nixl override, so unified_v2 / unified_v3 / unified_kv_both
  can run against either the Mooncake or the Nixl back-end. When Nixl is
  selected, skip the bootstrap-ports plumbing: Nixl uses its own UCX
  side-channel, and the proxy forwards kv_transfer_params from the src
  response body instead of pre-baking engine_id/bootstrap_addr.
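
  The two handshake forms described above can be sketched roughly as
  follows. This is a minimal illustration, not the code in this commit:
  the function name, its arguments, and the error handling are all
  hypothetical; only kv_transfer_params, engine_id, bootstrap_addr, and the
  "mooncake" / "nixl" values come from the message itself.

```python
from typing import Any, Dict, Optional


def decode_kv_transfer_params(
    connector_type: str,
    src_response: Dict[str, Any],
    prebaked: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Pick the kv_transfer_params to attach to the decode-side request.

    Hypothetical helper: the real proxy may structure this differently.
    """
    if connector_type == "nixl":
        # Nixl: reuse whatever the src (prefill) response body returned.
        params = src_response.get("kv_transfer_params")
        if not params:
            raise ValueError("src response carried no kv_transfer_params")
        return dict(params)
    if connector_type == "mooncake":
        # Mooncake: the proxy already knows engine_id / bootstrap_addr.
        if not prebaked:
            raise ValueError("mooncake needs pre-baked kv_transfer_params")
        return dict(prebaked)
    raise ValueError(f"unknown connector_type: {connector_type!r}")
```

  In this sketch the Nixl branch never looks at the pre-baked values, which
  mirrors why the bootstrap-ports plumbing can be skipped for that connector.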

scripts/cache_aware_proxy.py:
  - New unified_v3 policy (~250 lines): prefill stays on the session-affinity
    host (preserving intra-session prefix-cache reuse), while decode is
    migrated to a lower-load target when the affinity host is busy with
    concurrent decodes. KV transfer flows prefill_host → decode_target, the
    opposite direction from v2. Knobs: v3_min_new_tokens,
    v3_min_prefill_decode_busy, v3_target_load_ratio, v3_min_load_gap,
    v3_rotate_affinity, v3_prefer_cache_target. cache_miss_audit found that
    rotation hurts cross-turn locality (9.5% hit rate with rotation vs ~80%
    without), so v3_rotate_affinity defaults to False.
  - New connector_type setting ("mooncake" | "nixl") gating the PD-sep
    handshake form: mooncake uses pre-baked kv_transfer_params,
    nixl forwards them from the response body.
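
  A rough sketch of how the unified_v3 offload decision might combine its
  thresholds. The knob names are taken from the list above, but their
  exact semantics and defaults are not stated in this message, so the
  interpretation below (and every default value, the V3Knobs class, and
  pick_decode_target) is an assumption for illustration only.
  v3_rotate_affinity and v3_prefer_cache_target are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class V3Knobs:
    # Placeholder defaults chosen for illustration, not the real ones.
    v3_min_new_tokens: int = 64          # skip offload for short outputs
    v3_min_prefill_decode_busy: int = 2  # affinity host must be this busy
    v3_target_load_ratio: float = 0.5    # target load <= ratio * affinity load
    v3_min_load_gap: int = 2             # absolute load gap required


def pick_decode_target(
    affinity_host: str,
    decode_load: Dict[str, int],
    expected_new_tokens: int,
    knobs: V3Knobs,
) -> Optional[str]:
    """Return a host to offload decode to, or None to stay on the affinity host."""
    busy = decode_load.get(affinity_host, 0)
    if expected_new_tokens < knobs.v3_min_new_tokens:
        return None
    if busy < knobs.v3_min_prefill_decode_busy:
        return None
    others = {h: n for h, n in decode_load.items() if h != affinity_host}
    if not others:
        return None
    # Least-loaded candidate; prefill still runs on affinity_host.
    target = min(others, key=others.get)
    load = others[target]
    if load > busy * knobs.v3_target_load_ratio:
        return None
    if busy - load < knobs.v3_min_load_gap:
        return None
    return target
```

  Returning None keeps both prefill and decode on the affinity host; a
  non-None result implies a KV transfer from the affinity (prefill) host to
  the returned decode target.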

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:05:19 +08:00