agentic-kvc/scripts at 4242bba034ac200c728e6294eaf32f00899a4bc5 - agentic-kvc - Local Gitea

gahow/agentic-kvc

Gahow Wang f739f7d461 Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy

scripts/b3_isolated_policy.sh:
  Recognize unified_v3 as a kv_both-requiring policy; respect explicit
  KV_CONNECTOR=Nixl override (so unified_v2 / unified_v3 / unified_kv_both
  can run against either Mooncake or Nixl back-end). When Nixl is
  selected, skip the bootstrap-ports plumbing — Nixl uses its own UCX
  side-channel and the proxy forwards kv_transfer_params from the src
  response body instead of pre-baking engine_id/bootstrap_addr.
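
A minimal sketch of the two handshake forms this commit distinguishes, assuming the proxy builds the decode-side kv_transfer_params in one place. The function name decode_kv_params, its arguments, and the lower-casing of the connector name (so that KV_CONNECTOR=Nixl maps to "nixl") are hypothetical and not taken from the repository; only the keys engine_id, bootstrap_addr and kv_transfer_params appear in the commit message.

```python
from typing import Any, Optional


# Hypothetical helper, not taken from cache_aware_proxy.py. Only the key names
# engine_id, bootstrap_addr and kv_transfer_params come from the commit message.
def decode_kv_params(
    connector_type: str,
    prefill_response: dict[str, Any],
    engine_id: Optional[str] = None,
    bootstrap_addr: Optional[str] = None,
) -> dict[str, Any]:
    """Return the kv_transfer_params to attach to the decode-side request."""
    connector = connector_type.lower()  # assumption: "Nixl" and "nixl" are equivalent
    if connector == "mooncake":
        # Mooncake form: the proxy pre-bakes the params itself.
        if engine_id is None or bootstrap_addr is None:
            raise ValueError("mooncake needs engine_id and bootstrap_addr")
        return {"engine_id": engine_id, "bootstrap_addr": bootstrap_addr}
    if connector == "nixl":
        # Nixl form: forward whatever the prefill (src) response body carries.
        params = prefill_response.get("kv_transfer_params")
        if params is None:
            raise ValueError("prefill response has no kv_transfer_params")
        return dict(params)
    raise ValueError(f"unknown connector_type: {connector_type!r}")
```

In this sketch the Nixl branch never inspects the forwarded params, which matches the idea that the bootstrap-ports plumbing is only needed for the Mooncake path.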

scripts/cache_aware_proxy.py:
  - New unified_v3 policy (~250 lines): prefill stays on session-affinity
    host (preserves intra-session prefix-cache reuse), decode is migrated
    to a lower-load target when the affinity host is busy with concurrent
    decodes. KV transfer flows prefill_host → decode_target, opposite of
    v2. Knobs: v3_min_new_tokens, v3_min_prefill_decode_busy,
    v3_target_load_ratio, v3_min_load_gap, v3_rotate_affinity,
    v3_prefer_cache_target. cache_miss_audit found that rotation hurts
    cross-turn locality (9.5% hit rate with rotation vs ~80% without),
    so v3_rotate_affinity defaults to False.
  - New connector_type setting ("mooncake" | "nixl") gating the PD-sep
    handshake form: mooncake uses pre-baked kv_transfer_params,
    nixl forwards them from the response body.
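
The unified_v3 offload-decode decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cache_aware_proxy.py: the knob names come from the commit message, but their exact semantics, the placeholder values (other than v3_rotate_affinity=False), the use of active-decode counts as the load metric, and the names V3Knobs and pick_decode_host are all hypothetical. v3_prefer_cache_target is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class V3Knobs:
    # Knob names follow the commit message; every value except
    # v3_rotate_affinity=False is a made-up placeholder.
    v3_min_new_tokens: int = 256
    v3_min_prefill_decode_busy: int = 2
    v3_target_load_ratio: float = 0.5
    v3_min_load_gap: int = 2
    v3_rotate_affinity: bool = False


def pick_decode_host(
    affinity_host: str,
    active_decodes: dict[str, int],
    new_tokens: int,
    knobs: V3Knobs,
) -> tuple[str, Optional[str]]:
    """Return (decode_host, new_affinity_host or None).

    Prefill always runs on affinity_host; only the decode may move.
    """
    busy = active_decodes.get(affinity_host, 0)
    # Assumed gate: small requests and idle affinity hosts do not justify a KV transfer.
    if new_tokens < knobs.v3_min_new_tokens or busy < knobs.v3_min_prefill_decode_busy:
        return affinity_host, None
    others = {h: n for h, n in active_decodes.items() if h != affinity_host}
    if not others:
        return affinity_host, None
    target = min(others, key=others.get)  # least-loaded candidate
    load = others[target]
    # Assumed meaning: the target must be both relatively and absolutely lighter.
    light_enough = load <= knobs.v3_target_load_ratio * busy
    gap_ok = busy - load >= knobs.v3_min_load_gap
    if not (light_enough and gap_ok):
        return affinity_host, None
    # With rotation off (the default), the session keeps its affinity host.
    return target, (target if knobs.v3_rotate_affinity else None)
```

When the returned decode host differs from affinity_host, the KV transfer would flow from the affinity (prefill) host to that target, which is the prefill_host → decode_target direction named in the commit message.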

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-27 22:05:19 +08:00

scripts: archive obsolete one-off shell/python scripts to legacy/ (D2, D3)

2026-05-23 20:57:32 +08:00

analyze_agentic_patterns.py

Balanced session-sticky routing + agentic workload pattern analysis

2026-05-22 01:50:27 +08:00

analyze_breakdown.py

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

analyze_cache_hit.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

analyze_eviction.py

KV cache lifecycle design + eviction loss analysis

2026-05-22 01:27:22 +08:00

analyze_trace.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

b2_interference.py

B2 interference driver: request return_token_ids + text fallback

2026-05-25 22:39:54 +08:00

b3_analyze.sh

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

b3_isolated_policy.sh

Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy

2026-05-27 22:05:19 +08:00

b3_sweep.sh

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00

bench.sh

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

build_capped_trace.py

B3: load_only + sticky policies, capped-trace builder, sweep driver

2026-05-25 17:54:24 +08:00

cache_aware_proxy.py

Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy

2026-05-27 22:05:19 +08:00

compare_results.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

compute_apc_upper_bound.py

Window 1 analysis: APC upper bound, B2 window-overlap, figure renderer

2026-05-25 23:24:54 +08:00

compute_inter_turn_gap_chatbot.py

Add chatbot T_external CDF; overlay on f3a vs agentic

2026-05-27 14:49:44 +08:00

compute_inter_turn_gap_remote.py

Measure inter-turn T_external on the raw production trace; add f3a CDF

2026-05-27 12:37:32 +08:00

compute_roofline.py

compute_roofline: argparse --trace, fix stale default path (D4)

2026-05-23 20:58:09 +08:00

deploy_vllm_patches.sh

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

gpu_monitor.sh

Add GPU utilization A/B test and fix cache-aware proxy bugs

2026-05-21 22:13:38 +08:00

launch_elastic_p2p.sh

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00

launch_pd_mooncake.sh

Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

2026-05-25 11:24:16 +08:00

launch_pd_separated.sh

Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

2026-05-25 11:24:16 +08:00

launch_phase1_ps.sh

Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

2026-05-25 11:24:16 +08:00

launch_vllm.sh

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

plot_inter_turn_gap.py

Add chatbot T_external CDF; overlay on f3a vs agentic

2026-05-27 14:49:44 +08:00

plot_session_skew_cdf.py

Add solo production-trace CDF figure (f2b_session_skew_prod.png)

2026-05-27 10:53:30 +08:00

render_b3_figures_v2.py

Render 4 per-policy figures on b3_replay_20260527_0114 into figs/v2/

2026-05-27 13:52:17 +08:00

render_b3_report.py

B3 report renderer: incremental markdown table from comparison JSON

2026-05-25 18:58:21 +08:00

sample_trace.py

Production-realistic baseline: APC 67.5%, TPOT +139% from interference

2026-05-23 15:44:34 +08:00

simulate_cache_policies.py

Cache policy simulation: routing quality dominates, not eviction policy

2026-05-22 01:28:53 +08:00

slice_engine_state.py

B3 post-run helpers: engine_state slicer + per-policy aggregator

2026-05-25 18:51:33 +08:00

test_direct_read.py

Fix hash mismatch: token-based lookup instead of cross-instance hash matching

2026-05-24 01:14:33 +08:00