agentic-kvc/scripts at d9046322c66597d9d926480d3350c192c7392d1c - agentic-kvc - Local Gitea

gahow/agentic-kvc

Gahow Wang d9046322c6 Add parameter-free LPWL routing policy (--policy leastwork)

Least-Prefill-Work-Left: score = pending_prefill_tokens + max(0, input -
cache_hit_here), pure argmin with (num_requests, round-robin) tie-break.
Zero hyperparameters; derived from the agentic pattern: decode is cheap
(input/output token ratio ~217x), so outstanding prefill-token work is
the only load worth modelling. Dropping LMetric's `× num_requests` factor
(a) stops it from swamping the cache signal, so affinity emerges with no
gate, and (b) makes an idle-but-decoding host score `input` (its true
marginal cost) instead of 0, removing the empty-batch degeneracy. The
stick-vs-spill crossover is computed from real token-work, replacing the
overload_factor + cache_ratio gate.
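
A minimal sketch of the scoring rule described above, to make the argmin and
tie-break concrete. It is not the code in cache_aware_proxy.py: the names
`HostState`, `lpwl_score`, `pick_host` and `rr_offset` are hypothetical, and
the round-robin tie-break is modelled here as a rotating index offset, which
is only one plausible reading of the commit message.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HostState:
    """Hypothetical per-host load snapshot (field names are assumptions)."""
    pending_prefill_tokens: int  # prefill tokens already queued on this host
    num_requests: int            # requests currently on this host


def lpwl_score(host: HostState, input_tokens: int, cache_hit_tokens: int) -> int:
    """score = pending_prefill_tokens + max(0, input - cache_hit_here)."""
    return host.pending_prefill_tokens + max(0, input_tokens - cache_hit_tokens)


def pick_host(
    hosts: Sequence[HostState],
    input_tokens: int,
    cache_hits: Sequence[int],
    rr_offset: int = 0,
) -> int:
    """Pure argmin over the score; ties fall to fewer requests, then to
    round-robin order starting at rr_offset (an assumed interpretation)."""
    n = len(hosts)

    def key(i: int):
        return (
            lpwl_score(hosts[i], input_tokens, cache_hits[i]),
            hosts[i].num_requests,
            (i - rr_offset) % n,
        )

    return min(range(n), key=key)


if __name__ == "__main__":
    # Host 0 is idle-but-decoding (no pending prefill); host 1 holds the cache.
    hosts = [HostState(0, 3), HostState(5000, 1)]
    print(pick_host(hosts, 8000, cache_hits=[0, 7000]))  # stick to the cached host
    print(pick_host(hosts, 8000, cache_hits=[0, 2000]))  # spill to the idle host
```

Under this sketch, and assuming the other host has no cached prefix, sticking
to a host with `c` cached tokens wins exactly while its extra pending prefill
work stays below `c`. That is the crossover the commit message says is
computed from token-work rather than from a tuned gate.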

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-29 16:08:10 +08:00

..

scripts: archive obsolete one-off shell/python scripts to legacy/ (D2, D3)

2026-05-23 20:57:32 +08:00

analyze_agentic_patterns.py

Balanced session-sticky routing + agentic workload pattern analysis

2026-05-22 01:50:27 +08:00

analyze_breakdown.py

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

analyze_cache_hit.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

analyze_eviction.py

KV cache lifecycle design + eviction loss analysis

2026-05-22 01:27:22 +08:00

analyze_trace.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

b2_interference.py

B2 interference driver: request return_token_ids + text fallback

2026-05-25 22:39:54 +08:00

b3_analyze.sh

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

b3_isolated_policy.sh

Proxy/runner support for Nixl connector + unified_v3 (offload-decode) policy

2026-05-27 22:05:19 +08:00

b3_sweep.sh

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00

bench.sh

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

build_capped_trace.py

B3: load_only + sticky policies, capped-trace builder, sweep driver

2026-05-25 17:54:24 +08:00

cache_aware_proxy.py

Add parameter-free LPWL routing policy (--policy leastwork)

2026-05-29 16:08:10 +08:00

compare_results.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

compute_apc_upper_bound.py

Window 1 analysis: APC upper bound, B2 window-overlap, figure renderer

2026-05-25 23:24:54 +08:00

compute_inter_turn_gap_chatbot.py

Add chatbot T_external CDF; overlay on f3a vs agentic

2026-05-27 14:49:44 +08:00

compute_inter_turn_gap_remote.py

Measure inter-turn T_external on the raw production trace; add f3a CDF

2026-05-27 12:37:32 +08:00

compute_roofline.py

compute_roofline: argparse --trace, fix stale default path (D4)

2026-05-23 20:58:09 +08:00

deploy_vllm_patches.sh

Fix review bugs: PD-sep counter leaks, hardcoded paths, missing deps

2026-05-26 15:54:55 +08:00

gpu_monitor.sh

Add GPU utilization A/B test and fix cache-aware proxy bugs

2026-05-21 22:13:38 +08:00

launch_elastic_p2p.sh

Fix review P2s: lockfile, model path convention, trap robustness

2026-05-26 16:05:43 +08:00

launch_pd_mooncake.sh

Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

2026-05-25 11:24:16 +08:00

launch_pd_separated.sh

Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

2026-05-25 11:24:16 +08:00

launch_phase1_ps.sh

Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

2026-05-25 11:24:16 +08:00

launch_vllm.sh

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

plot_inter_turn_gap.py

Add chatbot T_external CDF; overlay on f3a vs agentic

2026-05-27 14:49:44 +08:00

plot_session_skew_cdf.py

Add solo production-trace CDF figure (f2b_session_skew_prod.png)

2026-05-27 10:53:30 +08:00

render_b3_figures_v2.py

Render 4 per-policy figures on b3_replay_20260527_0114 into figs/v2/

2026-05-27 13:52:17 +08:00

render_b3_report.py

B3 report renderer: incremental markdown table from comparison JSON

2026-05-25 18:58:21 +08:00

sample_trace.py

Production-realistic baseline: APC 67.5%, TPOT +139% from interference

2026-05-23 15:44:34 +08:00

simulate_cache_policies.py

Cache policy simulation: routing quality dominates, not eviction policy

2026-05-22 01:28:53 +08:00

slice_engine_state.py

B3 post-run helpers: engine_state slicer + per-policy aggregator

2026-05-25 18:51:33 +08:00

test_direct_read.py

Fix hash mismatch: token-based lookup instead of cross-instance hash matching

2026-05-24 01:14:33 +08:00

working_set_analysis.py

Working-set figure: extend left panel to ~50 nodes

2026-05-28 17:11:12 +08:00