agentic-kvc/scripts at bf4469a150c63f4a1fa0102eabda98c3e7ce1981

gahow/agentic-kvc

Gahow Wang bf4469a150 Fix cost model: accurate push_cost + aligned hard gate

1. push_cost now models both C and D: max(c_cost, d_cost) where
   c_cost includes C's queue + prefill, d_cost includes D's queue +
   RDMA overhead. Old formula only had D's contention + RDMA.
2. Hard gate uses num_requests instead of ongoing_tokens, aligning
   with the contention-based cost model.
3. Fix migration_discount: min(cap, 5) instead of hardcoded min(cap, 3).
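
A minimal sketch of the cost model described in items 1 and 2 above. It is not the code in cache_aware_proxy.py: the function names, parameter names, the millisecond unit, and the gate threshold are all assumptions made for illustration. Only the max(c_cost, d_cost) shape and the switch from ongoing_tokens to num_requests come from the commit message.

```python
def old_push_cost(d_contention_ms: float, rdma_overhead_ms: float) -> float:
    """Old formula per the commit message: D's contention + RDMA only."""
    return d_contention_ms + rdma_overhead_ms


def push_cost(
    c_queue_ms: float,
    c_prefill_ms: float,
    d_queue_ms: float,
    rdma_overhead_ms: float,
) -> float:
    """New formula per the commit message: max(c_cost, d_cost).

    c_cost covers C's queue + prefill; d_cost covers D's queue + RDMA
    overhead. The parameter names and units are hypothetical.
    """
    c_cost = c_queue_ms + c_prefill_ms
    d_cost = d_queue_ms + rdma_overhead_ms
    return max(c_cost, d_cost)


def passes_hard_gate(num_requests: int, max_requests: int) -> bool:
    """Hypothetical hard gate keyed on request count, not ongoing tokens.

    The comparison and the max_requests threshold are assumptions; the
    commit message only says the gate now uses num_requests.
    """
    return num_requests < max_requests


if __name__ == "__main__":
    # C side dominates here: 2.0 + 5.0 = 7.0 versus 1.0 + 0.5 = 1.5.
    print(push_cost(2.0, 5.0, 1.0, 0.5))
    print(old_push_cost(1.0, 0.5))
    print(passes_hard_gate(num_requests=3, max_requests=8))
```

With these made-up inputs the C side costs 7.0 and the D side 1.5, so the push cost is 7.0, whereas a D-only formula would report 1.5 and understate the cost whenever C is the bottleneck.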

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-25 01:01:03 +08:00

..

scripts: archive obsolete one-off shell/python scripts to legacy/ (D2, D3)

2026-05-23 20:57:32 +08:00

analyze_agentic_patterns.py

Balanced session-sticky routing + agentic workload pattern analysis

2026-05-22 01:50:27 +08:00

analyze_breakdown.py

Add per-request breakdown profiling, identify KV cache memory bottleneck

2026-05-22 00:13:50 +08:00

analyze_cache_hit.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

analyze_eviction.py

KV cache lifecycle design + eviction loss analysis

2026-05-22 01:27:22 +08:00

analyze_trace.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

bench.sh

Fix multi-turn replay fidelity: track realized output tokens across all components

2026-05-24 14:47:51 +08:00

cache_aware_proxy.py

Fix cost model: accurate push_cost + aligned hard gate

2026-05-25 01:01:03 +08:00

compare_results.py

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

compute_roofline.py

compute_roofline: argparse --trace, fix stale default path (D4)

2026-05-23 20:58:09 +08:00

deploy_vllm_patches.sh

Add deploy_vllm_patches.sh: sync third_party/vllm patches to site-packages

2026-05-24 11:59:52 +08:00

gpu_monitor.sh

Add GPU utilization A/B test and fix cache-aware proxy bugs

2026-05-21 22:13:38 +08:00

launch_elastic_p2p.sh

Fix multi-turn replay fidelity: track realized output tokens across all components

2026-05-24 14:47:51 +08:00

launch_pd_mooncake.sh

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

launch_pd_separated.sh

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

launch_phase1_ps.sh

launch_phase1_ps: parameterise project + model paths (B6 followup)

2026-05-23 21:14:15 +08:00

launch_vllm.sh

Agentic workload PD separation analysis with trace-driven benchmarks

2026-05-21 21:21:57 +08:00

sample_trace.py

Production-realistic baseline: APC 67.5%, TPOT +139% from interference

2026-05-23 15:44:34 +08:00

simulate_cache_policies.py

Cache policy simulation: routing quality dominates, not eviction policy

2026-05-22 01:28:53 +08:00

test_direct_read.py

Fix hash mismatch: token-based lookup instead of cross-instance hash matching

2026-05-24 01:14:33 +08:00