agentic-kvc/scripts
Gahow Wang 21ffb3d4f7 PD-sep matrix infrastructure: bench.sh pdsep mode + matrix driver
Adds the experiment harness that gates the empirical claims (C2/C3/C4/C5)
in the PD-sep paper section. Three pieces:

  1. scripts/bench.sh: new --mode pdsep with --pd-ratio P:D, and an
     --eager flag to re-enable --enforce-eager for the cuda-graph
     ablation. pdsep reuses the elastic-mode Mooncake kv_both launch and
     swaps the proxy command from --combined to --prefill/--decode.
     baseline and elastic flows are unchanged.

  2. analysis/pd_sep_paper_section/scripts/bench_pd_matrix.sh: matrix
     driver that runs {combined-ca, pdsep-4p4d, pdsep-6p2d} x cudagraph
     x 3 seeds by default (~2 h on dash0). --with-rr adds combined-rr;
     --with-eager doubles the matrix with the cuda-graph ablation (~5 h).
     The driver skips completed runs and captures per-instance vLLM logs
     (needed for C3 step-level KV-utilization mining).

  3. fig_kv_memory_wall.pdf: adds an empirical anchor (star marker) at the
     97% KV utilization observed for 6P+2D in REPORT.md §3.3. The marker
     lands on the model's predicted curve at p90 input, confirming the
     steady-state analysis.
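
For orientation, the run matrix from item 2 can be sketched as below. This
is a minimal illustration, not the contents of bench_pd_matrix.sh: the
function names, the seed indices 0..2, and the DONE marker file with its
<config>/<mode>/seed<N>/ directory layout are all assumptions made for the
example. Only the config labels, the cudagraph/eager split, the 3-seed
default, and the --with-rr / --with-eager behaviour come from the
description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the matrix enumeration; names and layout are assumed.

# Print one "config mode seed" line per run.
# Args: with_rr (0/1), with_eager (0/1), n_seeds (default 3).
list_runs() {
  local with_rr="${1:-0}" with_eager="${2:-0}" n_seeds="${3:-3}"
  local configs=(combined-ca pdsep-4p4d pdsep-6p2d)
  local modes=(cudagraph)
  if [ "$with_rr" = 1 ]; then configs+=(combined-rr); fi   # like --with-rr
  if [ "$with_eager" = 1 ]; then modes+=(eager); fi        # like --with-eager
  local c m s
  for c in "${configs[@]}"; do
    for m in "${modes[@]}"; do
      for ((s = 0; s < n_seeds; s++)); do   # seed indices are an assumption
        echo "$c $m $s"
      done
    done
  done
}

# Map a label such as pdsep-6p2d to the P:D string 6:2.
# Returns 1 and prints nothing for labels that are not pdsep-<P>p<D>d.
pd_ratio_of() {
  if [[ "$1" =~ ^pdsep-([0-9]+)p([0-9]+)d$ ]]; then
    echo "${BASH_REMATCH[1]}:${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# List only the runs that have no DONE marker (assumed convention) under
# the given output root. Remaining args are forwarded to list_runs.
pending_runs() {
  local out_root="$1"; shift
  local c m s
  list_runs "$@" | while read -r c m s; do
    [ -e "$out_root/$c/$m/seed$s/DONE" ] || echo "$c $m $s"
  done
}
```

Under these assumptions, `pending_runs outputs/pd_matrix 0 1` would list the
runs still missing from the matrix with the eager ablation enabled, and
`pd_ratio_of pdsep-6p2d` would give the value to pass as --pd-ratio.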

README updated with the run command, output layout, and the follow-up
plotters that consume outputs/pd_matrix/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 11:47:33 +08:00