21ffb3d4f77956d008b1815a3c0d46e0188ac390
Adds the experiment harness that gates the empirical claims (C2/C3/C4/C5)
in the PD-sep paper section. Three pieces:
1. scripts/bench.sh: new --mode pdsep with --pd-ratio P:D, and an
--eager flag to re-enable --enforce-eager for the cuda-graph
ablation. pdsep reuses the elastic-mode Mooncake kv_both launch and
swaps the proxy command from --combined to --prefill/--decode.
baseline and elastic flows are unchanged.
2. analysis/pd_sep_paper_section/scripts/bench_pd_matrix.sh: matrix
driver that runs {combined-ca, pdsep-4p4d, pdsep-6p2d} x cudagraph
x 3 seeds by default (~2 h on dash0). --with-rr adds combined-rr;
--with-eager doubles to ~5 h with the cuda-graph ablation. Skips
completed runs, captures per-instance vLLM logs (needed for C3
step-level KV-utilization mining).
3. fig_kv_memory_wall.pdf: empirical anchor (star) at REPORT.md §3.3's
observed 6P+2D 97% KV utilization. The marker lands on the model's
predicted curve at p90 input, confirming the steady-state analysis.
README updated with the run command, output layout, and the followup
plotters that consume outputs/pd_matrix/.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%