Adds analysis/pd_sep_paper_section/ as the home for the "PD separation is net negative under agentic workloads" paper section: plot scripts for C1 (workload chars), C6 (roofline), C7 (routing-vs-PD-sep lever), the C6/C7 PDFs already rendered, and a README mapping candidate claims to required figures plus open re-run items. Removes --enforce-eager from bench.sh and all active launch scripts so cuda graphs are captured -- the prior methodology suppressed one of PD-sep's structural advantages (D-node fixed-shape decode). Legacy scripts under scripts/legacy/ are intentionally untouched as historical records. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Paper section: PD separation under agentic workloads
This directory collects everything produced for the "PD-sep is net negative on agentic workloads" paper section. It is one section of a larger paper, not the whole paper.
Layout
analysis/pd_sep_paper_section/
├── README.md # this file
├── scripts/
│   ├── plot_workload.py          # C1: input/output CDF + KV reuse decomposition
│   ├── plot_roofline.py          # C6: prefill roofline at varying cache reuse
│   └── plot_routing_lever.py     # C7: routing vs PD-sep as design levers
└── figures/
    ├── fig_c6_roofline.pdf       # rendered locally (analytical, no trace needed)
    ├── fig_c7_routing_lever.pdf  # rendered locally (from REPORT.md §3.1)
    └── (fig_c1a_io_cdf.pdf,      # produced on dash0 when trace is available
         fig_c1b_reuse.pdf)
Candidate claims -> figures (status)
| Claim | Figure | Status |
|---|---|---|
| C1: 98% prefill share + 91% intra-session KV reuse | figures/fig_c1a_io_cdf.pdf, figures/fig_c1b_reuse.pdf | needs trace on dash0 |
| C2: PD-sep vs Combined headline numbers | (not yet) | needs re-run without --enforce-eager on traces/w600_r0.0015_st30.jsonl |
| C3: decode KV cache memory wall (time-series) | (not yet) | needs step-level vLLM telemetry during PD-sep run |
| C4: TTFT stacked breakdown (prefill / KV pull / decode wait) | (not yet) | needs per-request breakdown.json from PD-sep run |
| C5: cuda-graph ablation (eager vs cudagraph × Combined vs PD-sep) | (not yet) | needs the 2×2 matrix |
| C6: prefill stays compute-bound at 95% reuse | figures/fig_c6_roofline.pdf | rendered |
| C7: cache-aware routing is a larger lever than PD-sep | figures/fig_c7_routing_lever.pdf | rendered (legacy data, footer caveat) |
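The C6 claim can be illustrated with a back-of-envelope roofline check. This is a minimal sketch, not the model in plot_roofline.py, whose constants are not reproduced here. It assumes a weights-only roofline: each parameter costs about 2 FLOPs per newly prefilled token and is read from memory once, and it ignores attention FLOPs and reads of the cached KV prefix. PEAK_FLOPS, MEM_BW and the 32k-token context are illustrative placeholders, not measurements from this repo.

```python
# Illustrative placeholders (assumptions, not values from this repo).
PEAK_FLOPS = 1e15      # peak dense throughput, FLOP/s
MEM_BW = 3e12          # memory bandwidth, bytes/s
BYTES_PER_PARAM = 2.0  # 16-bit weights


def prefill_intensity(context_tokens, reuse_fraction, bytes_per_param=BYTES_PER_PARAM):
    """Arithmetic intensity (FLOP/byte) of prefill under a weights-only model.

    Only the non-cached tokens need to be prefilled. Each parameter does
    ~2 FLOPs per new token and is read from memory once per pass.
    """
    new_tokens = context_tokens * (1.0 - reuse_fraction)
    return 2.0 * new_tokens / bytes_per_param


def ridge_point(peak_flops=PEAK_FLOPS, mem_bw=MEM_BW):
    """Intensity above which the hardware is compute-bound."""
    return peak_flops / mem_bw


def is_compute_bound(context_tokens, reuse_fraction):
    """True when prefill intensity sits at or above the ridge point."""
    return prefill_intensity(context_tokens, reuse_fraction) >= ridge_point()


if __name__ == "__main__":
    for reuse in (0.0, 0.5, 0.9, 0.95, 0.99):
        intensity = prefill_intensity(32768, reuse)
        label = "compute-bound" if is_compute_bound(32768, reuse) else "memory-bound"
        print(f"reuse={reuse:.2f}  intensity={intensity:8.1f} FLOP/B  {label}")
```

Under these placeholder numbers the ridge point is about 333 FLOP/byte, so a 32k-token context at 95% reuse (roughly 1,638 new tokens) stays compute-bound, and the crossover only appears near 99% reuse. Different hardware constants or a model that counts KV reads will move that crossover.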
In-place edits made for this task
These edits are in the repo, not in this directory, because they modify
existing launch scripts. --enforce-eager was removed so cuda graphs can be
captured — PD-sep's D-node is a particularly clean case for cuda-graph
benefit and the prior methodology suppressed it.
| File | Lines | Change |
|---|---|---|
| scripts/bench.sh | 150, 161 | drop --enforce-eager (elastic + baseline modes) |
| scripts/launch_pd_mooncake.sh | 47, 64 | drop --enforce-eager (P and D instances) |
| scripts/launch_pd_separated.sh | 52, 68 | drop --enforce-eager (P and D instances) |
| scripts/launch_phase1_ps.sh | 32, 43 | drop --enforce-eager (C and PS instances) |
| scripts/launch_elastic_p2p.sh | 57 | drop --enforce-eager (kv_both instances) |
scripts/legacy/*.sh are intentionally left as-is — they record the
configuration of past experiments.
REPORT.md and analysis/pd_separation_analysis.md still describe the
old --enforce-eager setup. Update them once the new runs land.
Reproducing the figures
From repo root:
# C1 (needs sampled trace on dash0)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_workload.py \
--trace traces/w600_r0.0015_st30.jsonl
# C6 (analytical, runs anywhere with matplotlib)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_roofline.py
# C7 (hardcoded REPORT.md §3.1 numbers; no inputs)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_routing_lever.py
All three default --outdir to analysis/pd_sep_paper_section/figures.
Caveats / open items
- C7 uses legacy data. The footer of fig_c7_routing_lever.pdf says so: PD-sep numbers come from the random-sampled trace + --enforce-eager. Re-run on traces/w600_r0.0015_st30.jsonl with cuda-graphs on before paper-grade citation. The plotting code keeps the source numbers in a single ROWS table (top of plot_routing_lever.py) for a one-line swap.
- C2/C3/C4/C5 figures are not produced because the experiments have not been re-run. The 4h matrix proposed in the prior conversation turn (Combined + RR, Combined + cache-aware, PD-sep 4P+4D, PD-sep 6P+2D, plus eager-vs-cudagraph ablation, ×3 seeds) is the prerequisite.
- C6 is analytical, so it is independent of any re-run. The numbers match scripts/compute_roofline.py (constants are duplicated; if one changes, the other must change too).
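Because the C6 constants live in two files, a small drift check may help until they are deduplicated. The sketch below is one possible approach and is not part of the repo. It assumes the shared constants are top-level UPPER_CASE assignments with plain literal values; constants written as expressions (for example a product of two numbers) are skipped. The helper names module_constants, mismatched_constants and check_files are hypothetical.

```python
import ast
from pathlib import Path


def module_constants(source):
    """Return {NAME: value} for top-level UPPER_CASE literal assignments."""
    consts = {}
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
            continue
        target = node.targets[0]
        if isinstance(target, ast.Name) and target.id.isupper():
            try:
                consts[target.id] = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                pass  # not a plain literal; skipped by assumption
    return consts


def mismatched_constants(source_a, source_b):
    """Return {NAME: (value_a, value_b)} for shared names whose values differ."""
    a = module_constants(source_a)
    b = module_constants(source_b)
    return {name: (a[name], b[name]) for name in a.keys() & b.keys() if a[name] != b[name]}


def check_files(path_a, path_b):
    """Read two Python files and report constants that have drifted apart."""
    return mismatched_constants(Path(path_a).read_text(), Path(path_b).read_text())
```

From repo root, check_files("analysis/pd_sep_paper_section/scripts/plot_roofline.py", "scripts/compute_roofline.py") would return an empty dict when the shared constants agree. Names that exist in only one file are ignored, so the check only catches drift in constants both files define under the same name.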