
Gahow Wang d71a111099 Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

Adds analysis/pd_sep_paper_section/ as the home for the "PD separation is
net negative under agentic workloads" paper section: plot scripts for C1
(workload chars), C6 (roofline), C7 (routing-vs-PD-sep lever), the C6/C7
PDFs already rendered, and a README mapping candidate claims to required
figures plus open re-run items.

Removes --enforce-eager from bench.sh and all active launch scripts so
cuda graphs are captured -- the prior methodology suppressed one of
PD-sep's structural advantages (D-node fixed-shape decode). Legacy
scripts under scripts/legacy/ are intentionally untouched as historical
records.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 11:24:16 +08:00

4.2 KiB


Paper section: PD separation under agentic workloads

This directory collects everything produced for the "PD-sep is net negative on agentic workloads" paper section. It is one section of a larger paper, not the whole paper.

Layout

analysis/pd_sep_paper_section/
├── README.md                       # this file
├── scripts/
│   ├── plot_workload.py            # C1: input/output CDF + KV reuse decomposition
│   ├── plot_roofline.py            # C6: prefill roofline at varying cache reuse
│   └── plot_routing_lever.py       # C7: routing vs PD-sep as design levers
└── figures/
    ├── fig_c6_roofline.pdf         # rendered locally (analytical, no trace needed)
    ├── fig_c7_routing_lever.pdf    # rendered locally (from REPORT.md §3.1)
    └── (fig_c1a_io_cdf.pdf,        # produced on dash0 when trace is available
          fig_c1b_reuse.pdf)

Candidate claims -> figures (status)

| Claim | Figure | Status |
|---|---|---|
| C1: 98% prefill share + 91% intra-session KV reuse | `figures/fig_c1a_io_cdf.pdf`, `figures/fig_c1b_reuse.pdf` | needs trace on dash0 |
| C2: PD-sep vs Combined headline numbers | (not yet) | needs re-run without --enforce-eager on `traces/w600_r0.0015_st30.jsonl` |
| C3: decode KV cache memory wall (time-series) | (not yet) | needs step-level vLLM telemetry during PD-sep run |
| C4: TTFT stacked breakdown (prefill / KV pull / decode wait) | (not yet) | needs per-request breakdown.json from PD-sep run |
| C5: cuda-graph ablation (eager vs cudagraph × Combined vs PD-sep) | (not yet) | needs the 2×2 matrix |
| C6: prefill stays compute-bound at 95% reuse | `figures/fig_c6_roofline.pdf` | rendered |
| C7: cache-aware routing is a larger lever than PD-sep | `figures/fig_c7_routing_lever.pdf` | rendered (legacy data, footer caveat) |
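
Claim C6 rests on a roofline argument. As a rough sketch of that reasoning (not the logic of plot_roofline.py, which is not reproduced here), consider only the weight-matrix multiplies: each uncached prompt token costs about 2 FLOPs per parameter, while the weights are read once per prefill pass, so arithmetic intensity grows linearly with the number of uncached tokens. The model below ignores attention and KV-cache traffic, and the prompt length, peak FLOP/s, and memory bandwidth are placeholder values chosen for illustration, not measurements or constants from this project.

```python
def uncached_tokens(prompt_len: int, reuse: float) -> int:
    """Tokens that still need prefill compute after prefix-cache hits."""
    return round(prompt_len * (1.0 - reuse))


def weight_intensity(new_tokens: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte of weight traffic for one prefill pass.

    Assumes ~2 FLOPs per parameter per token and that every weight is
    read from memory exactly once, so the parameter count cancels out.
    """
    return 2.0 * new_tokens / bytes_per_param


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical values, for illustration only.
PROMPT_LEN = 32_000   # tokens in the full prompt
PEAK_FLOPS = 1.0e15   # accelerator peak, FLOP/s
MEM_BW = 3.0e12       # memory bandwidth, bytes/s

for reuse in (0.0, 0.5, 0.95):
    n = uncached_tokens(PROMPT_LEN, reuse)
    ai = weight_intensity(n)
    bound = "compute" if ai > ridge_point(PEAK_FLOPS, MEM_BW) else "memory"
    print(f"reuse={reuse:.2f} new_tokens={n} intensity={ai:.0f} FLOP/B -> {bound}-bound")
```

Under these assumptions the regime depends only on whether the uncached token count exceeds the ridge point, which is why a high reuse fraction on a long prompt can still leave prefill compute-bound.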

In-place edits made for this task

These edits live in the repo's scripts/ directory rather than in this directory, because they modify existing launch scripts. --enforce-eager was removed so that cuda graphs are captured: the D node in PD-sep runs fixed-shape decode, a particularly clean case for cuda-graph benefit, and the prior methodology suppressed that advantage.

| File | Lines | Change |
|---|---|---|
| `scripts/bench.sh` | 150, 161 | drop `--enforce-eager` (elastic + baseline modes) |
| `scripts/launch_pd_mooncake.sh` | 47, 64 | drop `--enforce-eager` (P and D instances) |
| `scripts/launch_pd_separated.sh` | 52, 68 | drop `--enforce-eager` (P and D instances) |
| `scripts/launch_phase1_ps.sh` | 32, 43 | drop `--enforce-eager` (C and PS instances) |
| `scripts/launch_elastic_p2p.sh` | 57 | drop `--enforce-eager` (kv_both instances) |

scripts/legacy/*.sh are intentionally left as-is — they record the configuration of past experiments.
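
One way to confirm that no active script still carries the flag is a small scan that skips the legacy directory. The helper below is a hypothetical sketch, not part of the repo; `find_flag` and its parameters are names introduced here, and it assumes the launch scripts are `*.sh` files under `scripts/`.

```python
from pathlib import Path

FLAG = "--enforce-eager"


def find_flag(scripts_dir, flag=FLAG, skip_dirs=("legacy",)):
    """Return (path, line_number, line) for each occurrence of `flag`
    in *.sh files under scripts_dir, skipping excluded subdirectories."""
    root = Path(scripts_dir)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*.sh")):
        # Directory components between the root and the file itself.
        parents = path.relative_to(root).parts[:-1]
        if any(part in skip_dirs for part in parents):
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if flag in line:
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repo root; prints nothing when all active scripts are clean.
    for path, lineno, line in find_flag("scripts"):
        print(f"{path}:{lineno}: {line}")
```

An empty result means the flag survives only under `scripts/legacy/`, which is the intended state.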

REPORT.md and analysis/pd_separation_analysis.md still describe the old --enforce-eager setup. Update them once the new runs land.

Reproducing the figures

From repo root:

# C1 (needs sampled trace on dash0)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_workload.py \
    --trace traces/w600_r0.0015_st30.jsonl

# C6 (analytical, runs anywhere with matplotlib)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_roofline.py

# C7 (hardcoded REPORT.md §3.1 numbers; no inputs)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_routing_lever.py

All three default --outdir to analysis/pd_sep_paper_section/figures.
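
A default of this kind is commonly declared with argparse. The sketch below is an assumption about how such a flag could be defined, not a copy of the plot scripts; `build_parser` and `DEFAULT_OUTDIR` are hypothetical names.

```python
import argparse
from pathlib import Path

# Repo-relative default, matching the directory named in this README.
DEFAULT_OUTDIR = Path("analysis/pd_sep_paper_section/figures")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Render a paper figure.")
    parser.add_argument(
        "--outdir",
        type=Path,
        default=DEFAULT_OUTDIR,
        help="directory that receives the rendered PDF",
    )
    return parser


# No flag given: the default applies.
default_args = build_parser().parse_args([])
print(default_args.outdir)

# Explicit override, e.g. to keep scratch renders out of the repo.
custom_args = build_parser().parse_args(["--outdir", "scratch/figs"])
print(custom_args.outdir)
```

If the scripts declare the default as a relative path like this, it resolves against the current working directory, which would explain why the commands are run from the repo root.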

Caveats / open items

C7 uses legacy data. The footer of fig_c7_routing_lever.pdf says so: the PD-sep numbers come from the random-sampled trace run with --enforce-eager. Re-run on traces/w600_r0.0015_st30.jsonl with cuda graphs enabled before citing these numbers in the paper. The plotting code keeps the source numbers in a single ROWS table (top of plot_routing_lever.py), so swapping in new results is a one-line change.
C2/C3/C4/C5 figures are not produced because the experiments have not been re-run. The proposed 4-hour run matrix (Combined + RR, Combined + cache-aware, PD-sep 4P+4D, PD-sep 6P+2D, plus the eager-vs-cudagraph ablation, ×3 seeds) is the prerequisite.
C6 is analytical, so it is independent of any re-run. The numbers match scripts/compute_roofline.py (constants are duplicated; if one changes, the other must change too).
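
Because the constants are duplicated, a small consistency check can catch drift between the two files. The sketch below is hypothetical and not part of the repo: it assumes the shared constants are module-level UPPER_CASE names assigned plain numeric literals, which may not match how either script is actually written.

```python
import ast


def numeric_constants(source: str) -> dict:
    """Map module-level UPPER_CASE names to their numeric literal values."""
    consts = {}
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
            continue
        target = node.targets[0]
        if not (isinstance(target, ast.Name) and target.id.isupper()):
            continue
        try:
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # computed expression, not a plain literal
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            consts[target.id] = value
    return consts


def mismatches(source_a: str, source_b: str) -> dict:
    """Return {name: (value_a, value_b)} for shared names whose values differ."""
    a, b = numeric_constants(source_a), numeric_constants(source_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


if __name__ == "__main__":
    # Placeholder sources; the constant names and values are illustrative.
    plot_src = "PEAK_FLOPS = 1.0e15\nMEM_BW = 3.0e12\n"
    compute_src = "PEAK_FLOPS = 1.0e15\nMEM_BW = 2.0e12\n"
    print(mismatches(plot_src, compute_src))
```

In the repo, the two arguments would be the text of plot_roofline.py and scripts/compute_roofline.py; an empty dict means the shared constants agree.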
