agentic-kvc/analysis/pd_sep_paper_section/README.md
Gahow Wang d71a111099 Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts
Adds analysis/pd_sep_paper_section/ as the home for the "PD separation is
net negative under agentic workloads" paper section: plot scripts for C1
(workload chars), C6 (roofline), C7 (routing-vs-PD-sep lever), the C6/C7
PDFs already rendered, and a README mapping candidate claims to required
figures plus open re-run items.

Removes --enforce-eager from bench.sh and all active launch scripts so
cuda graphs are captured -- the prior methodology suppressed one of
PD-sep's structural advantages (D-node fixed-shape decode). Legacy
scripts under scripts/legacy/ are intentionally untouched as historical
records.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 11:24:16 +08:00


Paper section: PD separation under agentic workloads

This directory collects everything produced for the "PD-sep is net negative on agentic workloads" paper section. PD-sep means serving prefill and decode on separate instances (P and D nodes). This is one section of a larger paper, not the whole paper.

Layout

```
analysis/pd_sep_paper_section/
├── README.md                       # this file
├── scripts/
│   ├── plot_workload.py            # C1: input/output CDF + KV reuse decomposition
│   ├── plot_roofline.py            # C6: prefill roofline at varying cache reuse
│   └── plot_routing_lever.py       # C7: routing vs PD-sep as design levers
└── figures/
    ├── fig_c6_roofline.pdf         # rendered locally (analytical, no trace needed)
    ├── fig_c7_routing_lever.pdf    # rendered locally (from REPORT.md §3.1)
    └── (fig_c1a_io_cdf.pdf,        # produced on dash0 when trace is available
          fig_c1b_reuse.pdf)
```

Candidate claims -> figures (status)

| Claim | Figure | Status |
|---|---|---|
| C1: 98% prefill share + 91% intra-session KV reuse | `figures/fig_c1a_io_cdf.pdf`, `figures/fig_c1b_reuse.pdf` | needs trace on dash0 |
| C2: PD-sep vs Combined headline numbers | (not yet) | needs re-run without `--enforce-eager` on `traces/w600_r0.0015_st30.jsonl` |
| C3: decode KV cache memory wall (time-series) | (not yet) | needs step-level vLLM telemetry during PD-sep run |
| C4: TTFT stacked breakdown (prefill / KV pull / decode wait) | (not yet) | needs per-request `breakdown.json` from PD-sep run |
| C5: cuda-graph ablation (eager vs cudagraph × Combined vs PD-sep) | (not yet) | needs the 2×2 matrix |
| C6: prefill stays compute-bound at 95% reuse | `figures/fig_c6_roofline.pdf` | rendered |
| C7: cache-aware routing is a larger lever than PD-sep | `figures/fig_c7_routing_lever.pdf` | rendered (legacy data, footer caveat) |
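C6 rests on a roofline argument. Below is a minimal sketch of that argument under stated assumptions. It is not the model in `plot_roofline.py` or `scripts/compute_roofline.py`, whose constants are not reproduced here.

The sketch makes these simplifications:

- It counts only the dense weight GEMMs of one prefill pass and ignores attention FLOPs and KV-cache reads.
- It assumes bf16 weights.
- It treats the accelerator peaks as illustrative values.

With a cache reuse fraction `reuse`, only the `(1 - reuse)` uncached tokens are recomputed. Each costs about 2 FLOPs per parameter, while the weights are streamed once per pass. The parameter count therefore cancels, and arithmetic intensity is roughly the number of uncached tokens.

```python
"""Simplified roofline check for prefill under prefix-cache reuse.

Hypothetical sketch: only dense weight GEMMs are modelled (attention
FLOPs and KV-cache reads are ignored), and all constants are
illustrative assumptions, not the values in scripts/compute_roofline.py.
"""

BYTES_PER_PARAM = 2  # assumption: bf16 weights


def prefill_intensity(input_tokens: int, reuse: float) -> float:
    """Approximate FLOPs per byte of weight traffic for one prefill pass."""
    # Only tokens not served from the KV cache are recomputed.
    new_tokens = input_tokens * (1.0 - reuse)
    # ~2 FLOPs per parameter per new token; weights are read once per pass,
    # so the parameter count cancels out of the ratio.
    return 2.0 * new_tokens / BYTES_PER_PARAM


def ridge_point(peak_flops: float, mem_bw_bytes: float) -> float:
    """Arithmetic intensity (FLOPs/byte) where the roofline turns flat."""
    return peak_flops / mem_bw_bytes


def is_compute_bound(input_tokens: int, reuse: float,
                     peak_flops: float, mem_bw_bytes: float) -> bool:
    """True if the pass sits at or right of the ridge point."""
    return prefill_intensity(input_tokens, reuse) >= ridge_point(peak_flops, mem_bw_bytes)


if __name__ == "__main__":
    # Illustrative numbers only: ~989 TFLOP/s dense bf16, ~3.35 TB/s HBM,
    # and a hypothetical 30k-token agentic prompt.
    PEAK, BW = 989e12, 3.35e12
    for r in (0.0, 0.91, 0.95, 0.99, 0.995):
        ai = prefill_intensity(30_000, r)
        print(f"reuse={r:.3f}  intensity={ai:8.1f}  "
              f"ridge={ridge_point(PEAK, BW):.1f}  "
              f"compute_bound={is_compute_bound(30_000, r, PEAK, BW)}")
```

Under these assumptions the ridge point is about 295 FLOPs/byte. A 30k-token prompt still has about 1,500 uncached tokens at 95% reuse, so it stays compute-bound. At 99.5% reuse it drops to about 150 and becomes memory-bound. The 30k-token size is hypothetical, not a number measured from the trace.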

In-place edits made for this task

These edits live elsewhere in the repo, not in this directory, because they modify existing launch scripts. `--enforce-eager` makes vLLM always run in eager mode and skip CUDA graph capture, so it was removed to let CUDA graphs be captured. PD-sep's D-node, which runs fixed-shape decode, is a particularly clean case for CUDA-graph benefit, and the prior methodology suppressed it.

| File | Lines | Change |
|---|---|---|
| `scripts/bench.sh` | 150, 161 | drop `--enforce-eager` (elastic + baseline modes) |
| `scripts/launch_pd_mooncake.sh` | 47, 64 | drop `--enforce-eager` (P and D instances) |
| `scripts/launch_pd_separated.sh` | 52, 68 | drop `--enforce-eager` (P and D instances) |
| `scripts/launch_phase1_ps.sh` | 32, 43 | drop `--enforce-eager` (C and PS instances) |
| `scripts/launch_elastic_p2p.sh` | 57 | drop `--enforce-eager` (kv_both instances) |

`scripts/legacy/*.sh` are intentionally left as-is because they record the configuration of past experiments.

REPORT.md and analysis/pd_separation_analysis.md still describe the old --enforce-eager setup. Update them once the new runs land.
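You can confirm the edits listed above, and catch a flag reintroduced later, by scanning the active launch scripts. The sketch below is a hypothetical helper, not a script in this repo. It assumes it runs from the repo root and treats every `*.sh` directly under `scripts/` as active. It skips `scripts/legacy/` on purpose.

```python
"""Flag active launch scripts that still pass --enforce-eager.

Hypothetical helper (not part of this repo). Assumes it is run from the
repo root and that "active" means every *.sh directly under scripts/;
scripts/legacy/ is skipped on purpose because those files record past
experiment configurations.
"""
from pathlib import Path

FLAG = "--enforce-eager"


def find_enforce_eager(scripts_dir: Path) -> list[tuple[Path, int, str]]:
    """Return (path, line number, line) for every active use of FLAG."""
    hits = []
    # glob("*.sh") is non-recursive, so scripts/legacy/*.sh is never read.
    for path in sorted(scripts_dir.glob("*.sh")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            # Skip whole-line shell comments that merely mention the flag.
            if line.lstrip().startswith("#"):
                continue
            if FLAG in line:
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    found = find_enforce_eager(Path("scripts"))
    for path, lineno, line in found:
        print(f"{path}:{lineno}: {line}")
    if found:
        raise SystemExit(1)
```

Run it from the repo root with the project's `.venv/bin/python`. It prints each offending line and exits non-zero if it finds any. A flag mentioned in a trailing inline comment would still be reported, which errs on the side of caution.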

Reproducing the figures

From repo root:

```bash
# C1 (needs sampled trace on dash0)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_workload.py \
    --trace traces/w600_r0.0015_st30.jsonl

# C6 (analytical, runs anywhere with matplotlib)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_roofline.py

# C7 (hardcoded REPORT.md §3.1 numbers; no inputs)
.venv/bin/python analysis/pd_sep_paper_section/scripts/plot_routing_lever.py
```

All three scripts default `--outdir` to `analysis/pd_sep_paper_section/figures`.
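`plot_workload.py` computes the C1 numbers from the trace, and its exact definitions are not reproduced here. Below is a minimal sketch of one plausible way to get a prefill share and an intra-session KV reuse ratio from a JSONL trace.

The sketch rests on these assumptions:

- **Schema:** each request has the hypothetical fields `session_id`, `input_length`, `output_length` and `hash_ids`, with one hash per fixed-size KV block, in the style of block-hash request traces. The real schema of `traces/w600_r0.0015_st30.jsonl` may differ.
- **Prefill share:** input tokens divided by all tokens.
- **Reuse:** a block counts as reused when the same hash appeared in an earlier request of the same session.

```python
"""Sketch: prefill share and intra-session KV reuse from a JSONL trace.

Hypothetical schema, one JSON object per line in arrival order:
  session_id, input_length, output_length,
  hash_ids (one hash per fixed-size KV block of the prompt).
"""
import json
from collections import defaultdict
from typing import Iterable


def workload_stats(records: Iterable[dict]) -> dict[str, float]:
    input_tokens = output_tokens = 0
    total_blocks = reused_blocks = 0
    # session_id -> block hashes prefilled by earlier requests of that session.
    # Unbounded: cache capacity and eviction are ignored.
    seen = defaultdict(set)

    for rec in records:
        input_tokens += rec["input_length"]
        output_tokens += rec["output_length"]
        prior = seen[rec["session_id"]]
        blocks = rec["hash_ids"]
        total_blocks += len(blocks)
        # A block is reused if an earlier turn of this session already had it.
        reused_blocks += sum(1 for h in blocks if h in prior)
        prior.update(blocks)

    total_tokens = input_tokens + output_tokens
    return {
        "prefill_share": input_tokens / total_tokens if total_tokens else 0.0,
        "intra_session_reuse": reused_blocks / total_blocks if total_blocks else 0.0,
    }


def load_trace(path: str) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

With the trace present on dash0, `workload_stats(load_trace("traces/w600_r0.0015_st30.jsonl"))` would return both ratios, once the field names are adapted to the real schema. The cache is treated as unbounded, so the reuse figure is an upper bound on what a finite prefix cache could hit within a session.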

Caveats / open items

  • C7 uses legacy data, as the footer of `fig_c7_routing_lever.pdf` states: its PD-sep numbers come from the random-sampled trace with `--enforce-eager`. Re-run on `traces/w600_r0.0015_st30.jsonl` with CUDA graphs enabled before citing it in the paper. `plot_routing_lever.py` keeps the source numbers in a single `ROWS` table at the top of the file, so updating them is a one-line swap.
  • The C2/C3/C4/C5 figures have not been produced because the experiments have not been re-run. The prerequisite is the 4h matrix proposed earlier: Combined + RR, Combined + cache-aware, PD-sep 4P+4D and PD-sep 6P+2D, plus the eager-vs-cudagraph ablation, all ×3 seeds.
  • C6 is analytical, so it does not depend on any re-run. Its numbers match `scripts/compute_roofline.py`. The constants are duplicated between the two scripts, so a change to one must be mirrored in the other.
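The prerequisite matrix for C2 through C5 can be written down as an explicit list of runs. The sketch below is illustrative only and rests on these assumptions:

- The config labels are hypothetical; the launch scripts may name them differently.
- The main runs use CUDA graphs.
- The eager half of the C5 ablation covers one Combined and one PD-sep config, and its cudagraph cells come from the main runs.

Which configs go into the ablation is a parameter, not a decision recorded in this README.

```python
"""Hypothetical enumeration of the re-run matrix behind C2-C5."""
from itertools import product
from typing import NamedTuple

# Hypothetical labels for the four serving configurations.
MAIN_CONFIGS = (
    "combined_rr",           # Combined + RR
    "combined_cache_aware",  # Combined + cache-aware routing
    "pdsep_4p4d",            # PD-sep 4P+4D
    "pdsep_6p2d",            # PD-sep 6P+2D
)
SEEDS = (0, 1, 2)  # assumption: any three distinct seeds


class Run(NamedTuple):
    config: str
    mode: str  # "cudagraph" or "eager"
    seed: int


def run_matrix(ablation_configs=("combined_cache_aware", "pdsep_4p4d")) -> list[Run]:
    unknown = set(ablation_configs) - set(MAIN_CONFIGS)
    if unknown:
        raise ValueError(f"ablation configs not in main matrix: {sorted(unknown)}")
    # Main matrix: every config with CUDA graphs enabled (no --enforce-eager).
    runs = [Run(c, "cudagraph", s) for c, s in product(MAIN_CONFIGS, SEEDS)]
    # C5 ablation: add eager runs for one Combined and one PD-sep config;
    # their cudagraph counterparts are already in the main matrix.
    runs += [Run(c, "eager", s) for c, s in product(ablation_configs, SEEDS)]
    return runs


if __name__ == "__main__":
    for run in run_matrix():
        print(run)
```

Under these assumptions the matrix is 12 main runs plus 6 eager ablation runs. Listing runs this way makes it easy to check that every cell of the 2×2 ablation is covered before launching.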