Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts

Adds analysis/pd_sep_paper_section/ as the home for the paper section
"PD separation is net negative under agentic workloads". It contains
plot scripts for C1 (workload characteristics), C6 (roofline), and C7
(routing-vs-PD-sep lever), the already-rendered C6/C7 PDFs, and a README
that maps candidate claims to required figures and lists open re-run
items.

Removes --enforce-eager from bench.sh and all active launch scripts so
CUDA graphs are captured. The prior methodology forced eager mode, which
suppressed one of PD-sep's structural advantages (fixed-shape decode on
the D-node). Legacy scripts under scripts/legacy/ are intentionally left
untouched as historical records.
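
A quick way to check that no active script still carries the flag is
sketched below. It assumes GNU grep and that launch scripts end in .sh;
the function name find_enforce_eager is hypothetical and not part of
this commit. Directories named legacy are skipped, matching the intent
above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): list remaining uses of
# --enforce-eager in *.sh files under a root directory, skipping any
# directory named "legacy".
find_enforce_eager() {
    local root="${1:-.}"
    # The bare "--" ends grep's option parsing, so the pattern that
    # follows is not mistaken for a command-line flag.
    grep -rn --include='*.sh' --exclude-dir=legacy -- '--enforce-eager' "$root"
}
```

Running find_enforce_eager from the repository root prints file:line
matches and exits non-zero when none remain.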

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 11:24:16 +08:00
parent 6a27f75337
commit d71a111099
11 changed files with 576 additions and 9 deletions


@@ -49,7 +49,6 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 $VLLM serve "$MODEL_PATH" \
     --tensor-parallel-size 4 \
     --trust-remote-code \
     --enable-prefix-caching \
-    --enforce-eager \
     --dtype auto \
     --gpu-memory-utilization 0.9 \
     --kv-transfer-config \
@@ -65,7 +64,6 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 $VLLM serve "$MODEL_PATH" \
     --tensor-parallel-size 4 \
     --trust-remote-code \
     --enable-prefix-caching \
-    --enforce-eager \
     --dtype auto \
     --gpu-memory-utilization 0.8 \
     --kv-transfer-config \