Paper section: PD-sep scaffold + drop --enforce-eager from launch scripts
Adds analysis/pd_sep_paper_section/ as the home for the "PD separation is net negative under agentic workloads" paper section: plot scripts for C1 (workload chars), C6 (roofline), C7 (routing-vs-PD-sep lever), the C6/C7 PDFs already rendered, and a README mapping candidate claims to required figures plus open re-run items. Removes --enforce-eager from bench.sh and all active launch scripts so cuda graphs are captured -- the prior methodology suppressed one of PD-sep's structural advantages (D-node fixed-shape decode). Legacy scripts under scripts/legacy/ are intentionally untouched as historical records. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -54,7 +54,7 @@ for i in $(seq 0 $((N_INSTANCES - 1))); do
|
||||
$VLLM serve "$MODEL" \
|
||||
--host 0.0.0.0 --port $port \
|
||||
--tensor-parallel-size 1 \
|
||||
--trust-remote-code --enable-prefix-caching --enforce-eager \
|
||||
--trust-remote-code --enable-prefix-caching \
|
||||
--dtype auto --gpu-memory-utilization 0.9 --max-model-len 200000 \
|
||||
--kv-transfer-config \
|
||||
'{"kv_connector":"MooncakeConnector","kv_role":"kv_both"}' \
|
||||
|
||||
Reference in New Issue
Block a user