Adds dated, non-destructive correction notes to the contaminated PD-vs-colo
artifacts after the producer-eviction bug (`evict_blocks(sent_block_ids)` on
`finished_sending`, deployed over the "fresh" pip vLLM by
`scripts/deploy_vllm_patches.sh`) was found and gated behind
`VLLM_EVICT_SENT_BLOCKS` (default off).

PD_DISAGG_RESULTS.md: top CORRECTION banner + §6 RETRACTED marker.
  §6 (session-affinity hot-pin) was an `e13391e` artifact under
  controlled concurrency; §3 RR, §4 TPOT win, §5 D-pool ceiling, and
  §5.1 consumer crash stand.
RESULTS_SUMMARY.md: §4 confirm+refine note. Clean ablation confirms
  the D-pool capacity thesis and adds regime-dependence.
pd_separation_analysis.md: scoped caution. Thesis confirmed; flags
  only reuse-dependent figures for cross-check (this study used a
  different stack).
figs/mb5/CORRECTION.md: flags mb5_producer_hotspot.png as retracted;
  §3 RR and §5 D-pool figures stand.
⚠️ Correction notice for figs/mb5/ (2026-05-30)
These figures back microbench/fresh_setup/PD_DISAGG_RESULTS.md. A producer-side
contamination was later found in the stack that produced them: commit e13391e
(deployed over the "fresh" pip vLLM by scripts/deploy_vllm_patches.sh) evicts a
producer's prefix-cache blocks on every KV transfer, so a disaggregated producer
could never keep a session's prefix warm. It is now gated behind
VLLM_EVICT_SENT_BLOCKS (default off) and everything was re-run clean.
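
For readers unfamiliar with this kind of gate, here is a minimal sketch of how an off-by-default environment flag could guard the eviction call. It is not the actual patch: `evict_sent_blocks_enabled`, `on_finished_sending`, and the set of accepted truthy strings are assumptions made for illustration. Only `VLLM_EVICT_SENT_BLOCKS`, `evict_blocks`, `sent_block_ids`, and `finished_sending` come from this notice and the commit message.

```python
import os


def evict_sent_blocks_enabled(environ=None) -> bool:
    """Return True only when VLLM_EVICT_SENT_BLOCKS is explicitly truthy.

    The accepted spellings below are an assumption, not the real parser.
    """
    env = os.environ if environ is None else environ
    value = env.get("VLLM_EVICT_SENT_BLOCKS", "0")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def on_finished_sending(sent_block_ids, evict_blocks, environ=None) -> bool:
    """Hypothetical hook run when a KV transfer finishes sending.

    Calls evict_blocks(sent_block_ids) only if the gate is on and
    reports whether eviction ran.
    """
    if not evict_sent_blocks_enabled(environ):
        return False  # default: leave the producer's prefix cache warm
    evict_blocks(sent_block_ids)
    return True
```

With the variable unset, the hook returns without touching the cache, which is the behavior the clean re-runs rely on.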

| figure | section | status |
|---|---|---|
| mb5_producer_hotspot.png | §6.3 session-affinity hot-pinning | 🛑 RETRACTED: pure e13391e artifact. On the clean stack, session-routed producers reach APC parity with colo (71–82%); there is no 0%-util stall / hot-pin pathology. |
| mb5_latency_compare.png | §3 round-robin headline | ✅ stands: RR's ~0% prefix-hit is a routing artifact (consecutive turns → different producers), not the eviction bug; reproduced clean. |
| mb5_kv_timeline.png, mb5_role_split.png, mb5_peak_utilization.png | §5 per-role KV pool occupancy | ✅ D-pool capacity-ceiling mechanism stands (decode pegs while prefill strands). P-pool occupancy may read slightly low under eviction; the qualitative split is unaffected. |
| mb5_summary.csv | aggregate | mixed: §3/§5 rows valid; any session-affinity rows superseded. |

Superseded by the corrected three-axis ablation: ../mb5_pd_ablation/
(reuse / shape / concurrency), data in ../../analysis/mb5_pd_ablation/.
Raw §6 data analysis/mb5/session_prod.json is contaminated; analysis/mb5/rr_prod.json (round-robin) stands.
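
The distinction drawn in the status table (round-robin misses are a routing effect, session-affinity misses were an eviction effect) can be illustrated with a toy model. This is a sketch under strong assumptions, not the benchmark: a request counts as a prefix hit only if the producer it lands on cached the same session's immediately preceding turn, and sessions issue turns in lockstep. All function names and the default sizes are hypothetical, and the idealized rates it produces are not measurements.

```python
def prefix_hit_rate(route, sessions=5, turns=6, producers=4, evict_on_send=False):
    """Fraction of follow-up turns that land on a producer holding the prior turn."""
    # cached[p] maps session -> last turn whose prefix producer p still holds
    cached = [dict() for _ in range(producers)]
    hits = total = 0
    request_index = 0
    for turn in range(turns):
        for session in range(sessions):
            p = route(session, request_index, producers)
            if turn > 0:  # the first turn has no prefix to reuse
                total += 1
                hits += cached[p].get(session) == turn - 1
            if evict_on_send:
                cached[p].pop(session, None)  # blocks dropped after the KV transfer
            else:
                cached[p][session] = turn  # prefix stays warm on this producer
            request_index += 1
    return hits / total


def round_robin(session, request_index, producers):
    # Consecutive requests rotate across producers regardless of session.
    return request_index % producers


def session_affinity(session, request_index, producers):
    # Every turn of a session goes to the same producer.
    return session % producers


print(prefix_hit_rate(round_robin))                           # 0.0
print(prefix_hit_rate(session_affinity))                      # 1.0
print(prefix_hit_rate(session_affinity, evict_on_send=True))  # 0.0
```

In this model round-robin misses with or without eviction, while session affinity misses only when eviction is on. The default of 5 sessions over 4 producers is deliberate: if the session count were a multiple of the producer count, lockstep round-robin would accidentally pin each session to one producer.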