06dd17544183167c8250c8b25433a74df41b65b8
plot_interference.py reads the interference sweep summary (4 decode batch
sizes D × 4 prefill lengths P × 3 reps, cold prefill prompts) and produces:

  fig_interference_heatmap.png
    TPOT p90 interference index over (D, P): 14x at D=8 P=2k → 214x at D=1 P=32k.

  fig_interference_lines.png
    (a) TPOT p90 during prefill vs P, log-y, one line per D + baseline dashed
    (b) Cold prefill TTFT vs P (interference window length)

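
As a rough illustration of the heatmap quantity, the interference index can be read as TPOT p90 during prefill divided by the baseline TPOT p90. The sketch below is not taken from plot_interference.py. The row layout, the per-D baseline, the median over reps, and all numbers are hypothetical placeholders chosen only to show the arithmetic.

```python
from collections import defaultdict
from statistics import median

# Hypothetical summary rows: (D, P, rep, TPOT p90 in ms during prefill).
# These values are synthetic placeholders, not results from the sweep.
rows = [
    (1, 2048, 0, 300.0), (1, 2048, 1, 320.0), (1, 2048, 2, 310.0),
    (8, 2048, 0, 280.0), (8, 2048, 1, 290.0), (8, 2048, 2, 285.0),
]

# Hypothetical baseline TPOT p90 (ms) per D with no overlapping prefill.
baseline_ms = {1: 10.0, 8: 20.0}


def interference_index(rows, baseline_ms):
    """Return {(D, P): median TPOT p90 over reps / baseline TPOT p90 for D}."""
    by_cell = defaultdict(list)
    for d, p, _rep, tpot_ms in rows:
        by_cell[(d, p)].append(tpot_ms)
    return {
        (d, p): median(values) / baseline_ms[d]
        for (d, p), values in by_cell.items()
    }


index = interference_index(rows, baseline_ms)
for (d, p), value in sorted(index.items()):
    print(f"D={d} P={p}: {value:.2f}x")
```

The resulting dictionary maps each (D, P) cell to one ratio, which is the shape a heatmap over (D, P) needs.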
Confirms B2 finding: cold prefill on the same worker stalls overlapping
decodes, raising TPOT p90 to 14x-214x the baseline. The interference window
grows with P, from ~140 ms at 2k to ~4.6 s at 32k (about 33x for a 16x longer
prompt, so somewhat faster than linear), and is essentially independent of
decode batch size because prefill compute time dominates.
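
A quick way to check how the window scales with P is a two-point power-law fit on the endpoints quoted above. This is only a sketch: `scaling_exponent` is a hypothetical helper, the inputs are the approximate values from this message, and two points cannot distinguish a power law from a linear trend with a fixed offset.

```python
import math


def scaling_exponent(p_lo, t_lo, p_hi, t_hi):
    """Exponent k in t ~ P**k implied by two (P, t) points; k == 1 is linear."""
    return math.log(t_hi / t_lo) / math.log(p_hi / p_lo)


# Approximate endpoints from the commit message: ~140 ms at 2k, ~4.6 s at 32k.
exponent = scaling_exponent(2_000, 0.140, 32_000, 4.6)
print(f"two-point exponent: {exponent:.2f}")
```

With these inputs the exponent comes out near 1.26. Fitting all four P values from the sweep summary would give a more reliable estimate.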