1 Commits

Author SHA1 Message Date
06dd175441 Microbench 1 plots: prefill-decode interference heatmap + lines
plot_interference.py reads the interference sweep summary (4 D × 4 P × 3 reps,
cold prefill prompts) and produces:

  fig_interference_heatmap.png
    TPOT p90 interference index over (D, P): 14x at D=8 P=2k → 214x at D=1 P=32k.

  fig_interference_lines.png
    (a) TPOT p90 during prefill vs P, log-y, one line per D + baseline dashed
    (b) Cold prefill TTFT vs P (interference window length)

Confirms B2 finding: cold prefill on the same worker stalls overlapping
decodes for 14-214x baseline TPOT. The interference window grows linearly
with P (from ~140ms at 2k to ~4.6s at 32k) and is essentially independent
of decode batch size — prefill compute time dominates.
2026-05-26 14:21:30 +08:00