scripts/b2_interference.py is the controlled microbenchmark. It runs two
coroutines directly against the vLLM endpoints, bypassing the proxy:
- decode_load: continuous short-prompt requests at fixed QPS into a
designated decode instance, to keep it decode-saturated.
- prefill_injections: N large-prompt requests, each generating a single
token, sent at a fixed interval to either the same instance
(same-worker variant) or a paired instance (different-worker control).
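
A minimal sketch of how the two coroutines could be arranged, not the
script itself. The `send` callable, the `run_cell` helper, the
"same_worker" variant string, and the 16-token short prompt are
hypothetical stand-ins; in this sketch the injection interval is
measured from the completion of the previous injection.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical transport: send(target_url, prompt_tokens) issues one request.
Send = Callable[[str, int], Awaitable[None]]


async def decode_load(send: Send, target: str, qps: float,
                      stop: asyncio.Event, short_tokens: int = 16) -> None:
    """Fire short-prompt requests at a fixed rate until `stop` is set."""
    period = 1.0 / qps
    pending = set()
    next_fire = time.monotonic()
    while not stop.is_set():
        # Do not await the request itself, so slow responses cannot
        # lower the offered rate.
        task = asyncio.ensure_future(send(target, short_tokens))
        pending.add(task)
        task.add_done_callback(pending.discard)
        next_fire += period
        await asyncio.sleep(max(0.0, next_fire - time.monotonic()))
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)


async def prefill_injections(send: Send, target: str, n: int,
                             interval_s: float, prompt_tokens: int) -> None:
    """Send n large-prompt requests, pausing interval_s between them."""
    for i in range(n):
        await send(target, prompt_tokens)
        if i < n - 1:
            await asyncio.sleep(interval_s)


async def run_cell(send: Send, decode_url: str, paired_url: str,
                   variant: str, n: int, interval_s: float,
                   prompt_tokens: int, qps: float) -> None:
    """Run one cell: background decode load plus n prefill injections."""
    # same-worker: inject into the decode instance; otherwise use the pair.
    target = decode_url if variant == "same_worker" else paired_url
    stop = asyncio.Event()
    load = asyncio.ensure_future(decode_load(send, decode_url, qps, stop))
    try:
        await prefill_injections(send, target, n, interval_s, prompt_tokens)
    finally:
        stop.set()
        await load
```

Passing the transport in as `send` keeps the scheduling logic
independent of the HTTP client, so it can be exercised with a stub.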

Each cell (variant × prefill_size) gets its own metrics.jsonl plus a
run_window.json recording t_start_unix and t_end_unix. The aggregator
uses that window to slice the shared engine_*.jsonl written by the
scheduler patch.
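
One way the per-cell window could be recorded, as a hedged sketch. The
`run_window` context manager is a hypothetical helper; only the file
name and the t_start_unix/t_end_unix keys come from this change.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def run_window(cell_dir: Path) -> Iterator[None]:
    """Write run_window.json for whatever runs inside the `with` block."""
    cell_dir.mkdir(parents=True, exist_ok=True)
    t_start = time.time()
    try:
        yield
    finally:
        # Written even if the cell fails, so a partial run can still be
        # located in the shared engine log.
        window = {"t_start_unix": t_start, "t_end_unix": time.time()}
        (cell_dir / "run_window.json").write_text(json.dumps(window))
```

Wrapping a cell's run in `with run_window(cell_dir):` brackets it with
wall-clock timestamps, assuming the engine log is stamped from the same
clock.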

analysis/characterization/b2_sweep_analysis.py walks the cell tree,
slices the per-worker step log by each cell's window, runs the A5
interference_index() against the slice, and emits a single
b2_sweep_summary.json with one row per cell. This is what feeds the
"interference vs uncached prefill size" figure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>