agentic-kvc

Go to file

Gahow Wang e23128ad65 B2: PD-colo interference microbench harness + sweep aggregator

scripts/b2_interference.py is the controlled microbench. It runs two
coroutines against the open proxy bypass (direct vLLM endpoints):

- decode_load: continuous short-prompt requests at fixed QPS into a
  designated decode instance, to keep it decode-saturated.
- prefill_injections: N large one-token requests at fixed interval,
  pointed at either the same instance (same-worker variant) or a
  paired one (different-worker control).

Each cell (variant × prefill_size) gets its own metrics.jsonl plus a
run_window.json containing t_start_unix/t_end_unix. The shared
engine_*.jsonl from the scheduler patch is sliced by that window in
the aggregator.

analysis/characterization/b2_sweep_analysis.py walks the cell tree,
slices the per-worker step log by each cell's window, runs the A5
interference_index() against the slice, and emits a single
b2_sweep_summary.json with one row per cell. This is what feeds the
"interference vs uncached prefill size" figure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 17:54:51 +08:00