Three CPU-only analysis pieces that turn raw Window 1 artifacts into
publishable numbers and figures.

scripts/compute_apc_upper_bound.py
Block-level trie walk over hash_ids to compute the theoretical APC
ceiling on a trace, decomposed into intra-session / any-session /
shared-prefix-only. Gives a fixed reference for what each routing
policy could *possibly* achieve. w600 result: 79.6% intra-session,
80.3% any-session, 0.1% shared-prefix.
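
A minimal sketch of the trie walk described above, not the script's
actual code. It assumes requests arrive as (session_id, hash_ids) pairs
in arrival order, an unbounded cache, and that a block counts as a hit
only when the full prefix up to it was seen earlier. The names
apc_upper_bound and TrieNode are hypothetical. The shared-prefix-only
component is left out because its definition is not stated here.

```python
class TrieNode:
    """One block in the prefix trie (hypothetical helper)."""
    __slots__ = ("children", "sessions")

    def __init__(self):
        self.children = {}     # block hash -> TrieNode
        self.sessions = set()  # sessions whose requests reached this block


def apc_upper_bound(requests):
    """Count best-case prefix-cache hits per block.

    requests: iterable of (session_id, hash_ids) in arrival order.
    Assumes an infinite cache, so every previously seen prefix is a hit.
    """
    root = TrieNode()
    total = any_hits = intra_hits = 0
    for session_id, hash_ids in requests:
        node = root
        for block_hash in hash_ids:
            total += 1
            child = node.children.get(block_hash)
            if child is None:
                # First time this prefix appears: a miss, insert it.
                child = TrieNode()
                node.children[block_hash] = child
            else:
                # Prefix seen before by some earlier request.
                any_hits += 1
                if session_id in child.sessions:
                    # Seen before by this same session.
                    intra_hits += 1
            child.sessions.add(session_id)
            node = child
    return {
        "total": total,
        "any_session": any_hits,
        "intra_session": intra_hits,
    }


# Toy trace: two requests from s1, one from s2 sharing a 2-block prefix.
toy_trace = [
    ("s1", [1, 2, 3]),
    ("s1", [1, 2, 3, 4]),
    ("s2", [1, 2, 9]),
]
result = apc_upper_bound(toy_trace)
```

Dividing any_session and intra_session by total gives the ceiling as a
fraction of blocks; on the toy trace that is 5/10 and 3/10.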

analysis/characterization/b2_sweep_analysis.py (rewrite)
Previous version used joined_analysis.interference_index(), which
labeled a decode as overlapping whenever any other request had a
prefill during that decode. With short-prompt decode load this is
almost always true (everyone's prefill overlaps everyone else's
decode), so n_overlap was 239/240 even in the different-worker control.
New version labels a decode as overlapping iff its
[t_first_token, t_finish] interval intersects an actual large
*injection* window, computed from the cell's "prefill"-tagged metric
rows. The different-worker control now sits cleanly at idx ≈ 1.0,
while same-worker scales monotonically.
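
A minimal sketch of the new labeling rule, assuming the injection
windows have already been extracted as (start, end) pairs and each
decode is a dict with t_first_token and t_finish. The function names
and the closed-interval convention are assumptions, not the script's
actual interface.

```python
def intervals_intersect(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] share a point."""
    return a_start <= b_end and b_start <= a_end


def label_overlap(decodes, injection_windows):
    """Return one bool per decode: does it intersect any injection window?

    decodes: list of dicts with "t_first_token" and "t_finish".
    injection_windows: list of (start, end) tuples (hypothetical input shape).
    """
    labels = []
    for decode in decodes:
        hit = any(
            intervals_intersect(
                decode["t_first_token"], decode["t_finish"], start, end
            )
            for start, end in injection_windows
        )
        labels.append(hit)
    return labels


# Toy example: only the middle decode straddles the injection window.
toy_decodes = [
    {"t_first_token": 0.0, "t_finish": 1.0},
    {"t_first_token": 2.0, "t_finish": 3.0},
    {"t_first_token": 5.0, "t_finish": 6.0},
]
toy_windows = [(2.5, 4.0)]
labels = label_overlap(toy_decodes, toy_windows)
```

Unlike the old rule, a decode with no injection window nearby stays
labeled clean even if other short prefills ran during it.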

analysis/characterization/render_window1_figures.py
Renders 8 PNGs from the result JSONs: B3 latency, APC vs ceiling,
APC vs hotspot scatter, per-worker TTFT, and failure breakdown;
B2 TPOT and TTFT curves (overlap vs clean and idx); reuse
decomposition; KV footprint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>