agentic-kvc

Files

Gahow Wang b7902061d1 Window 1 analysis: APC upper bound, B2 window-overlap, figure renderer

Three CPU-only analysis pieces that turn raw Window 1 artifacts into
publishable numbers and figures.

scripts/compute_apc_upper_bound.py
  Block-level trie walk over hash_ids to compute the theoretical APC
  ceiling on a trace, decomposed into intra-session / any-session /
  shared-prefix-only. Gives a fixed reference for what each routing
  policy could *possibly* achieve. w600 result: 79.6% intra-session,
  80.3% any-session, 0.1% shared-prefix.
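
  A minimal sketch of the block-level trie walk described above, not the
  script's actual code. It assumes each request is a (session_id, hash_ids)
  pair in arrival order and that the cache is unbounded (no eviction), which
  is what makes the result a ceiling. The names TrieNode, match_and_insert and
  apc_upper_bound are hypothetical. The shared-prefix-only component is left
  out because its exact definition is not stated here.

```python
from collections import defaultdict


class TrieNode:
    """One node per block hash; children are keyed by the next block's hash id."""
    __slots__ = ("children",)

    def __init__(self):
        self.children = {}


def match_and_insert(root, hash_ids):
    """Count leading blocks already present under root, then insert the rest.

    After the first miss the walk continues down freshly created nodes,
    which have no children, so no later block can be counted as a hit.
    """
    node = root
    matched = 0
    for h in hash_ids:
        child = node.children.get(h)
        if child is None:
            child = TrieNode()
            node.children[h] = child
        else:
            matched += 1
        node = child
    return matched


def apc_upper_bound(requests):
    """Return the fraction of blocks that could be served from a prefix cache.

    requests: iterable of (session_id, hash_ids) in arrival order (assumed layout).
    intra_session only counts prefixes seen earlier in the same session;
    any_session counts prefixes seen earlier in any request.
    """
    global_root = TrieNode()
    session_roots = defaultdict(TrieNode)
    total = intra = any_session = 0
    for session_id, hash_ids in requests:
        total += len(hash_ids)
        intra += match_and_insert(session_roots[session_id], hash_ids)
        any_session += match_and_insert(global_root, hash_ids)
    if total == 0:
        return {"intra_session": 0.0, "any_session": 0.0}
    return {"intra_session": intra / total, "any_session": any_session / total}
```

  Keeping one trie per session next to a single global trie lets both ratios
  come out of one pass over the trace.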

analysis/characterization/b2_sweep_analysis.py (rewrite)
  The previous version used joined_analysis.interference_index(), which
  labeled a decode as overlapped if any other request's prefill ran at
  any point during that decode. Under short-prompt decode load this is
  almost always true (everyone's prefill overlaps everyone else's
  decode): n_overlap was 239/240 even in the different-worker control.

  The new version labels a decode as overlapped iff its interval
  [t_first_token, t_finish] intersects an actual large *injection*
  window, computed from the cell's "prefill"-tagged metric rows. The
  different-worker control now sits cleanly at idx ≈ 1.0, and
  same-worker scales monotonically.
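
  The window-overlap rule can be sketched as a closed-interval intersection
  test. This is an illustration under assumptions, not the rewritten script:
  the row fields ("tag", "t_start", "t_end", "prompt_tokens"), the
  min_prompt_tokens cutoff for what counts as "large", and the function names
  are all hypothetical.

```python
def injection_windows(metric_rows, min_prompt_tokens):
    """Collect (start, end) windows from rows tagged "prefill" that are large enough.

    The field names and the size cutoff are assumptions for this sketch.
    """
    return [
        (row["t_start"], row["t_end"])
        for row in metric_rows
        if row["tag"] == "prefill" and row["prompt_tokens"] >= min_prompt_tokens
    ]


def overlaps_injection(t_first_token, t_finish, windows):
    """True iff [t_first_token, t_finish] intersects any injection window.

    Two closed intervals intersect when each starts no later than the other ends.
    """
    return any(
        start <= t_finish and t_first_token <= end
        for start, end in windows
    )


def split_decodes(decodes, windows):
    """Partition decodes (dicts with t_first_token and t_finish) into overlap and clean."""
    overlap, clean = [], []
    for d in decodes:
        hit = overlaps_injection(d["t_first_token"], d["t_finish"], windows)
        (overlap if hit else clean).append(d)
    return overlap, clean
```

  With this rule a decode counts as overlapped only when a large injection was
  in flight during it, so ordinary short prefills from the background load no
  longer mark every decode as overlapped.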

analysis/characterization/render_window1_figures.py
  Renders 8 PNGs from the result JSONs: B3 latency / APC vs ceiling
  / APC vs hotspot scatter / per-worker TTFT / failure breakdown,
  B2 TPOT and TTFT curves (overlap vs clean and idx), reuse
  decomposition, KV footprint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 23:24:54 +08:00

characterization

Window 1 analysis: APC upper bound, B2 window-overlap, figure renderer

2026-05-25 23:24:54 +08:00

pd_sep_paper_section

PD-sep matrix results: C2/C3/C4 figures + empirical mechanism refined

2026-05-25 16:23:52 +08:00

adaptive_prefill_offload_design.md

Design doc: Adaptive Prefill Offload

2026-05-22 00:44:22 +08:00

characterization_todo_for_interns.md

Characterization plan: progress snapshot + Claude work plan