agentic-kvc/analysis/characterization/current_results/main_claim_allowed_runs.md
Gahow Wang 4722883903 Audit package refresh: Window 1 supported claims + risk register
Refresh the standing audit package now that B1' / B2 / B3 are complete.

current_results/characterization_claim_matrix.md
  Flips seven entries from "not_yet_supported" / "partially_supported"
  to "supported" with pointers into window_1_results/. New entries
  cover per-session sequentiality, KV per request, real reuse
  decomposition, theoretical APC ceiling, the LMetric locality gap,
  Unified breaking the locality-vs-latency tradeoff, B2 causal
  interference proof, sticky's interference inflation, and the
  partial heavy-tail / hot-spot story. B4 SRR + B5 attribution stay
  "not_yet_supported" (Window 2 work).

current_results/main_claim_allowed_runs.md
  New "Allowed For Routing-Policy Comparison" section pins the five
  B3 policy directories. New "Allowed For PD-colo Interference"
  section pins the B2 sweep. Legacy section retained for the
  pre-instrumentation 200/500/1000-req runs.

current_results/reviewer_risk_register.md
  Marks the two old "high"-severity risks (sequentiality / reuse
  decomposition) as resolved; adds new entries for the APC
  contamination empirics, the b3_analyze.sh truncate-write bug that
  cost unified's interference index, the GPU-0 EngineCore ghost
  cleanup, the saturated-replay caveat for trace-timestamp dispatch,
  and the synthetic B2 decode workload.

current_results/all_figures_index.md
  Adds the 8 new Window 1 figures alongside the existing 6 from the
  legacy summarize_runs run.

current_results/reproduction_commands.sh
  Records the full B3 + B2 + figure pipeline.

analysis/characterization_todo_for_interns.md
  Updates the Progress Snapshot table: B0, B1, B2, B3, B6 all DONE;
  only B4 and B5 remain (Window 2).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 23:25:27 +08:00


Main-Claim Allowed Runs

Status: post-Window-1 audit gate
Date: 2026-05-25

Allowed For Workload-Shape Claims

  • dash0:/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl

    • Compact formatted full trace (2.11M requests / 1.31M sessions).
    • CPU summary in current_results/full_trace_summary.json and Window 1 KV footprint in window_1_results/kv_footprint_summary.json.
    • Supports: long-input / short-output / heavy-tail token mass / KV per request distribution.
  • traces/w600_r0.0015_st30.jsonl

    • 1214 requests / 274 sessions / 53.3 M tokens.
    • APC theoretical bounds in window_1_results/apc_upper_w600.json.
    • Routing-policy comparison trace used by B3.

Allowed For Routing-Policy Comparison Claims

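These five runs share the same model and 8-instance topology and, apart from capped/ (which runs lmetric on the cap-8 trace), the same trace. Together they support per-policy claims about APC, hotspots, interference, latency, and failure breakdown, except that the interference index is unavailable for unified/.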

  • outputs/b3_sweep_20260525_095043/lmetric/ — main baseline
  • outputs/b3_sweep_20260525_095043/load_only/ — control: no cache / no affinity
  • outputs/b3_sweep_20260525_095043/sticky/ — control: hard affinity
  • outputs/b3_sweep_20260525_095043/unified/ — hybrid (interference index unavailable; see note in claim matrix)
  • outputs/b3_sweep_20260525_095043/capped/ — lmetric on cap-8 trace

Aggregated comparison: outputs/b3_sweep_20260525_095043/b3_policy_comparison.json. Rendered figures: analysis/characterization/window_1_results/figures/fig_b3_*.png.

Allowed For PD-colo Interference Causal Claims

  • outputs/b2_microbench/sweep/{same,different}/p{2048,8192,16384,32768,65536}/
    • Decode-load + prefill-injection microbench.
    • b2_sweep_summary.json aggregates per-cell TPOT and TTFT (overlap vs clean), indexed by prefill_size × variant.
    • The different-worker control keeps the interference index (idx) ≈ 1.0 across the 32× range of prefill sizes (2048 to 65536); the same-worker idx increases monotonically with prefill size.
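
As a rough illustration of how the two observations above could be checked against per-cell numbers, the sketch below assumes the interference index is the ratio of overlap TPOT to clean TPOT in each cell. That definition, the in-memory layout, the tolerance, and every number in it are illustrative assumptions; they are not the schema of b2_sweep_summary.json and not measured results.

```python
# Hypothetical sketch. Assumed definition: idx = TPOT under overlap / clean TPOT.
# The dict layout below is NOT the real b2_sweep_summary.json schema.

def interference_index(tpot_overlap_ms: float, tpot_clean_ms: float) -> float:
    """Ratio of overlapped to clean TPOT; 1.0 means no measurable interference."""
    if tpot_clean_ms <= 0:
        raise ValueError("clean TPOT must be positive")
    return tpot_overlap_ms / tpot_clean_ms


def index_by_prefill(cells: dict) -> dict:
    """Map {prefill_size: (tpot_overlap_ms, tpot_clean_ms)} to {prefill_size: idx}."""
    return {size: interference_index(*cells[size]) for size in sorted(cells)}


def is_flat(idx: dict, tol: float = 0.05) -> bool:
    """True if every index stays within tol of 1.0 (tol is an arbitrary choice)."""
    return all(abs(value - 1.0) <= tol for value in idx.values())


def is_monotonic_increasing(idx: dict) -> bool:
    """True if the index strictly increases with prefill size."""
    values = [idx[size] for size in sorted(idx)]
    return all(a < b for a, b in zip(values, values[1:]))


# Placeholder inputs for illustration only; these are not measurements.
same_worker = {2048: (11.0, 10.0), 8192: (13.0, 10.0), 16384: (16.0, 10.0),
               32768: (22.0, 10.0), 65536: (35.0, 10.0)}
different_worker = {size: (10.1, 10.0) for size in same_worker}

same_idx = index_by_prefill(same_worker)
diff_idx = index_by_prefill(different_worker)
```

With real per-cell values substituted, `is_flat(diff_idx)` and `is_monotonic_increasing(same_idx)` would correspond to the control and same-worker statements respectively.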

Allowed For Legacy Baseline Sanity Claims

These older runs predate Window 1 instrumentation. They can still support claims of the form "static PD-sep was worse than combined on this fixed-request workload", but not the new SRR or per-policy comparisons.

  • outputs/gpu_ab_combined, outputs/gpu_ab_pdsep
  • outputs/contention_16s_ts10, outputs/contention_16s_elastic
  • outputs/combined_1000req, outputs/exp3_pd_sep_tp1_mooncake

NOT Allowed For Main Claims

The following need new runs:

  • B4 SRR sweep: arrival-rate sweep with open-loop Poisson session arrivals and per-class SLO. No data yet.
  • B5 failure attribution near SRR boundary: depends on B4.
  • Production interference under the cache_aware proxy: B2 used direct endpoints, so production routing might shift the same-worker collision profile.
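
The open-loop Poisson session arrivals that the B4 sweep calls for can be sketched as follows. This is a minimal illustration, not the A4 loadgen: the function names, the rates, and the seed are hypothetical. Start times are drawn up front from exponential inter-arrival gaps, so dispatch never waits on earlier requests completing, which is what makes the load open-loop.

```python
import random


def poisson_arrival_times(rate_per_s: float, duration_s: float, seed: int = 0) -> list:
    """Session start times (seconds) of a homogeneous Poisson process on [0, duration_s)."""
    if rate_per_s <= 0 or duration_s <= 0:
        raise ValueError("rate and duration must be positive")
    rng = random.Random(seed)  # fixed seed so a sweep point is reproducible
    now = 0.0
    times = []
    while True:
        # Inter-arrival gaps of a Poisson process are exponentially distributed.
        now += rng.expovariate(rate_per_s)
        if now >= duration_s:
            return times
        times.append(now)


def arrival_rate_sweep(rates_per_s, duration_s: float, seed: int = 0) -> dict:
    """One independent arrival schedule per rate in the sweep."""
    return {
        rate: poisson_arrival_times(rate, duration_s, seed)
        for rate in rates_per_s
    }


# Hypothetical sweep points; the real B4 rates are not specified in this document.
schedules = arrival_rate_sweep([0.5, 1.0, 2.0], duration_s=600.0)
```

Each schedule would then be replayed against the stack while recording per-class SLO attainment at that rate.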

Required Upgrade Path

For Window 2 (B4 + B5), the existing stack already provides everything required:

  • A1 unix timestamps on every metric row ✓
  • A2 worker_state snapshots ✓
  • A3 step-level engine_state (works in isolated runs since df32499) ✓
  • A4 open-loop Poisson loadgen ✓
  • A5 joined_analysis + failure labels ✓

No new instrumentation is required. The only software gap was that b3_analyze.sh must use per-policy engine_state when present; this was fixed at commit df32499.
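
The "use per-policy engine_state when present" rule can be sketched as a small selection helper. This is not the logic of b3_analyze.sh: the file name `engine_state.jsonl`, the directory layout, and the fallback to a sweep-level file are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def pick_engine_state(
    sweep_dir: Path, policy: str, filename: str = "engine_state.jsonl"
) -> Optional[Path]:
    """Prefer <sweep_dir>/<policy>/<filename>; fall back to <sweep_dir>/<filename>.

    Empty files are treated as missing, so a truncated output is never
    silently analyzed. File name and layout are hypothetical.
    """
    candidates = (sweep_dir / policy / filename, sweep_dir / filename)
    for candidate in candidates:
        if candidate.is_file() and candidate.stat().st_size > 0:
            return candidate
    return None
```

Returning `None` rather than raising lets the caller report the metric as unavailable for that policy instead of aborting the whole analysis.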