Refresh the standing audit package now that B1' / B2 / B3 are complete

current_results/characterization_claim_matrix.md
  Flips seven entries from "not_yet_supported" / "partially_supported" to
  "supported", with pointers into window_1_results/. New entries cover
  per-session sequentiality, KV per request, real reuse decomposition, the
  theoretical APC ceiling, the LMetric locality gap, Unified breaking the
  locality-vs-latency tradeoff, the B2 causal interference proof, sticky's
  interference inflation, and the partial heavy-tail / hot-spot story.
  B4 SRR and B5 attribution stay "not_yet_supported" (Window 2 work).

current_results/main_claim_allowed_runs.md
  New "Allowed For Routing-Policy Comparison" section pins the five B3 policy
  directories. New "Allowed For PD-colo Interference" section pins the B2
  sweep. The legacy section is retained for the pre-instrumentation
  200/500/1000-req runs.

current_results/reviewer_risk_register.md
  Marks the two old "high"-severity risks (sequentiality / reuse
  decomposition) as resolved. Adds new entries for the APC contamination
  empirics, the b3_analyze.sh truncate-write bug that cost unified its
  interference index, the GPU-0 EngineCore ghost cleanup, the saturated-replay
  caveat for trace-timestamp dispatch, and the synthetic B2 decode workload.

current_results/all_figures_index.md
  Adds the 8 new Window 1 figures alongside the existing 6 from the legacy
  summarize_runs run.

current_results/reproduction_commands.sh
  Records the full B3 + B2 + figure pipeline.

analysis/characterization_todo_for_interns.md
  Updates the Progress Snapshot table: B0, B1, B2, B3 and B6 are all DONE;
  only B4 and B5 remain (Window 2).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Main-Claim Allowed Runs
Status: post-Window-1 audit gate
Date: 2026-05-25
Allowed For Workload-Shape Claims
- `dash0:/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl`
  - Compact formatted full trace (2.11M requests / 1.31M sessions).
  - CPU summary in `current_results/full_trace_summary.json` and Window 1 KV footprint in `window_1_results/kv_footprint_summary.json`.
  - Supports: long-input / short-output / heavy-tail token mass / KV-per-request distribution.
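The relation below shows why the KV-per-request distribution follows the token distribution. It assumes a standard multi-head or grouped-query attention KV layout, which is not confirmed here for GLM-5.1. MLA-style compressed caches store a different per-token quantity. The symbols are illustrative: $N_{\text{layer}}$ is the layer count, $H_{\text{kv}}$ the number of KV heads, $d_{\text{head}}$ the head dimension, and $b$ the bytes per cached element. The leading factor 2 accounts for K and V.

$$
\mathrm{KV}(r) = 2 \cdot N_{\text{layer}} \cdot H_{\text{kv}} \cdot d_{\text{head}} \cdot b \cdot \big(n_{\text{in}}(r) + n_{\text{out}}(r)\big)
$$

On a long-input / short-output workload, $n_{\text{out}}(r) \ll n_{\text{in}}(r)$. Under this assumption, the per-request KV footprint is approximately a constant multiple of the input length, so it inherits the input length's heavy tail.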
- `traces/w600_r0.0015_st30.jsonl`
  - 1214 requests / 274 sessions / 53.3M tokens.
  - APC theoretical bounds in `window_1_results/apc_upper_w600.json`.
  - Routing-policy comparison trace used by B3.
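This file does not say how `apc_upper_w600.json` was computed. The sketch below shows one common way to define a theoretical APC ceiling: an infinite, never-evicting prefix cache with block-granular, prefix-chained keys, similar in spirit to block-level automatic prefix caching. The function name, the 16-token block size, and the input format (prompt token IDs in arrival order) are all assumptions for illustration.

```python
from typing import Iterable, Optional, Sequence


def apc_upper_bound(requests: Iterable[Sequence[int]], block_size: int = 16) -> float:
    """Fraction of prompt tokens an infinite, never-evicting prefix cache could serve.

    Hypothetical helper. `requests` yields prompt token IDs in arrival order.
    Only full blocks are cacheable; a block hits only if the exact prefix
    ending at that block was seen earlier, which prefix-chained keys enforce.
    Trailing partial blocks count toward the total but can never hit.
    """
    seen: set[int] = set()
    hit_tokens = 0
    total_tokens = 0
    for tokens in requests:
        total_tokens += len(tokens)
        parent: Optional[int] = None
        for start in range(0, len(tokens) - block_size + 1, block_size):
            block = tuple(tokens[start:start + block_size])
            # Chain the key on the parent key so identical blocks under
            # different prefixes never match (hash collisions ignored here).
            key = hash((parent, block))
            if key in seen:
                hit_tokens += block_size
            seen.add(key)
            parent = key
    return hit_tokens / total_tokens if total_tokens else 0.0
```

This bound counts reuse both within and across sessions. A session-restricted bound would need one `seen` set per session.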
Allowed For Routing-Policy Comparison Claims
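These five runs share the same model and 8-instance topology. All but capped use the same trace; capped runs lmetric on the cap-8 trace. Together they support the per-policy claims about APC, hotspots, interference, latency, and failure breakdown. The one gap is unified's interference index, which is unavailable.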
- `outputs/b3_sweep_20260525_095043/lmetric/`: main baseline
- `outputs/b3_sweep_20260525_095043/load_only/`: control, no cache / no affinity
- `outputs/b3_sweep_20260525_095043/sticky/`: control, hard affinity
- `outputs/b3_sweep_20260525_095043/unified/`: hybrid (interference index unavailable; see note in claim matrix)
- `outputs/b3_sweep_20260525_095043/capped/`: lmetric on the cap-8 trace
Aggregated comparison: `outputs/b3_sweep_20260525_095043/b3_policy_comparison.json`.
Rendered figures: `analysis/characterization/window_1_results/figures/fig_b3_*.png`.
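This file does not define the hotspot metric used in `b3_policy_comparison.json`. A common choice is the ratio of maximum to mean load across instances. In this sketch, the metric definition, the function names, and the per-instance load values are hypothetical. Only the policy names come from the list above.

```python
from statistics import mean


def hotspot_index(per_instance_load: list[float]) -> float:
    """Hypothetical hotspot metric: max / mean load across instances.

    1.0 means perfectly balanced; N means all load sits on one of N instances.
    """
    if not per_instance_load:
        raise ValueError("need at least one instance")
    avg = mean(per_instance_load)
    return max(per_instance_load) / avg if avg > 0 else float("nan")


def rank_policies(loads_by_policy: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Sort policies from most to least balanced under hotspot_index."""
    scored = [(policy, hotspot_index(loads)) for policy, loads in loads_by_policy.items()]
    return sorted(scored, key=lambda item: item[1])
```

The same ratio works whether "load" means requests, prompt tokens, or busy time per instance. Which unit matches the claim matrix is not stated here.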
Allowed For PD-colo Interference Causal Claims
- `outputs/b2_microbench/sweep/{same,different}/p{2048,8192,16384,32768,65536}/`
  - Decode-load + prefill-injection microbenchmark.
  - `b2_sweep_summary.json` aggregates per-cell TPOT and TTFT (overlap vs. clean), indexed by `prefill_size × variant`.
  - The different-worker control stays at idx ≈ 1.0 across the 32× range of prefill sizes (2048 to 65536 tokens), while the same-worker idx grows monotonically with prefill size.
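The B2 acceptance pattern above can be checked mechanically. This minimal sketch assumes the interference index is decode TPOT under prefill overlap divided by clean decode TPOT. That definition is an assumption, as are the `Cell` fields and the 5% tolerance. The check confirms two things: every different-worker cell sits near 1.0, and same-worker cells are non-decreasing in prefill size.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One hypothetical B2 sweep cell (field names are assumptions)."""
    variant: str            # "same" or "different" worker
    prefill_size: int       # injected prefill length in tokens
    tpot_clean_ms: float    # decode TPOT with no prefill injected
    tpot_overlap_ms: float  # decode TPOT while the prefill overlaps


def interference_index(cell: Cell) -> float:
    """Assumed definition: slowdown of decode TPOT caused by the overlapping prefill."""
    return cell.tpot_overlap_ms / cell.tpot_clean_ms


def check_b2_shape(cells: list[Cell], control_tol: float = 0.05) -> tuple[bool, bool]:
    """Return (control_is_flat, same_worker_is_monotone)."""
    by_variant: dict[str, list[Cell]] = {}
    for cell in cells:
        by_variant.setdefault(cell.variant, []).append(cell)

    # Control: a prefill on a different worker should leave decode TPOT unchanged.
    control = [interference_index(c) for c in by_variant.get("different", [])]
    control_flat = all(abs(idx - 1.0) <= control_tol for idx in control)

    # Treatment: same-worker interference should not shrink as prefills grow.
    same = sorted(by_variant.get("same", []), key=lambda c: c.prefill_size)
    same_idx = [interference_index(c) for c in same]
    same_monotone = all(a <= b for a, b in zip(same_idx, same_idx[1:]))
    return control_flat, same_monotone
```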
Allowed For Legacy Baseline Sanity Claims
These older runs predate Window 1 instrumentation. They can still support claims of the form "static PD-sep was worse than combined on this fixed-request workload", but not the new SRR or per-policy comparisons.
- `outputs/gpu_ab_combined`, `outputs/gpu_ab_pdsep`
- `outputs/contention_16s_ts10`, `outputs/contention_16s_elastic`
- `outputs/combined_1000req`, `outputs/exp3_pd_sep_tp1_mooncake`
NOT Allowed For Main Claims
The following need new runs:
- B4 SRR sweep: an arrival-rate sweep with open-loop Poisson session arrivals and per-class SLOs. No data yet.
- B5 failure attribution near the SRR boundary: depends on B4.
- Production interference under the cache_aware proxy: B2 sent requests directly to worker endpoints, so production routing might change how often prefills collide with decodes on the same worker.
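The "open-loop Poisson session arrivals" in the B4 item can be sketched as follows. Arrival times are drawn up front from exponential inter-arrival gaps with mean 1/rate, so the offered load does not slow down when the server does. The function name and parameters are illustrative; this is not the A4 loadgen's actual interface.

```python
import random


def poisson_arrivals(rate_per_s: float, duration_s: float, seed: int = 0) -> list[float]:
    """Hypothetical helper: open-loop session start times over [0, duration_s).

    Gaps are i.i.d. exponential with mean 1 / rate_per_s, so the count of
    arrivals is Poisson with mean rate_per_s * duration_s. Times are fixed
    before dispatch and never depend on server response times.
    """
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    rng = random.Random(seed)  # seeded so each sweep point is reproducible
    t = 0.0
    times: list[float] = []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)
```

An arrival-rate sweep then calls this once per rate, for example `for rate in (0.5, 1.0, 2.0): poisson_arrivals(rate, 600.0)`, where the rates and duration are placeholders. Each session's requests would then be issued sequentially from its start time.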
Required Upgrade Path
For Window 2 (B4 + B5), the existing stack already meets the requirements:
- A1 unix timestamps on every metric row ✓
- A2 worker_state snapshots ✓
- A3 step-level engine_state (works in isolated runs since df32499) ✓
- A4 open-loop Poisson loadgen ✓
- A5 joined_analysis + failure labels ✓
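A1 and A2 together let metric rows be joined to the worker state in effect when each row was recorded. The sketch below shows one way to do that: an as-of join on unix timestamps. Each row gets the latest snapshot taken at or before it. The `ts` field name and the single-worker scope are assumptions; in practice you would group rows and snapshots by worker first. This is not a description of how joined_analysis is implemented.

```python
import bisect
from typing import Any, Optional


def asof_join(
    metric_rows: list[dict[str, Any]],
    snapshots: list[dict[str, Any]],
    key: str = "ts",
) -> list[dict[str, Any]]:
    """Hypothetical as-of join for one worker.

    Each metric row gets the latest worker_state snapshot whose unix
    timestamp is <= the row's timestamp, or None if none exists yet.
    """
    snaps = sorted(snapshots, key=lambda s: s[key])
    snap_ts = [s[key] for s in snaps]
    joined: list[dict[str, Any]] = []
    for row in metric_rows:
        # bisect_right finds the first snapshot strictly after the row.
        i = bisect.bisect_right(snap_ts, row[key]) - 1
        state: Optional[dict[str, Any]] = snaps[i] if i >= 0 else None
        joined.append({**row, "worker_state": state})
    return joined
```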
No new instrumentation is required. The only software gap was that b3_analyze.sh must use the per-policy engine_state when one is present; this is fixed at commit df32499.
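The fix described above amounts to a lookup order. The actual fix lives in b3_analyze.sh, and this Python sketch only illustrates the same rule. The file name `engine_state.jsonl` and the fallback to a shared directory are hypothetical. The rule is to prefer the policy's own engine_state file whenever it exists, so one policy's analysis never reads another policy's engine state.

```python
from pathlib import Path
from typing import Optional


def pick_engine_state(
    policy_dir: Path,
    shared_dir: Path,
    name: str = "engine_state.jsonl",  # hypothetical file name
) -> Optional[Path]:
    """Return the per-policy engine_state file if present, else a shared one, else None."""
    for candidate in (policy_dir / name, shared_dir / name):
        if candidate.is_file():
            return candidate
    return None
```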