Codifies the methodology fixes for every weakness called
out in AUDIT_AND_ROADMAP_ZH §3.1. The existing sweep reports
(KVCACHE_CENTRIC_PROGRESS_ZH, V2_RESULTS_ZH) violate at
least one of these rules; future runs must follow this protocol.
Contents:
- §1.1 M1 — N≥3 + bootstrap CI; no N=1 in headline
- §1.2 M2 — paired-on-same-trial-mask; same trace /
  timeout / max_input_len / time_scale; errors
  and aborts each get their own column
- §1.3 M3 — required stratification dimensions
  (turn_id / append_len / overlap_ratio /
  inter_turn_gap / input_len)
- §1.4 M4 — minimum 2 baselines from a 6-item list,
  including at least one non-SGLang baseline
- §1.5 M5 — trace mix: Ali full + SWE-Bench +
  ShareGPT + synthetic adversarial
- §1.6 M6 — hardware tiers; single-node 4xH200 +
  dual-node NVLink/IB as minimum
- §2 report templates (main table, paired delta,
  stratified, negative-result section)
- §3 tool support: marks the two scripts that the
  follow-up commits on this branch add
- §4 SOSP/OSDI artifact requirements
- §5 pre-submission self-checklist
- §6 phased delivery plan for catching up to protocol
No code change. This document is required reading for the
analyzer scripts added in the follow-up commits.
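
For illustration only, a minimal sketch of how M1 and M2 from the
contents list might combine: compute per-trial deltas over the trials
that both runs completed, then report a percentile bootstrap CI on the
mean delta. This is not the analyzer shipped on this branch. The names
paired_deltas and bootstrap_ci, the dict-of-latencies input, and the use
of None for errored or aborted trials are all assumptions made for this
example, as is reading N as the number of paired samples.

```python
import random
import statistics


def paired_deltas(baseline, candidate):
    """Per-trial deltas over the shared mask of successful trials.

    Each argument maps a trial id to a latency in seconds, or None when
    the trial errored or aborted (hypothetical encoding for this sketch).
    """
    shared = sorted(
        t for t in baseline.keys() & candidate.keys()
        if baseline[t] is not None and candidate[t] is not None
    )
    return [candidate[t] - baseline[t] for t in shared]


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lo, hi) using a percentile bootstrap on the mean."""
    if len(values) < 3:
        # Assumed reading of M1: refuse to summarize fewer than 3 samples.
        raise ValueError("need N >= 3 samples")
    rng = random.Random(seed)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(values), lo, hi


if __name__ == "__main__":
    base = {"t1": 1.0, "t2": 2.0, "t3": None, "t4": 4.0, "t5": 3.0}
    cand = {"t1": 0.5, "t2": 1.5, "t3": 1.0, "t4": None, "t5": 2.0}
    print(bootstrap_ci(paired_deltas(base, cand)))
```

Trials that fail in either run drop out of the paired comparison, which
is why the protocol asks for errors and aborts to be reported in their
own columns rather than silently excluded.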