Gahow Wang
fc92410ec9
Invalidate prior A/B results + add proper experiment harness

Prior cross-machine comparison (commit 1e86285) was invalid: the dash0
baseline ran on warm instances with residual KV cache, which inflated
TTFT by 2x. Evidence: inst_7 reported APC=68.3%, which is impossible
from only 25 cold-start requests; warm TTFT p90 was 3.3s vs 0.26s after
a fresh restart.
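The warm-instance contamination described above is what a pre-run fresh-state check guards against. Below is a minimal sketch of such a guard, written in Python rather than the shell used by bench.sh. It assumes "fresh" means every GPU reports near-zero memory in use. The 100 MiB threshold and all function names are hypothetical and are not taken from bench.sh. The `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` query prints one integer (MiB) per GPU.

```python
import subprocess

# Hypothetical threshold: a GPU counts as idle if it uses at most this many MiB.
IDLE_MEM_MIB = 100


def parse_gpu_memory(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def busy_gpus(used_mib: list[int], threshold: int = IDLE_MEM_MIB) -> list[int]:
    """Return indices of GPUs whose used memory exceeds the idle threshold."""
    return [i for i, used in enumerate(used_mib) if used > threshold]


def assert_fresh_state() -> None:
    """Abort the benchmark if any GPU still holds memory (e.g. a leftover server)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    busy = busy_gpus(parse_gpu_memory(out))
    if busy:
        raise RuntimeError(f"GPUs not idle, refusing to benchmark: {busy}")
```

Call `assert_fresh_state()` before launching any servers so that a run cannot start on top of residual state.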
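
Fair same-machine comparison (both freshly restarted on dash0; latencies
in seconds):
  Baseline:    TTFT50=1.075  TPOT90=0.076  E2E50=5.075  OK=198/200
  Elastic P2P: TTFT50=1.018  TPOT90=0.085  E2E50=6.977  OK=195/200
Elastic P2P is worse overall: TTFT50 improves slightly, but TPOT90 and
E2E50 regress and fewer requests succeed (195 vs 198). The regression is
due to Mooncake kv_both memory overhead.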
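The relative changes implied by the two result rows above can be computed directly to make the trade-off concrete. The figures are copied from this message and are assumed to be in seconds.

```python
# Figures copied from the fresh-restart comparison (assumed seconds).
baseline = {"TTFT50": 1.075, "TPOT90": 0.076, "E2E50": 5.075}
elastic = {"TTFT50": 1.018, "TPOT90": 0.085, "E2E50": 6.977}

# Failed requests out of 200 (200 minus the OK count).
failed = {"baseline": 200 - 198, "elastic": 200 - 195}


def pct_change(new: float, old: float) -> float:
    """Relative change of new vs old in percent (positive = elastic is higher)."""
    return (new - old) / old * 100.0


deltas = {k: round(pct_change(elastic[k], baseline[k]), 1) for k in baseline}

for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.1f}%")
print(f"failed requests: {failed['baseline']} -> {failed['elastic']}")
```

The results are about -5.3% TTFT50, +11.8% TPOT90, and +37.5% E2E50, with failures going from 2 to 5. The small TTFT gain is outweighed by the decode-side and end-to-end regressions.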

Changes:
- REPORT.md: rewrite §3-4 with corrected results, add §3.5 errata
- pd_separation_analysis.md: update elastic TL;DR with correct numbers
- cache_aware_proxy.py: fix double-decrement bugs in offload path,
  add 120s prefill timeout with co-located fallback (HEAVY_COLO_FALLBACK)
- bench.sh: standardized experiment harness with guaranteed GPU cleanup
  and fresh-state verification (nvidia-smi check before start)
- run_elastic_stability_test.sh: two-phase elastic vs baseline test
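The two proxy fixes listed above can be illustrated with a minimal asyncio sketch. It assumes an async proxy in which each offloaded prefill holds a slot in a per-instance load counter. `InflightCounter`, `prefill_with_fallback`, and the callables passed in are hypothetical names, not the actual code in cache_aware_proxy.py. The sketch shows two things: a release function that decrements at most once, so the offload path cannot double-decrement, and a 120s `asyncio.wait_for` around remote prefill that falls back to co-located serving on timeout. How the HEAVY_COLO_FALLBACK setting is consumed is not modeled here.

```python
import asyncio
from typing import Awaitable, Callable

PREFILL_TIMEOUT_S = 120.0  # value from the commit message; the name is hypothetical


class InflightCounter:
    """Hypothetical stand-in for the proxy's per-instance load bookkeeping."""

    def __init__(self) -> None:
        self.count = 0

    def acquire(self) -> Callable[[], None]:
        """Increment now; return a release function that decrements at most once."""
        self.count += 1
        released = False

        def release() -> None:
            nonlocal released
            if not released:
                released = True
                self.count -= 1

        return release


async def prefill_with_fallback(
    request: str,
    offload_prefill: Callable[[str], Awaitable[str]],
    colocated: Callable[[str], Awaitable[str]],
    prefill_load: InflightCounter,
    timeout_s: float = PREFILL_TIMEOUT_S,
) -> str:
    release = prefill_load.acquire()
    try:
        # wait_for cancels the offloaded prefill if it exceeds the timeout.
        return await asyncio.wait_for(offload_prefill(request), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Free the prefill slot before serving co-located so the
        # fallback does not keep the remote instance marked as busy.
        release()
        return await colocated(request)
    finally:
        release()  # idempotent: a no-op if the timeout branch already released
```

Making the release idempotent, instead of tracking which branch already decremented, keeps the try/except/finally structure safe when further early-exit paths are added later.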

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 17:54:21 +08:00