Gahow Wang 4c583f2f1c Revert relaxed gate + push_cost fix: 134 offloads destroyed performance
PD-sep offload overhead (C queue + prefill + KV transfer + D schedule)
far exceeds any load balance benefit. With relaxed gate, cost model
triggered 134 offloads → E2E p90 went from 37s to 82s.

The proven winning configuration is Unified routing in baseline mode
(no Mooncake connector), which beats LMetric on E2E mean/p50/p90
purely through better routing (contention-aware + session affinity).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-25 03:38:59 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%