Gahow Wang 1cf03c6e79 Cost model: add interference penalty for co-located heavy prefill
Old cost model: offload_cost = colocated_cost + RDMA_overhead, so offload
was always 0.1s more expensive. Result: only 19/117 HEAVY offloaded.

New: colocated_cost includes interference penalty when C_s has decode
requests: penalty = prefill_time × min(num_requests, 3) × 0.3.
Offload now wins when C_s has 1+ concurrent request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 23:59:06 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%