1cf03c6e790a7490a643759a1f013fcd2d0c4ea1
Old cost model: offload_cost = colocated_cost + RDMA_overhead, so offload was always 0.1s more expensive. Result: only 19/117 HEAVY offloaded. New: colocated_cost includes interference penalty when C_s has decode requests: penalty = prefill_time × min(num_requests, 3) × 0.3. Offload now wins when C_s has 1+ concurrent request. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%