1. push_cost now models both C and D: max(c_cost, d_cost) where
c_cost includes C's queue + prefill, d_cost includes D's queue +
RDMA overhead. Old formula only had D's contention + RDMA.
2. Hard gate uses num_requests instead of ongoing_tokens, aligning
with the contention-based cost model.
3. Fix migration_discount: min(cap, 5) instead of hardcoded min(cap, 3).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>