f5e45afd4ee6dc2532ce82e2b48927d6a96ef85b
Bug 1+5: D instance had no accounting during prefill phase (7-11s window). Router saw D as idle, routing extra traffic that caused KV allocation failures. Fix: reserve D's ongoing_tokens+num_requests at offload decision time. Bug 7: No cap on concurrent offloads despite REPORT claiming MAX_OFFLOAD=4. Fix: add MAX_OFFLOAD_INFLIGHT=4 check before offloading. Bug 6: Session affinity migrated to D but proxy cache estimator wasn't updated for D. Future turns scored D as cache-cold. Fix: call d_inst.record_prefix(token_ids) after successful decode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%