Bug 1+5: D instance had no accounting during prefill phase (7-11s window).
Router saw D as idle, routing extra traffic that caused KV allocation failures.
Fix: reserve D's ongoing_tokens+num_requests at offload decision time.
Bug 7: No cap on concurrent offloads despite REPORT claiming MAX_OFFLOAD=4.
Fix: add MAX_OFFLOAD_INFLIGHT=4 check before offloading.
Bug 6: Session affinity migrated to D but proxy cache estimator wasn't
updated for D. Future turns scored D as cache-cold.
Fix: call d_inst.record_prefix(token_ids) after successful decode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>