daeb95eca08efd009360d5516032dbd1eadbbbb7
retry on ConnectError to handle kv_both connection instability

With RDMA_overhead=0.1s, offload triggers when C_s has just 700 tokens pending (0.1s queue), vs 38k tokens (5.4s) with the old 2.0s estimate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
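
The retry behaviour named in the commit subject could look roughly like the sketch below. It is only an illustration, not the repository's code: the helper name `retry_on_connect_error`, its parameters, and the exponential backoff policy are all assumptions, and Python's built-in `ConnectionError` stands in for whatever `ConnectError` class the project actually catches.

```python
import time


def retry_on_connect_error(func, retries=3, base_delay=0.5,
                           exc_type=ConnectionError, sleep=time.sleep):
    """Call func() and retry with exponential backoff on connection errors.

    Hypothetical helper: exc_type defaults to the built-in ConnectionError
    as a stand-in for the ConnectError mentioned in the commit message.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except exc_type:
            # Out of retries: re-raise the last connection error unchanged.
            if attempt == retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Passing `exc_type` and `sleep` as parameters keeps the sketch independent of any particular client library and lets a test replace the real sleep with a recorder.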
Languages
Python 82.9%
Shell 17.1%