a1f30e5fce4ad7197f9e7fa2717ea9b0d516389c
Root cause of 0 cache hits on offloaded requests identified: - Hash table sync IS working (scheduler→metadata→worker→bootstrap) - But D's query_blocks returns no matches → hash format mismatch between D's request.block_hashes and C's synced hashes The gap: offloaded TTFT (12.4s) ≈ co-located TTFT (12.0s) because D does FULL cold prefill (cache_hit=0), not partial prefill with RDMA-read cached blocks. Next: debug hash format mismatch between D and C. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
Python
82.9%
Shell
17.1%