Add hash_table_sync logging + gap analysis

Root cause of 0 cache hits on offloaded requests identified:
- Hash table sync IS working (scheduler→metadata→worker→bootstrap)
- But D's query_blocks returns no matches, which indicates a hash format
  mismatch between D's request.block_hashes and C's synced hashes

The gap: offloaded TTFT (12.4s) ≈ co-located TTFT (12.0s), so offloading
currently saves nothing. With cache_hit=0, D runs a FULL cold prefill
instead of a partial prefill over RDMA-read cached blocks.

Next: debug hash format mismatch between D and C.
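
One way to start on that next step is sketched below. It rests on an
assumption not confirmed in this commit: that the two sides hold the same
digests in different representations (the diff shows the synced side
storing `get_block_hash(k).hex()` strings; the type of
request.block_hashes is not shown here). The helper names `to_hex`,
`describe` and `compare` are hypothetical and not part of the connector.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical: representations a block hash might take on either side.
HashLike = Union[bytes, str, int]


def to_hex(h: HashLike) -> str:
    """Normalize one hash to lowercase hex without a 0x prefix."""
    if isinstance(h, bytes):
        return h.hex()
    if isinstance(h, int):
        # Note: leading zero bytes are lost for ints, so lengths may differ.
        return format(h, "x")
    if isinstance(h, str):
        s = h.lower()
        return s[2:] if s.startswith("0x") else s
    raise TypeError(f"unsupported hash type: {type(h).__name__}")


def describe(hashes: Iterable[HashLike]) -> Dict[str, List]:
    """Summarize which Python types and hex lengths appear in a hash set."""
    items = list(hashes)
    return {
        "types": sorted({type(h).__name__ for h in items}),
        "hex_lengths": sorted({len(to_hex(h)) for h in items}),
    }


def compare(request_hashes: Iterable[HashLike],
            synced_hashes: Iterable[HashLike]) -> Dict[str, object]:
    """Count matches before and after normalizing both sides to hex.

    raw_matches == 0 with normalized_matches > 0 suggests the digests agree
    and only the representation differs. Zero on both suggests the digests
    themselves differ (for example a different hash function or seed).
    """
    req = list(request_hashes)
    syn = list(synced_hashes)
    raw = len(set(req) & set(syn))
    normalized = len({to_hex(h) for h in req} & {to_hex(h) for h in syn})
    return {
        "raw_matches": raw,
        "normalized_matches": normalized,
        "request": describe(req),
        "synced": describe(syn),
    }


if __name__ == "__main__":
    # Made-up sample values, only to show the call shape.
    d_side = [bytes.fromhex("ab12"), bytes.fromhex("cd34")]
    c_side = ["ab12", "ffff"]
    print(compare(d_side, c_side))
```

Logging the output of `compare` once on D, next to the query_blocks call,
would show whether the types or the digest lengths disagree.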

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-24 00:38:14 +08:00
parent 1cf03c6e79
commit a1f30e5fce

@@ -422,6 +422,12 @@ class MooncakeConnectorScheduler:
                 get_block_hash(k).hex() for k in removed_keys
             }
             self._known_hash_keys = current_keys.copy()
+            logger.info("hash_table_sync: +%d -%d (total known=%d)",
+                        len(new_keys), len(removed_keys), len(self._known_hash_keys))
+        else:
+            if not hasattr(self, '_bp_warned'):
+                logger.warning("_block_pool is None, hash table sync disabled")
+                self._bp_warned = True
         if not self.is_kv_consumer:
             for req_id, (req, block_ids) in self._reqs_need_send.items():