Gahow Wang 6716a3401a Progress: hash matching FIXED (640/640), RDMA read returns -1
Hash mismatch root cause: sha256_cbor vs sha256 (default) + NONE_HASH
from-import value binding. Both fixed. Now 640/640 blocks matched.

RDMA read (batch_transfer_sync_read) fails with ret=-1.
Likely cause: Mooncake TransferEngine may not support RDMA READ
to arbitrary registered memory without explicit permission setup.
The PUSH path (batch_transfer_sync_write) works because the sender
initiates, but PULL may need additional RDMA MR access flags.

Next: investigate Mooncake's RDMA read permission model, or
fall back to a two-step approach: D sends query → C responds
with blocks via batch_transfer_sync_write (existing PUSH path),
but triggered by the bootstrap server instead of the scheduler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-24 01:40:52 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%