Mooncake's C++ batch_transfer_sync defaults to a 30s timeout. When a
saturated D scheduler thread is busy with LRU eviction, that timeout
fires as a false positive, and the SGLang hair-trigger at conn.py:1270
permanently blacklists the D's mooncake_session_id (E2 forensics in
docs/E1_E2_RESULTS_ZH.md §5c). Raise the timeout to 1800s in
setup_env.sh and mirror it into the subprocess env in stack.py so the
SGLang workers inherit it too. A 30-minute envelope still detects
genuinely broken peers, just later.
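
A minimal sketch of the mirroring step, not the actual stack.py code.
The variable name MC_TRANSFER_TIMEOUT and the helper build_worker_env
are assumptions for illustration; this message does not name the
variable, so check setup_env.sh for the real one.

```python
import os

# Assumption: illustrative variable name; the real one lives in setup_env.sh.
TIMEOUT_VAR = "MC_TRANSFER_TIMEOUT"
TIMEOUT_SECONDS = "1800"  # 30-minute envelope, as a string for the env


def build_worker_env(base_env=None, overrides=None):
    """Return a copy of the environment with the timeout override applied.

    The caller's mapping is never mutated, so the parent process keeps
    its own settings while each worker gets the mirrored value.
    """
    env = dict(os.environ if base_env is None else base_env)
    if overrides is None:
        overrides = {TIMEOUT_VAR: TIMEOUT_SECONDS}
    env.update(overrides)
    return env


# Hypothetical usage when launching a worker:
#   subprocess.Popen(cmd, env=build_worker_env())
```

Passing the env explicitly means the workers do not depend on the
launching shell having sourced setup_env.sh.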
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the full debugging journey of getting vendored SGLang 0.5.10
+ mooncake RDMA running on a 4×H200 node with the older driver
570.86.15. Driver 570's actual API level is cu12.8; the
"CUDA Version: 13.0" header in nvidia-smi is a forward-compat ceiling,
not the driver's own version, and that single misreading drove most of
the
detours. Lessons cover: how pip-installed and vendored sglang diverge,
why switching to cu13 was a dead end (mooncake wheels are cu12-only,
and driver 570 can't run cu13 anyway), why --disable-overlap-schedule
alone isn't enough, why pip's nvidia-cuda-nvcc-cu12 doesn't ship the
nvcc binary, and how tvm_ffi's ninja-driven nvcc invocation makes
CUDA_HOME the single hook point that fixes everything.
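
A rough sketch of the CUDA_HOME hook, assuming the fix amounts to
pointing CUDA_HOME at a toolkit that really contains bin/nvcc. The
helper find_cuda_home and the candidate paths are hypothetical and do
not come from the repo.

```python
import os
from pathlib import Path


def find_cuda_home(candidates):
    """Return the first candidate directory that holds an executable bin/nvcc.

    A directory without the nvcc binary is skipped, which covers the
    case of a pip package that does not ship nvcc.
    """
    for candidate in candidates:
        nvcc = Path(candidate) / "bin" / "nvcc"
        if nvcc.is_file() and os.access(nvcc, os.X_OK):
            return str(candidate)
    return None


# Hypothetical usage; the paths are examples only:
#   cuda_home = find_cuda_home(["/usr/local/cuda-12.8", "/usr/local/cuda"])
#   if cuda_home is not None:
#       os.environ["CUDA_HOME"] = cuda_home
```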
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>