Mooncake C++ batch_transfer_sync defaults to a 30 s timeout. When the D's
scheduler threads are saturated doing LRU eviction, that timeout fires as
a false positive, and SGLang's hair-trigger failure handling in
conn.py:1270 then permanently blacklists the D's mooncake_session_id (see
the E2 forensics in docs/E1_E2_RESULTS_ZH.md §5c). Bump the timeout to
1800 s in setup_env.sh and mirror it into the subprocess env in stack.py
so SGLang workers inherit it too. The 30-minute envelope still detects
genuinely broken peers, just later.
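Mirroring the value into the worker env matters because a worker spawned with an explicit env dict only sees what that dict contains. A minimal sketch of the stack.py side, assuming a HYPOTHETICAL variable name (this commit does not name the variable setup_env.sh exports) and a plain subprocess launch standing in for the real one:

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

# HYPOTHETICAL variable name: the commit does not say which variable
# setup_env.sh exports, so substitute the real one.
TIMEOUT_VAR = "MOONCAKE_TRANSFER_TIMEOUT_S"
TIMEOUT_S = 1800  # 30-minute envelope (Mooncake's default is 30 s)


def worker_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the parent env and make sure the timeout is present.

    setdefault keeps any value already exported (e.g. by sourcing
    setup_env.sh) and only fills in 1800 s when it is missing.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault(TIMEOUT_VAR, str(TIMEOUT_S))
    return env


def spawn_worker(argv, base: Optional[Mapping[str, str]] = None):
    """Launch a worker with the mirrored env (stand-in for stack.py's launch)."""
    return subprocess.run(argv, env=worker_env(base), capture_output=True,
                          text=True, check=True)


if __name__ == "__main__":
    # The child echoes the timeout it inherited.
    probe = [sys.executable, "-c",
             f"import os; print(os.environ['{TIMEOUT_VAR}'])"]
    print(spawn_worker(probe).stdout.strip())
```

setdefault rather than plain assignment is a design choice in this sketch: an operator who exports a different value before launch keeps it.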
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

setup_env.sh: a source-able shell snippet that points tvm_ffi (the JIT
compiler behind the vendored sglang) at $HOME/cuda-12.8/bin/nvcc and puts
both libcudart.so.12 (for mooncake.engine, a cu12 wheel) and the CUDA 12.8
lib64 (for the tvm_ffi compile-time linker) on LD_LIBRARY_PATH. Without
this, JIT-compiled kernels had a NEEDED entry for libcudart.so.13, and
driver 570, which predates CUDA 13, rejected them on every JIT call.
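The ordering logic of such a snippet can be sketched in Python for child processes. This assumes nvcc is found via PATH (HYPOTHETICAL: the real snippet may set a tvm_ffi-specific variable instead) and uses a HYPOTHETICAL cudart12_dir for wherever the cu12 libcudart.so.12 lives. The dynamic loader reads LD_LIBRARY_PATH at process start, so an env built this way only affects processes launched with it, which is why the real fix is a sourced shell snippet.

```python
from typing import Dict, Iterable, Mapping

SEP = ":"  # Linux search-path separator


def prepend_paths(env: Mapping[str, str], var: str,
                  dirs: Iterable[str]) -> Dict[str, str]:
    """Return a copy of env with dirs prepended to a colon-separated var.

    Entries already present are moved to the front instead of duplicated,
    so re-sourcing the setup does not grow the variable without bound.
    """
    front = [d for d in dirs if d]
    rest = [p for p in env.get(var, "").split(SEP) if p and p not in front]
    out = dict(env)
    out[var] = SEP.join(front + rest)
    return out


def cuda_jit_env(base: Mapping[str, str], home: str,
                 cudart12_dir: str) -> Dict[str, str]:
    """Child env: CUDA 12.8 bin first on PATH, cu12 libs first on LD_LIBRARY_PATH."""
    cuda = f"{home}/cuda-12.8"
    env = prepend_paths(base, "PATH", [f"{cuda}/bin"])
    # cudart12_dir is HYPOTHETICAL: the directory holding libcudart.so.12.
    return prepend_paths(env, "LD_LIBRARY_PATH",
                         [cudart12_dir, f"{cuda}/lib64"])
```

Putting both cu12 directories ahead of anything inherited is what keeps a stray libcudart.so.13 from being picked first; the relative order of the two cu12 entries is an assumption.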
convert_inferact_to_trace.py: converts the Inferact
codex_swebenchpro_traces (ShareGPT {"from", "value"} messages) into the
chat_id/parent_chat_id/turn/hash_ids JSONL schema that replay.py expects.
It tokenizes with the model's own tokenizer, builds prefix-sharing hashes
over 24-token blocks, and synthesizes timestamps. Cross-check: the output
contains 20,230 LLM calls, exactly the count the Inferact README gives
for the 610 successful trials.
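A minimal sketch of the conversion, assuming each "gpt" message is one LLM call whose prompt is every earlier message, -1 as a HYPOTHETICAL "no parent" sentinel, and a fixed synthetic gap between calls. The chained block hash is one standard way to get prefix-sharing ids; the real script's hash function and id scheme may differ.

```python
import hashlib
import json
from typing import Callable, Dict, List, Sequence

BLOCK_SIZE = 24  # block size stated in the commit message


def block_hash_ids(tokens: Sequence[int], registry: Dict[bytes, int],
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Chain-hash fixed-size token blocks so shared prefixes share ids.

    Each digest covers the previous block's digest plus this block's
    tokens, so two prompts get identical ids exactly as far as their
    block-aligned prefixes match. Hashing the trailing partial block is
    an assumption; the real script may drop it.
    """
    ids: List[int] = []
    prev = b""
    for start in range(0, len(tokens), block_size):
        block = list(tokens[start:start + block_size])
        digest = hashlib.sha256(prev + json.dumps(block).encode()).digest()
        # Map each distinct chained digest to a dense integer id.
        ids.append(registry.setdefault(digest, len(registry)))
        prev = digest
    return ids


def convert_conversation(conv: List[Dict[str, str]],
                         tokenize: Callable[[str], List[int]],
                         registry: Dict[bytes, int],
                         first_chat_id: int = 0,
                         t0: float = 0.0,
                         gap_s: float = 1.0) -> List[dict]:
    """One record per "gpt" message: the LLM call that produced it."""
    records: List[dict] = []
    history: List[int] = []  # tokens of every earlier message
    parent = -1              # HYPOTHETICAL sentinel for "no parent"
    for msg in conv:
        if msg["from"] == "gpt":
            chat_id = first_chat_id + len(records)
            records.append({
                "chat_id": chat_id,
                "parent_chat_id": parent,
                "turn": len(records),
                "timestamp": t0 + gap_s * len(records),  # synthetic
                "hash_ids": block_hash_ids(history, registry),
            })
            parent = chat_id
        history.extend(tokenize(msg["value"]))
    return records
```

Usage with a character-level stand-in tokenizer (the real script uses the model's tokenizer): `convert_conversation(conv, lambda s: [ord(c) for c in s], {})`, then write one `json.dumps(record)` per line. Sharing one registry across all conversations is what lets identical prefixes in different trials map to the same hash ids, and counting "gpt" messages is what a 20,230-call cross-check would tally under this sketch's assumption.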
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>