agentic-pd-hybrid

gahow/agentic-pd-hybrid

Commit Graph

Author

SHA1

Message

Date

tim

905d671135

feat(env): MC_TRANSFER_TIMEOUT=1800s default in setup_env + stack

Mooncake C++ batch_transfer_sync defaults to 30s timeout; on
saturated D scheduler threads doing LRU eviction, that fires as a
false positive and the SGLang hair-trigger in conn.py:1270
permanently blacklists the D's mooncake_session_id (E2 forensic in
docs/E1_E2_RESULTS_ZH.md §5c). Bump to 1800s in setup_env.sh and
mirror to subprocess env in stack.py so SGLang workers get it too.
30-min envelope still detects genuinely broken peers eventually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 11:45:09 +08:00
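
The mirroring step described in this commit (setting a default in the shell environment and making sure spawned SGLang workers inherit it) might look roughly like the sketch below. This is not the code in stack.py: the helper name build_worker_env is hypothetical, the use of setdefault (so an operator override wins) is an assumption, and so is writing the value as a bare number of seconds, since the commit only says "1800s".

```python
import os

# Assumption: the value is a bare number of seconds; the commit only says "1800s".
DEFAULT_MC_TRANSFER_TIMEOUT = "1800"


def build_worker_env(base_env=None):
    """Return a copy of the environment with MC_TRANSFER_TIMEOUT defaulted.

    Hypothetical helper, not taken from stack.py.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # setdefault leaves an explicitly exported value untouched.
    env.setdefault("MC_TRANSFER_TIMEOUT", DEFAULT_MC_TRANSFER_TIMEOUT)
    return env
```

A launcher could then pass the result to the worker process, for example subprocess.Popen(cmd, env=build_worker_env()), so the child sees the same timeout as the parent shell.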

tim

b55371fe69

docs: H200 + driver 570 setup guide + 11 lessons learned

Captures the full debugging journey of getting vendored SGLang 0.5.10
+ mooncake RDMA running on a 4×H200 node with the older driver
570.86.15. Driver 570's actual API is cu12.8 — nvidia-smi's
"CUDA Version: 13.0" header is a forward-compat ceiling, not the
driver's own version — and that single misreading drove most of the
detours. Lessons cover: pip vs vendor sglang divergence, why cu13
switching was a dead end (mooncake is cu12-only by wheel, driver 570
can't run cu13 anyway), why --disable-overlap-schedule alone isn't
enough, why pip nvidia-cuda-nvcc-cu12 doesn't ship the nvcc binary,
and how tvm_ffi's ninja-driven nvcc invocation makes CUDA_HOME the
single hook point that fixes everything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 00:10:14 +08:00
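
Since the commit names CUDA_HOME as the single hook point and notes that the pip nvidia-cuda-nvcc-cu12 package does not ship the nvcc binary, a pre-flight check along the following lines could catch a misconfigured environment early. This is only a sketch: the function name resolve_nvcc is hypothetical, and the CUDA_HOME/bin/nvcc layout is the conventional toolkit layout assumed here, not something the commit spells out.

```python
import os
from pathlib import Path


def resolve_nvcc(env=None):
    """Return the nvcc path under CUDA_HOME, or None if it is not usable.

    Hypothetical helper; assumes the conventional <CUDA_HOME>/bin/nvcc layout.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return None
    nvcc = Path(cuda_home) / "bin" / "nvcc"
    # Require a regular, executable file, not just a directory entry.
    if nvcc.is_file() and os.access(nvcc, os.X_OK):
        return nvcc
    return None
```

Calling resolve_nvcc() before launching workers and failing fast when it returns None would surface a missing compiler before any build step tries to invoke it.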

2 Commits