agentic-pd-hybrid

gahow/agentic-pd-hybrid


Commit Graph


Author: tim
SHA1: 905d671135

feat(env): MC_TRANSFER_TIMEOUT=1800s default in setup_env + stack

Mooncake's C++ batch_transfer_sync defaults to a 30s timeout. On
saturated decode (D) scheduler threads doing LRU eviction, that timeout
fires as a false positive, and the SGLang hair-trigger in conn.py:1270
then permanently blacklists the D's mooncake_session_id (E2 forensics
in docs/E1_E2_RESULTS_ZH.md §5c). Bump the default to 1800s in
setup_env.sh and mirror it into the subprocess env in stack.py so
SGLang workers get it too. The 30-min envelope still detects genuinely
broken peers eventually.
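A minimal Python sketch of the "mirror to subprocess env" step, assuming the launcher builds an explicit env dict for each worker. The helper names `worker_env` and `launch_worker` are hypothetical (stack.py's actual code is not shown here), and using `setdefault` so that a value already exported by setup_env.sh wins over the 1800s default is a design assumption.

```python
import os
import subprocess
from typing import Mapping, Optional

# Default from the commit message: 1800 s instead of Mooncake's 30 s.
MC_TRANSFER_TIMEOUT_DEFAULT = "1800"


def worker_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the launcher's environment for an SGLang worker subprocess.

    setdefault leaves an MC_TRANSFER_TIMEOUT that is already set (for
    example by sourcing setup_env.sh) untouched and only fills in the
    default when the variable is missing.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("MC_TRANSFER_TIMEOUT", MC_TRANSFER_TIMEOUT_DEFAULT)
    return env


def launch_worker(cmd: list) -> subprocess.Popen:
    """Hypothetical launcher; stack.py's real command line is not shown."""
    return subprocess.Popen(cmd, env=worker_env())
```

Building the env in the launcher rather than relying only on the shell means workers still get the 1800s default when setup_env.sh was not sourced, while an explicit override (say, a shorter timeout for debugging) is respected.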

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Date: 2026-05-12 11:45:09 +08:00

Author: tim
SHA1: d11a66d11b

feat(scripts): cu12.8 env wrapper + Inferact trace converter

setup_env.sh: source-able shell snippet that points tvm_ffi (the JIT
compiler used by the vendored sglang) at $HOME/cuda-12.8/bin/nvcc and
puts both libcudart.so.12 (for mooncake.engine, a cu12 wheel) and the
cu12.8 lib64 (for the tvm_ffi compile-time linker) on LD_LIBRARY_PATH.
Without this, JIT-compiled kernels carried a NEEDED entry for
libcudart.so.13, and driver 570 rejected them on every JIT call.
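setup_env.sh itself is shell, but the environment it describes can be sketched in Python for a launcher that spawns workers directly. This is an illustration under assumptions the commit does not confirm: that tvm_ffi finds nvcc through PATH, that the cu12.8 libraries live in $HOME/cuda-12.8/lib64, and that the caller supplies the directory holding libcudart.so.12. `prepend_paths` and `cu128_env` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def prepend_paths(current: str, dirs: Sequence[str]) -> str:
    """Put `dirs` in front of an os.pathsep-separated list, keeping the
    first occurrence of each entry and dropping empty ones."""
    seen: set = set()
    out: list = []
    for d in [*dirs, *current.split(os.pathsep)]:
        if d and d not in seen:
            seen.add(d)
            out.append(d)
    return os.pathsep.join(out)


def cu128_env(cudart12_dir: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Environment for a child process that should JIT against CUDA 12.8."""
    env = dict(os.environ if base is None else base)
    cuda_root = Path(env.get("HOME") or str(Path.home())) / "cuda-12.8"
    # Assumption: nvcc is resolved via PATH, so the cu12.8 bin dir goes first.
    env["PATH"] = prepend_paths(env.get("PATH", ""), [str(cuda_root / "bin")])
    # libcudart.so.12 for the cu12 mooncake wheel, then the cu12.8 lib64
    # that the commit message says the tvm_ffi compile-time linker needs.
    env["LD_LIBRARY_PATH"] = prepend_paths(
        env.get("LD_LIBRARY_PATH", ""), [cudart12_dir, str(cuda_root / "lib64")]
    )
    return env
```

The returned dict is meant to be passed as `env=` to a child process: the dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing it inside an already-running Python process does not affect libraries that process loads later. That is also why the snippet has to be sourced before the stack is launched.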

convert_inferact_to_trace.py: turns Inferact codex_swebenchpro_traces
(ShareGPT {"from","value"} pairs) into the chat_id/parent_chat_id/
turn/hash_ids JSONL schema that replay.py expects. It tokenizes with the
model's own tokenizer, builds prefix-sharing 24-token block hashes, and
synthesizes timestamps. The output contains 20,230 LLM calls, exactly
matching the Inferact README count for 610 successful trials.
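The converter's two core steps can be sketched as follows: chained 24-token block hashing, where each block's hash also covers its parent's so that two calls receive identical hash_ids exactly as far as their token prefixes match, and one JSONL record per model reply. This is not the script itself, and several details are hypothetical assumptions: "gpt" marks a model reply, only full blocks are hashed, digests map to small integers in first-seen order, chat-template tokens are ignored, chat_id is a "<conversation>-<turn>" string, and timestamps use a fixed gap under a `timestamp` field. The real script uses the model's tokenizer; here `tokenize` is any callable.

```python
import hashlib
import json
from typing import Callable, Optional

BLOCK_SIZE = 24  # tokens per hashed block, per the commit message


class BlockHasher:
    """Chained block hashes: each block's digest covers all earlier blocks,
    so identical token prefixes yield identical leading hash_ids."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self._ids: dict = {}  # digest -> small integer id, first-seen order

    def hash_ids(self, tokens: list[int]) -> list[int]:
        ids: list[int] = []
        parent = b""
        # Assumption: a trailing partial block (< block_size tokens) is dropped.
        last_start = len(tokens) - self.block_size
        for start in range(0, last_start + 1, self.block_size):
            h = hashlib.sha256(parent)
            h.update(json.dumps(tokens[start:start + self.block_size]).encode())
            parent = h.digest()
            ids.append(self._ids.setdefault(parent, len(self._ids)))
        return ids


def conversation_to_records(
    conv_id: str,
    messages: list[dict],
    tokenize: Callable[[str], list[int]],
    hasher: BlockHasher,
    t0: float = 0.0,
    gap_s: float = 1.0,
) -> list[dict]:
    """One record per model reply ("gpt"); its prompt is every earlier message."""
    records: list[dict] = []
    prompt: list[int] = []
    parent: Optional[str] = None
    turn = 0
    for msg in messages:
        if msg["from"] == "gpt":
            chat_id = f"{conv_id}-{turn}"  # hypothetical id scheme
            records.append({
                "chat_id": chat_id,
                "parent_chat_id": parent,
                "turn": turn,
                "hash_ids": hasher.hash_ids(prompt),
                "timestamp": t0 + turn * gap_s,  # hypothetical field, fixed gap
            })
            parent = chat_id
            turn += 1
        # Every message, including the reply, becomes part of the next prompt.
        prompt = prompt + tokenize(msg["value"])
    return records


def write_jsonl(records: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

With a Hugging Face tokenizer, `tokenize` could be `lambda text: tok.encode(text, add_special_tokens=False)`. Sharing one `BlockHasher` across all conversations is what lets a replay tool see cross-call prefix reuse as repeated hash_ids.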

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Date: 2026-05-12 00:10:06 +08:00

2 Commits