feat(stack): pin PD workers to --disable-overlap-schedule

On a node with driver 570.86.15 (CUDA 12.8 driver API ceiling), SGLang's
overlap event loop hits cudaErrorInsufficientDriver in the
resolve_future_token_ids JIT kernel, reached from
event_loop_overlap_disagg_prefill. Passing --disable-overlap-schedule
switches the prefill and decode workers to the normal event loop, which
sidesteps this specific codepath. The flag is harmless on newer drivers
and remains a useful default until overlap is independently re-validated
on this hardware.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tim
2026-05-12 00:09:56 +08:00
parent e874b1f055
commit a418aafeed


@@ -848,6 +848,8 @@ def _topology_from_args(args: argparse.Namespace):
 force_rdma=args.force_rdma,
 trust_remote_code=not args.no_trust_remote_code,
 ib_device=args.ib_device,
+prefill_extra_server_args=("--disable-overlap-schedule",),
+decode_extra_server_args=("--disable-overlap-schedule",),
 direct_extra_server_args=("--enable-streaming-session",),
 )
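
For illustration only, a minimal sketch of how per-role extra-args tuples like the ones above might end up on a worker's command line. The `Topology` dataclass and `build_worker_argv` helper are hypothetical stand-ins, not code from this repository. Only the three field names and the flag values are taken from the diff. The base command assumes the standard `python -m sglang.launch_server` entry point with its `--model-path` and `--disaggregation-mode` options.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Topology:
    # Hypothetical stand-in for the object built in _topology_from_args;
    # only the three extra-args fields from the diff are modeled here.
    prefill_extra_server_args: Tuple[str, ...] = ("--disable-overlap-schedule",)
    decode_extra_server_args: Tuple[str, ...] = ("--disable-overlap-schedule",)
    direct_extra_server_args: Tuple[str, ...] = ("--enable-streaming-session",)


def build_worker_argv(topology: Topology, role: str, model_path: str) -> List[str]:
    """Assemble a launch command for one worker role (hypothetical helper)."""
    extra_by_role = {
        "prefill": topology.prefill_extra_server_args,
        "decode": topology.decode_extra_server_args,
        "direct": topology.direct_extra_server_args,
    }
    if role not in extra_by_role:
        raise ValueError(f"unknown worker role: {role!r}")

    argv = ["python", "-m", "sglang.launch_server", "--model-path", model_path]
    # Only the PD (prefill/decode) roles run in disaggregated mode.
    if role in ("prefill", "decode"):
        argv += ["--disaggregation-mode", role]
    # Role-specific extras go last, so the pinned flag is always present.
    argv += list(extra_by_role[role])
    return argv


if __name__ == "__main__":
    topo = Topology()
    for role in ("prefill", "decode", "direct"):
        print(role, " ".join(build_worker_argv(topo, role, "/models/example")))
```

Keeping the flag in the topology defaults, rather than in each launch script, means the prefill and decode workers cannot drift apart on this setting. The direct worker keeps its own extras and stays on the overlap scheduler.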