feat(stack): pin PD workers to --disable-overlap-schedule
On a node with driver 570.86.15 (CUDA 12.8 driver API ceiling), SGLang's overlap event loop hits cudaErrorInsufficientDriver in the resolve_future_token_ids JIT kernel, reached from event_loop_overlap_disagg_prefill. Switching to the normal event loop sidesteps this specific codepath. The flag is harmless on newer drivers and remains a useful default until overlap is independently re-validated on this hardware.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -848,6 +848,8 @@ def _topology_from_args(args: argparse.Namespace):
 force_rdma=args.force_rdma,
 trust_remote_code=not args.no_trust_remote_code,
 ib_device=args.ib_device,
+prefill_extra_server_args=("--disable-overlap-schedule",),
+decode_extra_server_args=("--disable-overlap-schedule",),
 direct_extra_server_args=("--enable-streaming-session",),
 )
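For reviewers, a minimal sketch of how per-role extra args like the ones in this hunk might end up on a worker's launch command. This is not the stack's actual code: `Topology` here is a hypothetical stand-in for whatever `_topology_from_args` returns, and `build_server_command` and the base command are assumptions made for illustration. Only the three `*_extra_server_args` field names and the flag strings come from the diff.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Topology:
    """Hypothetical stand-in for the object built by _topology_from_args."""

    prefill_extra_server_args: Tuple[str, ...] = ()
    decode_extra_server_args: Tuple[str, ...] = ()
    direct_extra_server_args: Tuple[str, ...] = ()


def build_server_command(
    base: Sequence[str], role: str, topology: Topology
) -> List[str]:
    """Append the role-specific extra args to a copy of the base command.

    `role` is one of "prefill", "decode", or "direct" (assumed role names,
    taken from the field prefixes in the diff).
    """
    if role not in ("prefill", "decode", "direct"):
        raise ValueError(f"unknown role: {role!r}")
    extra = getattr(topology, f"{role}_extra_server_args")
    return [*base, *extra]


# Values mirror the hunk above; the base command is illustrative only.
topology = Topology(
    prefill_extra_server_args=("--disable-overlap-schedule",),
    decode_extra_server_args=("--disable-overlap-schedule",),
    direct_extra_server_args=("--enable-streaming-session",),
)
base = ["python", "-m", "sglang.launch_server"]

prefill_cmd = build_server_command(base, "prefill", topology)
decode_cmd = build_server_command(base, "decode", topology)
direct_cmd = build_server_command(base, "direct", topology)
```

Under these assumptions, the prefill and decode commands both carry `--disable-overlap-schedule`, while the direct command is unchanged by this commit and keeps only `--enable-streaming-session`.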