docs: document sglang maintenance workflow
@@ -5,9 +5,11 @@ session-aware and KV-cache-aware prefill/decode routing can improve end-to-end
 latency for agentic coding workloads on top of SGLang xPyD.
 
 The current target environment is a single 8-GPU node running SGLang `v0.5.10`
-with Qwen3-Coder-30B-A3B-Instruct. The local setup keeps the P -> D transfer
-path through SGLang disaggregation and Mooncake loopback instead of replacing it
-with an in-process shortcut.
+with Qwen3-Coder-30B-A3B-Instruct. The repo vendors SGLang under
+`third_party/sglang` so our xPyD/session-cache changes are maintained together
+with the benchmark harness. The local setup keeps the P -> D transfer path
+through SGLang disaggregation and Mooncake loopback instead of replacing it with
+an in-process shortcut.
 
 ## Design
 
@@ -57,6 +59,24 @@ The prototype currently includes:
 disaggregation wait timeout to avoid treating transfer hangs as successful
 long-tail responses.
 
+## SGLang Maintenance
+
+SGLang is tracked directly in this repository:
+
+- `chore: vendor sglang v0.5.10 snapshot` records the clean upstream baseline.
+- Later `feat(sglang): ...` / `fix(sglang): ...` commits should contain only
+  local SGLang changes.
+- Generated files such as `__pycache__` and benchmark outputs stay ignored.
+
+The current SGLang patch adds the worker-side mechanisms needed by
+KV-cache-centric experiments:
+
+- decode workers can optionally accept local append-prefill requests in PD mode;
+- streaming session cache status is exposed for router/admission decisions;
+- idle streaming sessions can be evicted at session granularity;
+- direct append admission can check resident session state and D token pressure
+  before the replay path bypasses P.
+
 ## Current Findings
 
The micro-benchmark can make KV-cache-centric routing look better than
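
Note on the direct append admission item in the "SGLang Maintenance" hunk: the check it describes (resident session state plus D token pressure, evaluated before the replay path bypasses P) might look roughly like the sketch below. This is only an illustration under stated assumptions. `SessionStatus`, `DecodeWorkerLoad`, `admit_direct_append`, and the `max_pressure` threshold are hypothetical names that do not come from this commit or from SGLang, and the real patch may weigh these signals differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionStatus:
    """Hypothetical per-session cache status reported by a decode worker."""
    resident: bool       # True if the session's KV cache is still on the D worker
    cached_tokens: int   # tokens currently held for this session


@dataclass
class DecodeWorkerLoad:
    """Hypothetical token-pressure snapshot for a decode (D) worker."""
    used_tokens: int
    capacity_tokens: int


def admit_direct_append(
    session: Optional[SessionStatus],
    load: DecodeWorkerLoad,
    append_tokens: int,
    max_pressure: float = 0.9,  # assumed threshold, not taken from the commit
) -> bool:
    """Return True if an append may go straight to D, bypassing P."""
    # Without resident session state there is nothing to append to,
    # so the request has to take the normal P -> D path.
    if session is None or not session.resident:
        return False
    if load.capacity_tokens <= 0:
        return False
    # Reject when the append would push D past the assumed pressure limit.
    projected = (load.used_tokens + append_tokens) / load.capacity_tokens
    return projected <= max_pressure
```

A router could call `admit_direct_append(status, load, n_new_tokens)` per request and fall back to the regular prefill worker whenever it returns `False`, which keeps the P -> D transfer path as the default.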