docs: document sglang maintenance workflow

2026-04-24 12:31:32 +00:00
parent b8e6f13c20
commit 5bdc0ed4f0
2 changed files with 30 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -35,8 +35,10 @@ Sync the environment:
 uv sync
 ```

-Local experiments can use a repo-local `third_party/sglang` checkout of SGLang
-`v0.5.10`, but that heavyweight checkout is intentionally not committed here.
+`third_party/sglang` vendors a clean SGLang `v0.5.10` snapshot plus our local
+PD/session-cache patches in later commits. Keep SGLang changes scoped under that
+directory and commit them with `feat(sglang): ...` or `fix(sglang): ...` so they
+stay easy to review against the vendor baseline.

 ## CLI

@@ -126,9 +128,9 @@ Notes:
 - Live benchmarking uses the repo-local `agentic_pd_hybrid.pd_router`, which
   preserves the real prefill/decode double-request path over loopback without
   depending on the upstream Rust router build.
-- Managed live benchmarking prefers a local
-  `third_party/sglang/python/sglang` checkout when it exists, so local SGLang
-  source changes can apply immediately without packaging a wheel.
+- Managed live benchmarking prefers the vendored
+  `third_party/sglang/python/sglang` source tree, so local SGLang changes apply
+  immediately without packaging a wheel.
 - Live benchmarking currently targets the `mooncake` transfer backend, because
   `mooncake-transfer-engine` is installed and usable on this node.
 - `benchmark-live` and `replay` support streaming by default for TTFT/TPOT
--- a/docs/PROJECT_OVERVIEW.md
+++ b/docs/PROJECT_OVERVIEW.md
@@ -5,9 +5,11 @@ session-aware and KV-cache-aware prefill/decode routing can improve end-to-end
 latency for agentic coding workloads on top of SGLang xPyD.

 The current target environment is a single 8-GPU node running SGLang `v0.5.10`
-with Qwen3-Coder-30B-A3B-Instruct. The local setup keeps the P -> D transfer
-path through SGLang disaggregation and Mooncake loopback instead of replacing it
-with an in-process shortcut.
+with Qwen3-Coder-30B-A3B-Instruct. The repo vendors SGLang under
+`third_party/sglang` so our xPyD/session-cache changes are maintained together
+with the benchmark harness. The local setup keeps the P -> D transfer path
+through SGLang disaggregation and Mooncake loopback instead of replacing it with
+an in-process shortcut.

 ## Design

@@ -57,6 +59,24 @@ The prototype currently includes:
   disaggregation wait timeout to avoid treating transfer hangs as successful
   long-tail responses.

+## SGLang Maintenance
+
+SGLang is tracked directly in this repository:
+
+- `chore: vendor sglang v0.5.10 snapshot` records the clean upstream baseline.
+- Later `feat(sglang): ...` / `fix(sglang): ...` commits should contain only
+  local SGLang changes.
+- Generated files such as `__pycache__` and benchmark outputs stay ignored.
+
+The current SGLang patch adds the worker-side mechanisms needed by
+KV-cache-centric experiments:
+
+- decode workers can optionally accept local append-prefill requests in PD mode;
+- streaming session cache status is exposed for router/admission decisions;
+- idle streaming sessions can be evicted at session granularity;
+- direct append admission can check resident session state and D token pressure
+  before the replay path bypasses P.
+
 ## Current Findings

 The micro-benchmark can make KV-cache-centric routing look better than