docs(sglang): patch surface inventory + retire-after-refactor list

Resolves AUDIT_AND_ROADMAP §S6: the 785 lines of vendored
SGLang patch are a known reviewer trust risk because the
prototype touches the scheduler.py / schedule_batch.py /
session_aware_cache.py / disaggregation hot paths. Without
classification, readers cannot tell core mechanism from
temporary scaffold.

Classifies each of the 10 patched files into:

  MUST-HAVE        Algorithm 1/2/3, streaming session lifecycle,
                   admit RPC. ~450 lines. Long-term retention.
  WORKAROUND       release_session token-free path,
                   maybe_trim_decode_session_cache, streaming-session
                   extend_input_len correction (incl. the E3 landmine
                   hotfix from commit 986f351), DecodePreallocQueue
                   trim trigger. ~150 lines. To DELETE entirely after
                   the block-level eviction refactor
                   (BLOCK_LEVEL_EVICTION_DESIGN §3.7).
  EXPERIMENTAL     backpressure pause hint
                   (_compute_backpressure_pause_hint). ~60 lines.
                   Signal not closed-loop per REAL_ALI §4.3; retain as
                   a hook or retire within 1 month.
  INSTRUMENTATION  _compute_pool_breakdown_for_diagnostics. ~50 lines.
                   Keep behind a flag.
  MINOR            ~3 lines. Ignore.

The §2 summary gives reviewers a one-glance picture of
what's core vs. scaffold. Maintenance convention in §3
mandates classifying every new (sglang) patch at commit
time.
§4 wires the classification into the roadmap: clearing
the WORKAROUND bucket is the objective completion marker
for block-level eviction refactor.
No code change.
docs/SGLANG_PATCH_INVENTORY_ZH.md (normal file, 165 lines)
@@ -0,0 +1,165 @@
# Vendored SGLang Patch: Classification Inventory

**Date**: 2026-05-13
**Baseline**: clean SGLang v0.5.10 snapshot @ `bded083`
**Current HEAD**: `origin/h200-cu130` + this branch (785 lines added / 17 lines deleted / 10 files)
**Purpose**: let reviewers and the next collaborator see at a glance which patches are core mechanism, which are workarounds, and which can be retired after the refactor. This covers the engineering-debt item in [AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) §3.2 / §S6.

---

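## 0. TL;DR

| Category | Files | Lines (est.) | Fate |
|---|---:|---:|---|
| MUST-HAVE: core mechanism (Algorithm 1/2/3, streaming session lifecycle, admit RPC) | 6 | ~450 | Kept long-term; the core of the paper's claims |
| WORKAROUND: patches over identified latent problems; should be retired after the refactor | 2 | ~150 | Largely deleted once the block-level eviction refactor is complete |
| EXPERIMENTAL: features that are not closed-loop; the paper does not depend on them | 1 | ~60 | Retire, or keep as a future-work hook |
| INSTRUMENTATION: diagnostics / logging | 1 | ~50 | Kept, but should be isolated to a debug build |
| MINOR: miscellaneous | 1 | ~3 | Does not affect any decision |

File counts are approximate: `managers/scheduler.py` and `disaggregation/decode.py` mix categories (§1.2, §1.8).

**Key guidance**: when the block-level eviction refactor ([BLOCK_LEVEL_EVICTION_DESIGN_ZH.md](BLOCK_LEVEL_EVICTION_DESIGN_ZH.md)) is complete, the ~150 lines in the WORKAROUND category should be deleted in the same change. The `schedule_batch.py` invariant landmine triggered by E3 is a product of this path; the right fix is to change the eviction granularity, not to patch the engine.
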
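
The per-category estimates in the TL;DR table can be tallied against the 785 added lines stated in the header. The sketch below only does that arithmetic; the numbers are the table's own rough estimates, so they are not expected to add up exactly to 785.

```python
# Line estimates copied from the TL;DR table; all are approximate.
BUCKETS = {
    "MUST-HAVE": 450,
    "WORKAROUND": 150,
    "EXPERIMENTAL": 60,
    "INSTRUMENTATION": 50,
    "MINOR": 3,
}
TOTAL_ADDED = 785  # lines added, from the document header


def summarize(buckets, total_added):
    """Return (name, estimated lines, rounded percent of added lines) per bucket."""
    return [
        (name, lines, round(100 * lines / total_added))
        for name, lines in buckets.items()
    ]


classified = sum(BUCKETS.values())
unclassified = TOTAL_ADDED - classified  # gap left by the rough estimates

for name, lines, pct in summarize(BUCKETS, TOTAL_ADDED):
    print(f"{name:<16} ~{lines:>3} lines  ~{pct}%")
print(f"estimated total ~{classified}; not covered by the estimates ~{unclassified}")
```
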
---

## 1. Per-file inventory

### 1.1 `mem_cache/session_aware_cache.py`: MUST-HAVE *(pending refactor)*

| Item | Description | Introduced in | Category |
|---|---|---|---|
| `SessionSlot` dataclass | Metadata that lets a streaming session reuse KV across turns | b8e6f13 | MUST-HAVE |
| `last_access_time` field | Needed for LRU decisions | 6e5ed8d | MUST-HAVE |
| Streaming branches of `match_prefix` / `cache_finished_req` / `cache_unfinished_req` | Fast path for session reuse | b8e6f13 | **MUST-HAVE → pending refactor** (semantics change substantially after block-level eviction) |
| `release_session` calls `free(kv_indices)` directly | Returns all of the session's KV at once when the session exits | b8e6f13 | **WORKAROUND → replace** (after the refactor it only calls `dec_lock_ref`) |
| `slot_held_tokens` / `get_session_status` / `list_session_statuses` | Status queries | 6e5ed8d | MUST-HAVE |

**Note**: this file is the hub of the KVC design, and it is exactly what the block-level eviction refactor ([BLOCK_LEVEL_EVICTION_DESIGN_ZH.md](BLOCK_LEVEL_EVICTION_DESIGN_ZH.md) §3.1–§3.6) reworks. The 5 KV-ownership fields of `SessionSlot` (`req_pool_idx` / `kv_committed_len` / `kv_allocated_len` / `cache_protected_len` / `swa_evicted_seqlen`) should be deleted after the refactor; that removal **will be flagged separately in the commit message** so it is easy to roll back.

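
To make the two `release_session` shapes in the table concrete, here is a minimal toy sketch. Only the field names listed in the note and the calls `free(kv_indices)` / `dec_lock_ref` come from this inventory; the field types, `session_id`, `kv_indices` as a list, `ToyPool`, and `ToyNode` are assumptions made for illustration and are not the patch's actual code.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionSlot:
    """Toy stand-in for the per-session metadata; field types are assumed."""
    session_id: str  # hypothetical key, for illustration only
    # The five KV-ownership fields slated for removal after the refactor.
    req_pool_idx: int = -1
    kv_committed_len: int = 0
    kv_allocated_len: int = 0
    cache_protected_len: int = 0
    swa_evicted_seqlen: int = 0
    # Used for LRU decisions.
    last_access_time: float = field(default_factory=time.monotonic)
    kv_indices: List[int] = field(default_factory=list)  # hypothetical


class ToyPool:
    """Hypothetical token pool that only tracks which indices are free."""

    def __init__(self, size):
        self.free_indices = set(range(size))

    def alloc(self, n):
        taken = sorted(self.free_indices)[:n]
        if len(taken) < n:
            raise MemoryError("pool exhausted")
        self.free_indices -= set(taken)
        return taken

    def free(self, indices):
        self.free_indices |= set(indices)


@dataclass
class ToyNode:
    """Hypothetical cache node with a lock reference count."""
    lock_ref: int = 1


def release_session_token_free(slot, pool):
    """Current WORKAROUND shape: hand every KV index back in one shot."""
    freed = len(slot.kv_indices)
    pool.free(slot.kv_indices)
    slot.kv_indices = []
    slot.kv_committed_len = slot.kv_allocated_len = 0
    return freed


def release_session_dec_lock_ref(node):
    """Post-refactor shape per the table: drop the lock only, free nothing."""
    node.lock_ref = max(0, node.lock_ref - 1)
    return node.lock_ref
```

In the second shape the tokens stay cached until ordinary eviction reclaims them, which is why the five ownership fields would no longer be needed on the slot.
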
### 1.2 `managers/scheduler.py`: mixed categories

The Algorithm 2 implementation on the D worker side, made up of several independent patches. Classified per function / line range:

| Function / line range | Description | Category | When it can be retired |
|---|---|---|---|
| `admit_direct_append(...)` | D-side admission RPC handler for Algorithm 2 | **MUST-HAVE** | Never (core of the paper) |
| `_should_allow_local_prefill_on_decode(req)` | Decides whether the decode worker accepts a local append-prefill without bootstrap | **MUST-HAVE** | Never |
| `_decode_session_cache_low_watermark_tokens()` | Reads the low-watermark parameter | **WORKAROUND** | Superseded by radix LRU after block-level eviction |
| `_decode_session_cache_target_available_tokens()` | Computes the target number of available tokens | **WORKAROUND** | Same as above |
| `maybe_trim_decode_session_cache(...)` | Proactively trims sessions (triggers `release_session`) | **WORKAROUND** | Same as above: after the refactor, radix LRU reclaims space incrementally and the trim is no longer needed |
| `_compute_backpressure_pause_hint(...)` | Pause hint for the router | **EXPERIMENTAL** | The signal is not closed-loop ([REAL_ALI_KVC_EXPERIMENT_LOG_ZH.md](../docs/archive/) §4.3; roadmap §S10); can be kept as a future-work hook |
| `_compute_pool_breakdown_for_diagnostics()` | Pool-state snapshot served via `/server_info` | **INSTRUMENTATION** | Kept long-term, preferably gated behind a flag |

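
The three WORKAROUND rows in the table above describe a watermark-driven trim. A rough sketch of that control flow follows, under loudly stated assumptions: the LRU victim order, the `ToySession` type, and the function signature are all hypothetical, and adding `held_tokens` back to the counter merely stands in for calling `release_session`.

```python
from dataclasses import dataclass


@dataclass
class ToySession:
    session_id: str
    held_tokens: int
    last_access_time: float


def maybe_trim_sessions(sessions, available_tokens, low_watermark, target_available):
    """Release least-recently-used sessions until enough tokens are available.

    Returns (released_session_ids, available_tokens_after). Nothing is
    released while availability is at or above the low watermark.
    """
    if available_tokens >= low_watermark:
        return [], available_tokens
    released = []
    # Oldest access first: a plain LRU order (assumed, not taken from the patch).
    for s in sorted(sessions, key=lambda s: s.last_access_time):
        if available_tokens >= target_available:
            break
        available_tokens += s.held_tokens  # stands in for release_session(...)
        released.append(s.session_id)
    return released, available_tokens


sessions = [
    ToySession("a", held_tokens=100, last_access_time=3.0),
    ToySession("b", held_tokens=200, last_access_time=1.0),
    ToySession("c", held_tokens=50, last_access_time=2.0),
]
print(maybe_trim_sessions(sessions, available_tokens=40,
                          low_watermark=64, target_available=250))
```

The trim releases whole sessions, which is the coarse granularity that the block-level eviction refactor is meant to replace.
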
### 1.3 `managers/schedule_batch.py`: WORKAROUND (to be deleted)

| Item | Description | Introduced in | Category |
|---|---|---|---|
| streaming-session `extend_input_len` correction (lines ~1572–1585) | Sets `extend_input_len` to 0 when `fill_ids` is shorter than `prefix_indices` | b8e6f13 | **WORKAROUND** |
| pre-filter pass dropping reqs whose `fill_ids` is shorter than `prefix_indices` | Hotfix after E3 tripped the assertion | 986f351 | **WORKAROUND** |
| Tolerance logic around the invariant assert `seq_len - pre_len == req.extend_input_len` | Companion to the correction | b8e6f13 | **WORKAROUND** |

**All** ~85 lines **should be deleted wholesale** once the block-level eviction refactor is complete: `BLOCK_LEVEL_EVICTION_DESIGN_ZH §3.7` explains that the invariant then holds by construction, so the correction path has no reason to exist. The E3 landmine ([E3_FINDINGS_ZH.md](E3_FINDINGS_ZH.md) §2) is a product of this workaround.

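
The interaction between the correction, the pre-filter, and the invariant can be shown with a toy model. This is a sketch under assumptions, not the code in `schedule_batch.py`: `ToyReq` and the three helper functions are hypothetical, and only the names `fill_ids`, `prefix_indices`, `extend_input_len`, and the invariant expression come from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyReq:
    rid: str
    fill_ids: List[int]
    prefix_indices: List[int]
    extend_input_len: int = 0


def corrected_extend_input_len(req):
    """Clamp the extend length to 0 when the cached prefix is longer than fill_ids."""
    return max(0, len(req.fill_ids) - len(req.prefix_indices))


def prefilter(reqs):
    """Hotfix shape: drop requests whose fill_ids is shorter than the prefix."""
    keep, dropped = [], []
    for r in reqs:
        if len(r.fill_ids) < len(r.prefix_indices):
            dropped.append(r)
        else:
            keep.append(r)
    return keep, dropped


def check_invariant(req):
    """The invariant from the table: seq_len - pre_len == req.extend_input_len."""
    seq_len, pre_len = len(req.fill_ids), len(req.prefix_indices)
    return seq_len - pre_len == req.extend_input_len


ok = ToyReq("ok", fill_ids=list(range(10)), prefix_indices=list(range(4)))
bad = ToyReq("bad", fill_ids=list(range(3)), prefix_indices=list(range(5)))
for r in (ok, bad):
    r.extend_input_len = corrected_extend_input_len(r)

# The clamped request no longer satisfies the invariant (3 - 5 != 0),
# which is why a request like this has to be filtered out or tolerated.
print(check_invariant(ok), check_invariant(bad))
```

In this toy model the clamp alone cannot satisfy the invariant, so a pre-filter or a tolerance path is needed for as long as a prefix can outgrow `fill_ids`.
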
### 1.4 `managers/session_controller.py`: MUST-HAVE

| Item | Description | Category |
|---|---|---|
| Streaming session lifecycle hooks (open / close / admit signal) | Tell the P/D workers when a streaming session starts and ends | MUST-HAVE |
| Session ID routing | Lets the admission RPC find the right `SessionSlot` | MUST-HAVE |

Not to be retired.

### 1.5 `managers/io_struct.py`: MUST-HAVE

| Item | Description | Category |
|---|---|---|
| `AdmitDirectAppendReqInput` / `AdmitDirectAppendReqOutput` | Request / response message types for the admit RPC | MUST-HAVE |
| backpressure pause hint field | Optional field on the messages above | EXPERIMENTAL |

The EXPERIMENTAL field can stay folded into the MUST-HAVE messages for compatibility; on its own it creates no pressure to retire anything.

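
One way an optional EXPERIMENTAL field can ride on a MUST-HAVE message without breaking compatibility is sketched below. The two class names are from the table above; every field name (`session_id`, `num_append_tokens`, `admitted`, `reason`, `backpressure_pause_hint_ms`) and the `to_wire` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AdmitDirectAppendReqInput:
    session_id: str          # hypothetical field
    num_append_tokens: int   # hypothetical field


@dataclass
class AdmitDirectAppendReqOutput:
    admitted: bool                # hypothetical field
    reason: Optional[str] = None  # hypothetical field
    # EXPERIMENTAL: optional and unset by default, so readers may ignore it.
    backpressure_pause_hint_ms: Optional[int] = None


def to_wire(msg):
    """Serialize a message, omitting optional fields that were never set."""
    return {k: v for k, v in asdict(msg).items() if v is not None}


print(to_wire(AdmitDirectAppendReqOutput(admitted=True)))
print(to_wire(AdmitDirectAppendReqOutput(
    admitted=False, reason="pool_full", backpressure_pause_hint_ms=50)))
```

Because the hint defaults to `None` and is dropped when unset, removing it later would not change the shape of messages that never used it.
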
### 1.6 `managers/tokenizer_communicator_mixin.py`: MUST-HAVE

Communicator-side glue for the admit RPC. 19 lines; not to be retired.

### 1.7 `entrypoints/http_server.py`: MUST-HAVE

Registers the `/admit_direct_append` HTTP endpoint. 6 lines.

### 1.8 `disaggregation/decode.py`: mixed categories

| Item | Description | Category |
|---|---|---|
| `DecodeReqToTokenPool`: relaxes `assert len(reusing) <= 1` | Lets local append-prefill reuse multiple `req_pool_idx` values within one batch | **MUST-HAVE** |
| `DecodePreallocQueue` adds `refresh_allocatable_tokens` and triggers `maybe_trim_decode_session_cache` | Proactively trims sessions when the pool is full | **WORKAROUND** (after the refactor, radix LRU sheds load naturally) |
| `--disaggregation-decode-allow-local-prefill` flag | Server-side opt-in for local append-prefill | **MUST-HAVE** |

The ~30 lines of trim-trigger logic should be deleted after the refactor.

### 1.9 `server_args.py`: MUST-HAVE

| Item | Description | Category |
|---|---|---|
| `--radix-eviction-policy priority` option | Needed by the E1/E2 experiments | MUST-HAVE |
| `--disaggregation-decode-allow-local-prefill` flag | See §1.8 | MUST-HAVE |

13 lines, all CLI interface extensions; not to be retired.

### 1.10 `disaggregation/mooncake_transfer_engine.py`: MINOR

A 3-line adjustment. Not a decision point.

---

## 2. Summary by category

### 2.1 MUST-HAVE (keep)

About 6 files, ~450 lines:

- `admit_direct_append` main path (Algorithm 2): scheduler + io_struct + tokenizer_communicator_mixin + http_server + session_controller
- `SessionSlot` main path (streaming session lifecycle): most fields of session_aware_cache, plus session_controller
- CLI / server interface: server_args, and `allow_local_prefill` in decode.py

### 2.2 WORKAROUND (delete after the block-level eviction refactor)

About 2.5 files, ~150 lines:

- the token-free path of `session_aware_cache.release_session`
- `_decode_session_cache_low_watermark_tokens`, `_decode_session_cache_target_available_tokens`, and `maybe_trim_decode_session_cache` in `scheduler.py`
- the streaming-session correction and the drop pre-filter in `schedule_batch.py` (including the hotfix for the E3 landmine)
- the trim trigger inside `DecodePreallocQueue` in `decode.py`

→ These patches exist because of the current architecture; after the refactor they should be removed as whole blocks rather than kept alive with small bug fixes.

### 2.3 EXPERIMENTAL (not closed-loop)

About 60 lines:

- backpressure pause hint (`_compute_backpressure_pause_hint` + the io_struct field): can be kept as a hook for a future control-plane feedback mechanism; if it is still not wired up after 1 month, retire it

### 2.4 INSTRUMENTATION (keep long-term, but gate behind a flag)

About 50 lines:

- `_compute_pool_breakdown_for_diagnostics` + the related `/server_info` fields: suggest adding an `--enable-diagnostic-pool-snapshot` flag so the production path does not pay the diagnostics overhead

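
Gating the diagnostics snapshot behind the suggested `--enable-diagnostic-pool-snapshot` flag could look roughly like the sketch below. It uses plain `argparse` rather than SGLang's own argument handling, and `compute_pool_breakdown`, `server_info`, and the `pool_state` dictionary are hypothetical stand-ins.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Flag name taken from the suggestion above; off by default.
    parser.add_argument("--enable-diagnostic-pool-snapshot", action="store_true")
    return parser


def compute_pool_breakdown(pool_state):
    """Hypothetical stand-in for the diagnostics snapshot."""
    return {"free": pool_state["free"], "held_by_sessions": pool_state["held"]}


def server_info(args, pool_state):
    """Build the info payload; the snapshot is computed only when requested."""
    info = {"status": "ok"}
    if args.enable_diagnostic_pool_snapshot:
        info["pool_breakdown"] = compute_pool_breakdown(pool_state)
    return info


pool_state = {"free": 1024, "held": 3072}
default_args = build_parser().parse_args([])
debug_args = build_parser().parse_args(["--enable-diagnostic-pool-snapshot"])
print(server_info(default_args, pool_state))
print(server_info(debug_args, pool_state))
```
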
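### 2.5 MINOR

About 3 lines: ignore.

---

## 3. Maintenance conventions

1. **Every new SGLang change must be recorded in this inventory**: prefix the commit message with `feat(sglang): ...` / `fix(sglang): ...`, and state in the PR description which §2 category the change belongs to.
2. **Do not overwrite upstream files directly**: every patch must apply cleanly to v0.5.10 with `git apply` (keep hunk headers tidy).
3. **Delete the docs when deleting a WORKAROUND**: the same PR that completes the refactor should strike the corresponding rows from the tables in this document.
4. **Do not promote EXPERIMENTAL patches into the main path**: patches that are not closed-loop must be disabled by default.
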
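
Convention 1 lends itself to an automated check. The sketch below assumes a hypothetical PR-description line of the form `Category: WORKAROUND`; the document only requires that the category be declared and does not prescribe this format.

```python
import re

CATEGORIES = ("MUST-HAVE", "WORKAROUND", "EXPERIMENTAL", "INSTRUMENTATION", "MINOR")
SUBJECT_RE = re.compile(r"^(feat|fix)\(sglang\): \S")
# Hypothetical convention: one line such as "Category: WORKAROUND".
CATEGORY_RE = re.compile(r"^Category:\s*(\S+)\s*$", re.MULTILINE)


def check_sglang_change(subject, pr_description):
    """Return a list of problems; an empty list means the change passes."""
    problems = []
    if not SUBJECT_RE.match(subject):
        problems.append("subject must start with 'feat(sglang): ' or 'fix(sglang): '")
    match = CATEGORY_RE.search(pr_description)
    if match is None:
        problems.append("PR description must declare a section 2 category")
    elif match.group(1) not in CATEGORIES:
        problems.append(f"unknown category: {match.group(1)}")
    return problems


print(check_sglang_change("fix(sglang): guard trim trigger", "Category: WORKAROUND"))
print(check_sglang_change("tweak scheduler", "no category here"))
```
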
---

## 4. Relationship to the roadmap

- When Milestone 1 ([AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) §4) carries out the block-level eviction refactor, **all of §2.2 should disappear**; this is the objective measure of how complete the refactor is.
- When Milestone 2 splits the control plane into layers (§4.8), the §2.3 backpressure pause hint must be either enabled or retired; it may not be left dangling.
- When Milestone 3 introduces learning-based admission (§4.15), the §2.1 `admit_direct_append` interface should stay stable; the policy is swapped on the router side, not on the D side.

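
The "§2.2 should disappear" marker can be approximated by checking that the named WORKAROUND functions no longer occur in the patched sources. This is a sketch only: it takes file contents as an in-memory mapping (a hypothetical interface), and it covers only workarounds that have a distinctive function name, so the unnamed `schedule_batch.py` correction would still need a manual check.

```python
# Function names taken from the scheduler.py table in this inventory.
WORKAROUND_SYMBOLS = (
    "maybe_trim_decode_session_cache",
    "_decode_session_cache_low_watermark_tokens",
    "_decode_session_cache_target_available_tokens",
)


def remaining_workarounds(sources):
    """Map each file name to the WORKAROUND symbols still present in its text."""
    hits = {}
    for name, text in sources.items():
        found = [sym for sym in WORKAROUND_SYMBOLS if sym in text]
        if found:
            hits[name] = found
    return hits


before = {
    "scheduler.py": "def maybe_trim_decode_session_cache(self): ...",
    "decode.py": "self.scheduler.maybe_trim_decode_session_cache()",
    "io_struct.py": "class AdmitDirectAppendReqInput: ...",
}
after = {name: "pass" for name in before}

print(remaining_workarounds(before))
print(remaining_workarounds(after))
```
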
---
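
**Bottom line**: the 785 lines of vendored SGLang are not a monolithic black box. By the estimates in §0, roughly 450 lines are core mechanism (required by the paper), roughly 150 lines are workarounds for the current architecture (removable as whole blocks after the refactor), and the rest is experimental, instrumentation, or minor. With this inventory a reviewer can tell immediately which parts are the paper's real contribution and which are temporary scaffolding for the prototype.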