docs(sglang): patch surface inventory + retire-after-refactor list

Resolves AUDIT_AND_ROADMAP §S6: the 785 lines of vendored
SGLang patch are a known reviewer trust risk because the
prototype touches scheduler.py / schedule_batch.py /
session_aware_cache.py / disaggregation hot paths. Without
classification, readers cannot tell core mechanism from
temporary scaffolding.

Classifies each of the 10 patched files into:
  MUST-HAVE         — Algorithm 1/2/3, streaming session
                       lifecycle, admit RPC. ~450 lines.
                       Long-term retention.
  WORKAROUND        — release_session token-free,
                       maybe_trim_decode_session_cache,
                       streaming-session extend_input_len
                       correction (incl. the E3 landmine
                       hotfix from commit 986f351),
                       DecodePreallocQueue trim trigger.
                       ~150 lines. To DELETE entirely
                       after block-level eviction refactor
                       (BLOCK_LEVEL_EVICTION_DESIGN §3.7).
  EXPERIMENTAL      — backpressure pause hint
                       (_compute_backpressure_pause_hint).
                       ~60 lines. Signal not closed-loop
                       per REAL_ALI §4.3; retain as hook
                       or retire in 1 month.
  INSTRUMENTATION   — _compute_pool_breakdown_for_diagnostics.
                       ~50 lines. Keep behind a flag.
  MINOR             — ~3 lines. Ignore.

The §2 summary gives reviewers a one-glance picture of
what's core vs. scaffold. Maintenance convention in §3
mandates classifying every new (sglang) patch at commit
time.

§4 wires the classification into the roadmap: clearing
the WORKAROUND bucket is the objective completion marker
for block-level eviction refactor.

No code change.
2026-05-13 00:42:22 +08:00
parent 9a81c993ab
commit d93228e156

@@ -0,0 +1,165 @@
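# Vendored SGLang Patch: Classification Inventory

**Date**: 2026-05-13

**Baseline**: clean SGLang v0.5.10 snapshot @ `bded083`

**Current HEAD**: `origin/h200-cu130` + this branch (785 lines added / 17 lines deleted / 10 files)

**Purpose**: let reviewers and the next collaborator see at a glance which patches are core mechanism, which are workarounds, and which can be retired after the refactor. This addresses the engineering-debt item in [AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) §3.2 / §S6.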
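
---

## 0. TL;DR

| Category | Files | Lines (est.) | Fate |
|---|---:|---:|---|
| MUST-HAVE: core mechanisms (Algorithm 1/2/3, streaming session lifecycle, admit RPC) | 6 | ~450 | Retained long-term; the core of the paper's claims |
| WORKAROUND: patches for identified latent problems, to be retired after the refactor | 2 | ~150 | Largely deleted once the block-level eviction refactor lands |
| EXPERIMENTAL: features whose loop is not yet closed; the paper does not depend on them | 1 | ~60 | Retire, or keep as a future-work hook |
| INSTRUMENTATION: diagnostics / logging | 1 | ~50 | Keep, but isolate to a debug build |
| MINOR: miscellaneous | 1 | ~3 | Does not affect any decision |

**Key guidance**: when the block-level eviction refactor ([BLOCK_LEVEL_EVICTION_DESIGN_ZH.md](BLOCK_LEVEL_EVICTION_DESIGN_ZH.md)) is complete, the ~150 lines in the WORKAROUND category should be deleted in the same change. The `schedule_batch.py` invariant landmine triggered by E3 is a product of this path; the right fix is to change the eviction granularity, not to patch the engine.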
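
As a quick sanity check, the per-category estimates from the table can be tallied against the 785 added lines quoted in the header. The estimates sum to about 713, leaving roughly 72 lines unattributed, so the per-category figures are best read as approximations. The helper name `tally` is illustrative and not part of the patch.

```python
# Line-count estimates copied from the TL;DR table.
ESTIMATED_LINES = {
    "MUST-HAVE": 450,
    "WORKAROUND": 150,
    "EXPERIMENTAL": 60,
    "INSTRUMENTATION": 50,
    "MINOR": 3,
}
TOTAL_ADDED = 785  # "785 lines added" from the document header


def tally(estimates, total_added):
    """Return (classified lines, unattributed lines, share per category)."""
    classified = sum(estimates.values())
    shares = {name: lines / classified for name, lines in estimates.items()}
    return classified, total_added - classified, shares


classified, unattributed, shares = tally(ESTIMATED_LINES, TOTAL_ADDED)
print(classified, unattributed)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```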

---

## 1. Per-file inventory

### 1.1 `mem_cache/session_aware_cache.py`: MUST-HAVE *(pending refactor)*

| Item | Description | Introduced in | Category |
|---|---|---|---|
| `SessionSlot` dataclass | Metadata that lets a streaming session reuse KV across turns | b8e6f13 | MUST-HAVE |
| `last_access_time` field | Needed for LRU decisions | 6e5ed8d | MUST-HAVE |
| Streaming branches of `match_prefix` / `cache_finished_req` / `cache_unfinished_req` | Session-reuse fast path | b8e6f13 | **MUST-HAVE → pending refactor** (semantics change substantially after block-level eviction) |
| `release_session` calling `free(kv_indices)` directly | Returns all KV at once when a session exits | b8e6f13 | **WORKAROUND → replace** (after the refactor, only `dec_lock_ref`) |
| `slot_held_tokens` / `get_session_status` / `list_session_statuses` | Status queries | 6e5ed8d | MUST-HAVE |

**Note**: this file is the hub of the KVC design, and it is what the block-level eviction refactor ([BLOCK_LEVEL_EVICTION_DESIGN_ZH.md](BLOCK_LEVEL_EVICTION_DESIGN_ZH.md) §3.1–§3.6) reworks. The five KV-ownership fields of `SessionSlot` (`req_pool_idx` / `kv_committed_len` / `kv_allocated_len` / `cache_protected_len` / `swa_evicted_seqlen`) should be removed after the refactor; that removal **will be flagged separately in the commit message** to make rollback easy.
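
To make the two release semantics concrete, here is a minimal sketch. Only the `SessionSlot` field names come from the table and note above; the field types, `ToyPool`, `ToyNode`, and both release functions are hypothetical stand-ins rather than the vendored code.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SessionSlot:
    """Illustrative shape only: field names are from the text, types are assumed."""
    req_pool_idx: int
    kv_committed_len: int = 0
    kv_allocated_len: int = 0
    cache_protected_len: int = 0
    swa_evicted_seqlen: int = 0
    last_access_time: float = field(default_factory=time.monotonic)


class ToyPool:
    """Hypothetical token pool that only tracks a free-token counter."""

    def __init__(self, free_tokens: int) -> None:
        self.free_tokens = free_tokens


@dataclass
class ToyNode:
    """Hypothetical cache node holding a session's tokens under a lock count."""
    tokens: int
    lock_ref: int = 1


def release_by_free(pool: ToyPool, slot: SessionSlot) -> int:
    """Current behaviour as described: hand every held token back at once."""
    freed = slot.kv_allocated_len
    pool.free_tokens += freed
    slot.kv_allocated_len = 0
    slot.kv_committed_len = 0
    return freed


def release_by_unlock(node: ToyNode) -> bool:
    """Post-refactor idea: drop the lock; tokens stay cached until evicted.

    Returns True when the node has become evictable.
    """
    node.lock_ref -= 1
    return node.lock_ref == 0
```

In this toy model, the first variant makes the tokens reusable immediately but discards the cached prefix, while the second keeps the prefix available until an LRU policy chooses to evict it.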
### 1.2 `managers/scheduler.py`: mixed categories

The Algorithm 2 implementation on the D worker side, made up of several independent patches. Classified at function / line-range level:

| Function / line range | Description | Category | When it can be retired |
|---|---|---|---|
| `admit_direct_append(...)` | D-side admission RPC handler for Algorithm 2 | **MUST-HAVE** | Never (core to the paper) |
| `_should_allow_local_prefill_on_decode(req)` | Decides whether the decode worker accepts a local append-prefill without bootstrap | **MUST-HAVE** | Never |
| `_decode_session_cache_low_watermark_tokens()` | Reads the watermark parameter | **WORKAROUND** | Superseded by radix LRU after block-level eviction |
| `_decode_session_cache_target_available_tokens()` | Computes the target number of available tokens | **WORKAROUND** | Same as above |
| `maybe_trim_decode_session_cache(...)` | Proactively trims sessions, triggering `release_session` | **WORKAROUND** | Same as above; after the refactor radix LRU reclaims space incrementally, so trimming is no longer needed |
| `_compute_backpressure_pause_hint(...)` | Pause hint for the router | **EXPERIMENTAL** | Signal is not closed-loop ([REAL_ALI_KVC_EXPERIMENT_LOG_ZH.md](../docs/archive/) §4.3; roadmap §S10); may be kept as a future-work hook |
| `_compute_pool_breakdown_for_diagnostics()` | Pool-state snapshot for `/server_info` | **INSTRUMENTATION** | Keep long-term, but gating behind a flag is recommended |
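
The three WORKAROUND rows describe a watermark-driven trim. One plausible reading, sketched under assumptions: trimming starts only when available tokens fall below a low watermark, and least-recently-used sessions are released until a target is reached. The function `plan_trim`, its parameters, and the session tuple layout are hypothetical and do not mirror the actual scheduler code.

```python
from typing import Dict, List, Tuple

# session_id -> (last_access_time, held_tokens); layout assumed for illustration.
Sessions = Dict[str, Tuple[float, int]]


def plan_trim(
    available: int,
    low_watermark: int,
    target_available: int,
    sessions: Sessions,
) -> Tuple[List[str], int]:
    """Pick sessions to release, oldest first, until the target is met.

    Returns the session ids to release and the resulting available-token count.
    """
    if available >= low_watermark:
        return [], available  # above the watermark: nothing to do

    released: List[str] = []
    oldest_first = sorted(sessions.items(), key=lambda item: item[1][0])
    for session_id, (_, held_tokens) in oldest_first:
        if available >= target_available:
            break
        available += held_tokens
        released.append(session_id)
    return released, available


sessions = {"a": (1.0, 100), "b": (3.0, 200), "c": (2.0, 50)}
print(plan_trim(10, 50, 150, sessions))
```

Because each release returns a whole session's tokens, the trim can overshoot the target. The refactor removes that coarse granularity, since radix LRU can reclaim space at block level.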
### 1.3 `managers/schedule_batch.py`: WORKAROUND (to be deleted)

| Item | Description | Introduced in | Category |
|---|---|---|---|
| Streaming-session `extend_input_len` correction (lines ~1572–1585) | Sets `extend_input_len` to 0 when `fill_ids < prefix_indices` | b8e6f13 | **WORKAROUND** |
| Pre-filter pass dropping `fill_ids < prefix_indices` reqs | Hotfix after E3 triggered the assertion (commit 986f351) | 986f351 | **WORKAROUND** |
| Tolerance logic for the invariant assert `seq_len - pre_len == req.extend_input_len` | Companion to the correction | b8e6f13 | **WORKAROUND** |

**All** ~85 lines **should be deleted wholesale** once the block-level eviction refactor is complete. `BLOCK_LEVEL_EVICTION_DESIGN_ZH §3.7` explains that after the refactor the invariant holds by construction, so the correction path has no reason to exist. The E3 landmine ([E3_FINDINGS_ZH.md](E3_FINDINGS_ZH.md) §2) is a product of this workaround.
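
A toy model may help show why the correction and the invariant pull against each other. It assumes that `fill_ids < prefix_indices` is a length comparison and that `seq_len` and `pre_len` are the lengths of those two lists. `ToyReq` and the three helpers are hypothetical, not the code in `schedule_batch.py`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyReq:
    """Hypothetical stand-in for a request; not the SGLang Req class."""
    fill_ids: List[int]
    prefix_indices: List[int]
    extend_input_len: int = 0


def apply_correction(req: ToyReq) -> None:
    """Clamp extend_input_len to 0 when the prefix is longer than fill_ids."""
    req.extend_input_len = max(len(req.fill_ids) - len(req.prefix_indices), 0)


def invariant_holds(req: ToyReq) -> bool:
    """seq_len - pre_len == extend_input_len, with lengths taken from the lists."""
    seq_len, pre_len = len(req.fill_ids), len(req.prefix_indices)
    return seq_len - pre_len == req.extend_input_len


def prefilter(reqs: List[ToyReq]) -> List[ToyReq]:
    """Drop requests whose fill_ids are shorter than their matched prefix."""
    return [r for r in reqs if len(r.fill_ids) >= len(r.prefix_indices)]


ok = ToyReq(fill_ids=[1, 2, 3, 4], prefix_indices=[10, 11])
stale = ToyReq(fill_ids=[1, 2], prefix_indices=[10, 11, 12])
for req in (ok, stale):
    apply_correction(req)
print(invariant_holds(ok), invariant_holds(stale))
```

In this model the clamped request still violates the invariant, because the left-hand side is negative while `extend_input_len` is 0. That is consistent with the table: the correction needs either tolerance logic around the assert or a pre-filter that removes such requests before the check.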
### 1.4 `managers/session_controller.py`: MUST-HAVE

| Item | Description | Category |
|---|---|---|
| Streaming session lifecycle hooks (open / close / admit signal) | Let the P/D workers know when a streaming session starts and ends | MUST-HAVE |
| Session ID routing | Lets the admission RPC find the correct `SessionSlot` | MUST-HAVE |

Not to be retired.

### 1.5 `managers/io_struct.py`: MUST-HAVE

| Item | Description | Category |
|---|---|---|
| `AdmitDirectAppendReqInput` / `AdmitDirectAppendReqOutput` | Request / response message types for the admit RPC | MUST-HAVE |
| Backpressure pause hint field | Optional field on the same messages | EXPERIMENTAL |

The EXPERIMENTAL field can be folded into the MUST-HAVE messages to preserve compatibility; on its own it creates no pressure to retire anything.
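
The compatibility argument can be sketched as follows: an optional field with a `None` default can be dropped from the wire format and ignored by peers that do not know it. The class `AdmitOutputSketch` and every field name in it are assumptions made for illustration; the real `AdmitDirectAppendReqOutput` layout is not shown in this document.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class AdmitOutputSketch:
    """Hypothetical response message; every field name here is assumed."""
    session_id: str
    admitted: bool
    # EXPERIMENTAL: optional, so peers that do not use it can ignore it.
    backpressure_pause_hint_ms: Optional[int] = None


def to_wire(msg: AdmitOutputSketch) -> Dict[str, Any]:
    """Serialize, omitting optional fields that are unset."""
    return {k: v for k, v in asdict(msg).items() if v is not None}


def from_wire(payload: Dict[str, Any]) -> AdmitOutputSketch:
    """Parse, tolerating payloads that lack the experimental field."""
    return AdmitOutputSketch(
        session_id=payload["session_id"],
        admitted=payload["admitted"],
        backpressure_pause_hint_ms=payload.get("backpressure_pause_hint_ms"),
    )
```

Under this layout, retiring the hint later only means deleting one field and one `payload.get` call; the MUST-HAVE part of the message is untouched.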
### 1.6 `managers/tokenizer_communicator_mixin.py`: MUST-HAVE

Communicator-side glue for the admit RPC (19 lines). Not to be retired.

### 1.7 `entrypoints/http_server.py`: MUST-HAVE

Registration of the `/admit_direct_append` HTTP endpoint (6 lines).

### 1.8 `disaggregation/decode.py`: mixed categories

| Item | Description | Category |
|---|---|---|
| `DecodeReqToTokenPool`: relaxed `assert len(reusing) <= 1` | Local append-prefill reuses multiple `req_pool_idx` values within one batch | **MUST-HAVE** |
| `DecodePreallocQueue`: adds `refresh_allocatable_tokens` and triggers `maybe_trim_decode_session_cache` | Proactively trims sessions when the pool is full | **WORKAROUND** (after the refactor, radix LRU sheds load naturally) |
| `--disaggregation-decode-allow-local-prefill` flag | Server-side opt-in to local append-prefill | **MUST-HAVE** |

The ~30 lines of trim-trigger logic should be deleted after the refactor.

### 1.9 `server_args.py`: MUST-HAVE

| Item | Description | Category |
|---|---|---|
| `--radix-eviction-policy priority` option | Needed for the E1/E2 experiments | MUST-HAVE |
| `--disaggregation-decode-allow-local-prefill` flag | See §1.8 | MUST-HAVE |

13 lines, all CLI interface extensions. Not to be retired.

### 1.10 `disaggregation/mooncake_transfer_engine.py`: MINOR

A 3-line tweak; not a decision point.

---

## 2. Summary by category

### 2.1 MUST-HAVE (retain)

6 files, ~450 lines:

- `admit_direct_append` main path (Algorithm 2): scheduler + io_struct + tokenizer_communicator_mixin + http_server + session_controller
- `SessionSlot` main path (streaming session lifecycle): most fields of session_aware_cache, plus session_controller
- CLI / server interface: server_args, and `allow_local_prefill` in decode.py

### 2.2 WORKAROUND (delete after the block-level eviction refactor)

2.5 files, ~150 lines:

- `session_aware_cache.release_session` token-free path
- `scheduler.py`: `_decode_session_cache_low_watermark_tokens` / `_decode_session_cache_target_available_tokens` + `maybe_trim_decode_session_cache`
- `schedule_batch.py`: streaming-session correction + drop pre-filter (the E3 landmine hotfix)
- `decode.py`: the trim trigger in `DecodePreallocQueue`

These patches exist because of the current architecture; after the refactor they should be removed wholesale rather than fixed bug by bug.

### 2.3 EXPERIMENTAL (loop not closed)

~60 lines:

- Backpressure pause hint (`_compute_backpressure_pause_hint` + the io_struct field). It can be kept as a hook for a future control-plane feedback mechanism; if it is still not wired up after 1 month, retire it.

### 2.4 INSTRUMENTATION (keep long-term, but gate behind a flag)

~50 lines:

- `_compute_pool_breakdown_for_diagnostics` + the related `/server_info` fields. Adding an `--enable-diagnostic-pool-snapshot` flag is recommended so the production path does not carry diagnostic overhead.
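
A minimal sketch of the suggested gating, using plain `argparse`. The flag name is the one proposed above; `parse_args`, `server_info`, and the response layout are hypothetical and do not reflect how SGLang's `server_args.py` or `/server_info` handler are actually structured.

```python
import argparse
from typing import Any, Callable, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the proposed diagnostics flag (off by default)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--enable-diagnostic-pool-snapshot",
        action="store_true",
        help="include the pool breakdown in server info responses",
    )
    return parser.parse_args(argv)


def server_info(
    args: argparse.Namespace,
    compute_breakdown: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Build the info payload; the snapshot is computed only when enabled."""
    info: Dict[str, Any] = {"status": "ok"}
    if args.enable_diagnostic_pool_snapshot:
        info["pool_breakdown"] = compute_breakdown()
    return info
```

Passing the snapshot function as a callable keeps the expensive computation off the default path entirely, rather than computing it and discarding the result.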
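### 2.5 MINOR

~3 lines. Ignore.

---

## 3. Maintenance conventions

1. **Every new SGLang change must be recorded in this table**: prefix the commit message with `feat(sglang): ...` / `fix(sglang): ...`, and state in the PR description which §2 category it falls into.
2. **Do not overwrite upstream files directly**: every patch must apply to v0.5.10 with `git apply` (keep hunk headers clean).
3. **Delete the docs together with the WORKAROUND**: the PR that completes the refactor should strike the corresponding rows from this document.
4. **Do not promote EXPERIMENTAL patches into the main path**: any patch whose loop is not closed must be disabled by default.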
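
Convention 1 lends itself to an automated check. The sketch below validates a commit subject and a declared category; the function name, the regular expression, and the idea of running it in CI are assumptions, not an existing hook in this repository. It covers only code-changing SGLang commits, so documentation-only prefixes are out of scope.

```python
import re
from typing import List

# Prefixes required by convention 1 for SGLang code changes.
SUBJECT_RE = re.compile(r"^(feat|fix)\(sglang\): \S")

# The five categories defined in §2.
CATEGORIES = {"MUST-HAVE", "WORKAROUND", "EXPERIMENTAL", "INSTRUMENTATION", "MINOR"}


def check_patch_metadata(subject: str, declared_category: str) -> List[str]:
    """Return a list of problems; an empty list means the patch passes."""
    problems: List[str] = []
    if not SUBJECT_RE.match(subject):
        problems.append("subject must start with feat(sglang): or fix(sglang):")
    if declared_category not in CATEGORIES:
        problems.append(f"unknown category: {declared_category!r}")
    return problems


print(check_patch_metadata("fix(sglang): drop stale reqs", "WORKAROUND"))
```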
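
---

## 4. Link to the roadmap

- Milestone 1 ([AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) §4): once the block-level eviction refactor is carried out, **all of §2.2 should disappear**. This is the objective measure of how complete the refactor is.
- Milestone 2, control-plane layering (§4.8): the §2.3 backpressure pause hint must be either enabled or retired; it may not be left dangling.
- Milestone 3, learning-based admission (§4.15): the §2.1 `admit_direct_append` interface should remain stable; policy replacement happens on the router side, not on the D side.

---

**Bottom line**: the 785 vendored SGLang lines are not a monolithic black box. Roughly two-thirds is core mechanism (required by the paper), and the remaining third is mostly workarounds for the current architecture (removable wholesale after the refactor). With this table, a reviewer can immediately tell which parts are the paper's real contribution and which are temporary scaffolding for the current prototype.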