agentic-pd-hybrid/docs/archive/REFACTOR_PLAN_ZH.md
kzlin 7590e55189 docs: archive deprecated docs to docs/archive/, drop E1 from onboarding
Two cleanups:

1. Drop "E1: naive 1P3D default" experiment from the onboarding manual.
   GPU hours are precious; naive 1P3D + policy=default has near-certain
   loss on multi-turn cache hit (it's round-robin without prefix awareness),
   so the comparison doesn't add information vs E1=naive 1P3D kv-aware.
   The new manifest has only 2 runs: E1 (naive 1P3D kv-aware) + E2 (KVC
   v2 + RDMA). Run-time budget drops from 16.5h serial to 11h serial /
   5.5h parallel. Updated:
   - §0 TL;DR ("3 groups" -> "2 groups")
   - §2 H1 hypothesis (drop "default and kv-aware each one" -> just kv-aware)
   - §3.1 experiment matrix (3 rows -> 2 rows + rationale for the drop)
   - §3.2 startup config (drop E1 default section, renumber E2/E3 -> E1/E2)
   - §6 decision table + expected-range table
   - §7 FAQ ("3 experiments, E1-E3" -> "2 experiments, E1-E2")
   - §9 deliverables

2. Move 8 deprecated docs to docs/archive/:
     AGENTIC_FIT_ANALYSIS_ZH.md         (ts=10 era analysis; superseded)
     STRUCTURAL_VALIDATION_REPORT_ZH.md (ts=10 era validation; superseded)
     KVC_DEBUG_JOURNEY_V1_TO_V5.md      (v1-v5 sweep process notes)
     V5_PROFILE_INVESTIGATION_ZH.md     (v5 1Hz polling investigation)
     REFACTOR_PLAN_ZH.md                (v0 plan; superseded by V1)
     KVCACHE_CENTRIC_PROGRESS_ZH.md     (earliest 2026-04-27 progress)
     SWEBENCH_EXPERIMENT_PROGRESS.md    (early SWE trace setup)
     SWEBENCH_EXPERIMENT_RESULTS.md     (early SWE result snapshot)

   All cross-references in active docs (V2_DEEP_ANALYSIS / V2_RESULTS /
   REFACTOR_PLAN_V1 / TEAM_REPORT / ONBOARDING) rewritten from
   `docs/FOO.md` to `docs/archive/FOO.md` via sed pass.

   Added `docs/archive/README.md` explaining what each archived doc is
   and when (if ever) to reopen it. Designed so a new reader hitting
   the archive dir immediately knows it's not required reading.

After this commit the active docs in docs/ are 9 files (down from 17),
which should make the onboarding doc's "Level 1 / Level 2 / Level 3"
classification self-evident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 22:40:35 +08:00


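# Refactor Plan v0 (minimal version)

**Date**: 2026-05-06

**Goal**: with minimal code changes plus lightweight experiments, verify whether the structural defects raised in `docs/AGENTIC_FIT_ANALYSIS_ZH.md` actually exist and how large their impact is.

**Budget**: 8h of GPU time (about 4-6 smoke runs of ~30-60 min each).

**KISS boundary**: do not touch the main-loop structure of SGLang's `scheduler.py`; do not introduce a new mooncake protocol; do not implement cross-D session migration; do not split admission into probe/commit; do not change the LRU eviction policy.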
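## Plan conclusions (confirmed with the user)

On re-review of plan-v0, the two issues targeted by the original Phase 1 changes turned out **not to be bugs**:

- `_estimate_session_resident_tokens` returning the full prompt is by design: every call site that needs the "increment" already performs the `target - current` subtraction (`replay.py:1247-1254`, `:1393-1394`, `:1490-1491`).
- `decode_resident_blocks` never shrinking only wastes a few MB of memory and **does not affect routing decisions** (hash_ids in the SWE trace are session-unique, so the policy still picks the correct D).

The final minimal version therefore makes only one code change (**add backpressure**) plus extensive instrumentation.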
## The only code change: backpressure signal

### Change 1: add two fields to the SGLang `admit_direct_append` response

Files: `third_party/sglang/python/sglang/srt/managers/io_struct.py`, `scheduler.py`
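```python
from dataclasses import dataclass


@dataclass
class DirectAppendAdmissionReqOutput:
    ...  # existing fields are kept unchanged
    recommended_pause_ms: int = 0  # new: suggested client-side pause, in ms
    queue_depth: int = 0  # new: decode-side queue depth
```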
At the end of `scheduler.py:admit_direct_append`, compute the hint:
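```python
def _compute_backpressure_pause_hint(self) -> float:
    # Number of entries waiting in the disaggregated decode transfer queue.
    depth = len(self.disagg_decode_transfer_queue.queue)
    if depth < 8:
        return 0.0  # shallow queue: no pause suggested
    return min(2000.0, depth * 100.0)  # simple linear: 100 ms per entry, capped at 2000 ms
```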
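To make the shape of this hint concrete, here is the same rule as a free function that takes the queue depth as an argument instead of reading `self.disagg_decode_transfer_queue`. The function name `pause_hint_ms` and its keyword parameters are hypothetical, for illustration only. Note that the hint jumps from 0 straight to 800 ms at depth 8 and stays at the 2000 ms cap from depth 20 onward.

```python
def pause_hint_ms(
    depth: int,
    threshold: int = 8,
    per_entry_ms: float = 100.0,
    cap_ms: float = 2000.0,
) -> float:
    """Hypothetical standalone version of _compute_backpressure_pause_hint.

    Below `threshold` queued entries no pause is suggested; at or above it the
    pause grows linearly with depth and is capped at `cap_ms`.
    """
    if depth < threshold:
        return 0.0
    return min(cap_ms, depth * per_entry_ms)


if __name__ == "__main__":
    for d in (0, 7, 8, 12, 20, 50):
        print(d, pause_hint_ms(d))
```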
### Change 2: replay side backs off according to the hint

File: `src/agentic_pd_hybrid/replay.py`

- Add `pause_until_s: dict[str, float]` to `DecodeResidencyState`.
- `_query_decode_direct_admission` parses `recommended_pause_ms` from the response and updates `pause_until_s[server_url] = now + pause_ms / 1000`.
- Before calling `_invoke_router` / `_invoke_decode_session_direct`, check `pause_until_s[decode_url]`; if `now < pause_until`, sleep until that moment.
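The bookkeeping in the bullets above can be sketched as follows. This is a minimal sketch, not the code in `replay.py`: the class name `PauseGate`, its method names, and the injectable `clock`/`sleep` parameters are hypothetical. Only the `pause_until_s` dict and the `now + pause_ms / 1000` update come from the plan.

```python
import time
from typing import Callable, Dict


class PauseGate:
    """Hypothetical per-D backoff tracker mirroring DecodeResidencyState.pause_until_s."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Maps decode server URL -> clock time (seconds) until which dispatch should wait.
        self.pause_until_s: Dict[str, float] = {}
        self._clock = clock
        self._sleep = sleep

    def record_hint(self, server_url: str, pause_ms: int) -> None:
        """Apply recommended_pause_ms parsed from an admission response."""
        self.pause_until_s[server_url] = self._clock() + pause_ms / 1000

    def wait_if_paused(self, decode_url: str) -> float:
        """Call before dispatching to decode_url; returns seconds slept (0.0 if none)."""
        now = self._clock()
        until = self.pause_until_s.get(decode_url, 0.0)
        if now < until:
            delay = until - now
            self._sleep(delay)
            return delay
        return 0.0
```

Passing the clock and sleep functions in keeps the backoff logic testable without real waiting. The plan does not say whether the replay loop is synchronous or asyncio-based; an asyncio version would await `asyncio.sleep` instead of calling `time.sleep`.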
### Change 3: new CLI flag

Files: `src/agentic_pd_hybrid/cli.py`, `benchmark.py`
```
--enable-backpressure   # default false, preserving baseline behavior
```
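If the CLI is built on Python's standard `argparse` (an assumption; the plan does not say how `cli.py` parses arguments), the flag can be declared as an opt-in boolean so that omitting it keeps the baseline behavior. The `build_parser` helper and the help text are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the --enable-backpressure flag name comes from the plan.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--enable-backpressure",
        action="store_true",  # flag absent -> False, i.e. baseline behavior
        help="Honor recommended_pause_ms hints from decode admission responses.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("backpressure enabled:", args.enable_backpressure)
```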
### Change 4: observability logs

Each run dir gets three new jsonl files:

- `admission-events.jsonl`: one record per admission RPC (timestamp, session, D, can_admit, queue_depth, pause_ms, latency_s, available_tokens, evicted_session_count)
- `backpressure-events.jsonl`: one record per actual sleep (timestamp, D, sleep_ms, queue_depth_at_signal)
- `session-d-binding.jsonl`: one record when a session is first opened on a given D (timestamp, session, D, turn_id)
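A minimal sketch of writing these logs as one JSON object per line. The helper names `append_event` and `read_events`, and the use of wall-clock `time.time()` for `timestamp`, are assumptions; the file names and field names are the ones listed above.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


def append_event(run_dir: Path, filename: str, record: Dict[str, Any]) -> None:
    """Append one record as a single JSON line to run_dir/filename."""
    # A caller-supplied "timestamp" overrides the default wall-clock value.
    record = {"timestamp": time.time(), **record}
    with open(run_dir / filename, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_events(path: Path) -> List[Dict[str, Any]]:
    """Load every non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    run_dir = Path("run-demo")
    run_dir.mkdir(exist_ok=True)
    append_event(
        run_dir,
        "backpressure-events.jsonl",
        {"D": "decode-0", "sleep_ms": 800, "queue_depth_at_signal": 8},
    )
    print(read_events(run_dir / "backpressure-events.jsonl")[-1])
```

Reopening the file in append mode for every event keeps the sketch simple; a long replay would more likely keep one handle open per file for the whole run.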
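## Experiment matrix (within the 8h budget)

Ordered as "anchor first, then single-variable comparisons". The last column of each row is the estimated machine time.

| ID | Configuration | Purpose | Machine time |
|---|---|---|---|
| **E0 (existing)** | v5 baseline, time-scale=10, no backpressure | Anchor (already exists: `outputs/qwen3-30b-tp1-v5-optD-baseline-rerun/run1`) | 0 |
| **E1** | v5 + backpressure ON, time-scale=10, full trace | Test Claim §3 (can backpressure eliminate the KVTransferError avalanche?) | ~50 min |
| **E2** | v5 baseline, time-scale=1, **short trace** (first 12 sessions ≈ 1000 reqs) | Test Claim §7 (time-scale=10 distorts results); backpressure off | ~60 min |
| **E3** | 8DP CA, time-scale=1, same trace as E2 | Control for E2: does KVC still lose to DP under real timing? | ~60 min |
| **E4** | v5 + backpressure, time-scale=1, same trace as E2 | Is backpressure still useful under real timing? | ~60 min |
| **E5** (optional) | v5 baseline, time-scale=10, **concurrency=4**, full trace | Test Claim §1 (is high concurrency a necessary condition?) | ~50 min |

4-5 runs, ~3-5h in total. The remaining budget is reserved for rerunning failures and for analysis.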
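The 3-5h figure follows from the per-row estimates in the table. A quick check, with the minute values copied from the table (E0 reuses an existing run and costs 0): the four new core runs need about 230 min (≈3.8h), adding E5 brings the total to about 280 min (≈4.7h), and roughly 200 min of the 8h budget remain.

```python
# Estimated machine time per run, in minutes, copied from the experiment matrix.
ESTIMATES_MIN = {"E0": 0, "E1": 50, "E2": 60, "E3": 60, "E4": 60, "E5": 50}
BUDGET_MIN = 8 * 60

core = sum(ESTIMATES_MIN[k] for k in ("E0", "E1", "E2", "E3", "E4"))
with_optional = core + ESTIMATES_MIN["E5"]

print(f"core: {core} min ({core / 60:.1f} h)")
print(f"with E5: {with_optional} min ({with_optional / 60:.1f} h)")
print(f"left for reruns/analysis: {BUDGET_MIN - with_optional} min")
```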
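## Experiment goals: mapping back to §1-§7 one by one

| Doc § | Claim | Which exp refutes/supports it | Required metrics |
|---|---|---|---|
| §1 | Permanent session pinning + capacity-blind D selection causes a bimodal distribution | Existing E0 data is sufficient | Per-session distribution of direct-to-D rate |
| §2 | LRU cannot keep up with memory pressure | Existing E0 logs suffice, plus E1 to see how the trim/error ratio changes once backpressure is on | Trim event count vs OOM count |
| §3 | Missing backpressure is the source of the avalanche | E0 vs E1 | KVTransferError count, P99 latency |
| §4 | Admission RPCs interfere with the scheduler | Out of scope for this round (validating it requires splitting admission into probe/commit, which we are not doing) | |
| §5 | The P side is unaware of D health | Existing E0 logs suffice (error counts for prefill-0 vs prefill-1) | Per-P KVTransferError |
| §6 | (withdrawn) | | |
| §7 | time-scale=10 distorts results | E0 vs E2 (same KVC, different time-scale); E2 vs E3 (same time-scale, KVC vs DP) | Latency distribution, direct-to-D rate |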
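§1 and §7 both rely on the per-session direct-to-D rate. A minimal sketch of computing it and bucketing it to check for bimodality, assuming each replayed request is available as a record with a `session` field and a boolean `direct_to_d` flag. Both field names and both helper functions are hypothetical; the plan does not specify which log carries this information.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def direct_to_d_rate_per_session(records: Iterable[Mapping]) -> Dict[str, float]:
    """Fraction of each session's requests that went directly to a D server."""
    total: Dict[str, int] = defaultdict(int)
    direct: Dict[str, int] = defaultdict(int)
    for r in records:
        total[r["session"]] += 1
        if r["direct_to_d"]:
            direct[r["session"]] += 1
    return {s: direct[s] / total[s] for s in total}


def rate_histogram(rates: Dict[str, float], bins: int = 10) -> List[int]:
    """Count sessions per rate bucket; a bimodal result piles up at both ends."""
    counts = [0] * bins
    for rate in rates.values():
        idx = min(int(rate * bins), bins - 1)  # rate == 1.0 lands in the last bucket
        counts[idx] += 1
    return counts
```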
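## Final experiment report deliverable

After the runs, produce `docs/STRUCTURAL_VALIDATION_REPORT_ZH.md`, giving the following for each of §1-§7:

- **Claim as stated**
- **Data evidence** (which exp, which metric)
- **Conclusion**: holds / partially holds / refuted
- **Quantified impact**: numeric differences
- **Uncertainty**: N=1 risk, other confounders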
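## Out of scope (KISS boundary)

| Wanted but not doing | Reason |
|---|---|
| Run N=3 repeats | Does not fit in 8h; a single run is enough to see the overall direction |
| Full parameter sweep | Only vary time-scale and the single backpressure boolean |
| Change LRU eviction | Out of scope for this round |
| Cross-D migration | Out of scope for this round |
| Admission probe/commit split | Out of scope for this round |
| P-side D-health routing | Out of scope for this round |
| Fix the two "non-bugs" (estimate / aging) | Verified not to be real bugs |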
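## Expected failure paths

- **GPU resources are tight**: compress the smoke trace further (first 8 sessions / 600 reqs).
- **A time-scale=1 run exceeds 1.5h**: truncate to the portion that completes within 600s.
- **Backpressure is misconfigured**: start with the simple linear sleep_ms = depth * 100; if it cannot be made to work, roll back to 0 (no backpressure).
- **The SGLang patch fails to compile/import**: all patches are confined to a few lines in io_struct.py and scheduler.py, so each can be reverted independently with git restore.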
---
Next: implement → run smoke → write report.