docs(index): collaborator-facing doc index
Single navigation entry point. Existing docs were scattered across five branches with no clear reading order; this is the fix.

Includes:
- 3-doc fast path for anyone joining
- topic-grouped table (algorithm / experiments / design discussions / evaluation / environment / archive)
- role-based reading paths (new SWE, paper reviewer, reproducing student, control-plane reader)

Index also references the four docs added later on this branch (AUDIT_AND_ROADMAP, BLOCK_LEVEL_EVICTION_DESIGN, D_TO_P_SYNC_CONTRACT, EVALUATION_PROTOCOL) so reviewers can see the planned layout up front.
112
docs/INDEX_ZH.md
Normal file
@@ -0,0 +1,112 @@
# Documentation Index

**Purpose**: let any collaborator find the document they need within 10 minutes, and tell reviewers which documents to read first.

---

## 0. Three docs if you are short on time

Read these in order and you will be ready to join the discussion:

1. [AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md): current project status, weak points, and roadmap.
2. [KVC_ROUTER_ALGORITHM.md](KVC_ROUTER_ALGORITHM.md): formal statement of the algorithm (Algorithm 1/2/3 + Theorem 1/2).
3. [V2_DEEP_ANALYSIS_ZH.md](V2_DEEP_ANALYSIS_ZH.md) §0 + §6: the current v2 win/lose snapshot.

---

## 1. By topic

### 1.1 Progress / current status

| Document | Contents |
|---|---|
| [AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) | Cross-branch consolidation + roadmap (the main entry point for this branch) |
| [PROJECT_OVERVIEW.md](PROJECT_OVERVIEW.md) | Project goals + terminology distinguishing the three mechanisms (pd-disagg / pd-colo / kvcache-centric) |
| [ONBOARDING_NEXT_AGENT_ZH.md](ONBOARDING_NEXT_AGENT_ZH.md) | 30-minute onboarding guide for the next agent taking over (from `kvc-debug-journey-v1-to-v4`) |

### 1.2 Algorithm / formalization

| Document | Contents |
|---|---|
| [KVC_ROUTER_ALGORITHM.md](KVC_ROUTER_ALGORITHM.md) | Algorithm 1 (Route) / 2 (Admit) / 3 (Dispatch) + Theorem 1 (no starvation) + Theorem 2 (lower bound on fast-path hits) |
| [MIGRATION_V1_FINDINGS_ZH.md](MIGRATION_V1_FINDINGS_ZH.md) | Measurements of the v1 thrashing pathology + why reset-on-success is the key fix |

### 1.3 Experimental results

| Document | Contents |
|---|---|
| [V2_DEEP_ANALYSIS_ZH.md](V2_DEEP_ANALYSIS_ZH.md) | SWE-Bench, 50 sessions, ts=1: v2's 6/8 wins over 4DP CA + why it trails on TTFT p99 |
| [V2_RESULTS_ZH.md](V2_RESULTS_ZH.md) | Original v2 results report (headline numbers are slightly optimistic; read it alongside the deep analysis) |
| [E1_E2_RESULTS_ZH.md](E1_E2_RESULTS_ZH.md) | E1 (naive 1P3D + kv-aware) vs E2 (KVC v2) on H200 + RDMA; forensics on E2's 80% failure |
| [E3_FINDINGS_ZH.md](E3_FINDINGS_ZH.md) | E3 (+load-floor bonus) triggers an SGLang patch invariant crash after 16 min |
| [E1_E2_FIX_DESIGN_ZH.md](E1_E2_FIX_DESIGN_ZH.md) | Fix design for Q1 (mooncake death) + Q2 (cold-D2) |

### 1.4 Current key design discussions

| Document | Contents |
|---|---|
| [KVC_EVICTION_GRANULARITY_DESIGN_ZH.md](KVC_EVICTION_GRANULARITY_DESIGN_ZH.md) | Architecture-level retrospective: session-level eviction conflicts with the KVC continuity design |
| [BLOCK_LEVEL_EVICTION_DESIGN_ZH.md](BLOCK_LEVEL_EVICTION_DESIGN_ZH.md) | Concrete API / steps / test plan for the block-level eviction refactor (new on this branch) |
| [RESEED_SLOW_PATH_AND_D_TO_P_GAP_ZH.md](RESEED_SLOW_PATH_AND_D_TO_P_GAP_ZH.md) | Timeline of the reseed slow path + forensics on the D→P sync gap |
| [D_TO_P_SYNC_CONTRACT_ZH.md](D_TO_P_SYNC_CONTRACT_ZH.md) | Interface contract, staleness budget, and rollout phases for D→P sync (new on this branch) |

### 1.5 Evaluation / methodology

| Document | Contents |
|---|---|
| [EVALUATION_PROTOCOL_ZH.md](EVALUATION_PROTOCOL_ZH.md) | Paper-quality evaluation protocol (N, CI, paired, stratify, baseline list, trace mix); new on this branch |
| [REFACTOR_PLAN_V1_ZH.md](REFACTOR_PLAN_V1_ZH.md) | Why we switched from ts=10 to ts=1 |
| [TEAM_REPORT_AGENTIC_PD_HYBRID_ZH.md](TEAM_REPORT_AGENTIC_PD_HYBRID_ZH.md) | List of structural problems from the ts=10 era (mostly superseded) |

### 1.6 Environment

| Document | Contents |
|---|---|
| [H200_DRIVER570_SETUP_ZH.md](H200_DRIVER570_SETUP_ZH.md) | Environment setup for H200 + driver 570 + cu12.8, plus 11 lessons learned |

### 1.7 Archive (historical reference only)

Everything under `docs/archive/` has been superseded by newer docs and can be skipped:

- `AGENTIC_FIT_ANALYSIS_ZH.md`, `STRUCTURAL_VALIDATION_REPORT_ZH.md`: early analysis from the ts=10 era.
- `KVCACHE_CENTRIC_PROGRESS_ZH.md`: early project snapshot.
- `KVC_DEBUG_JOURNEY_V1_TO_V5.md`, `V5_PROFILE_INVESTIGATION_ZH.md`: tuning notes from v1 through v5.
- `REFACTOR_PLAN_ZH.md`: the v0 refactor plan.
- `SWEBENCH_EXPERIMENT_*.md`: early experiment logs.

---

## 2. Recommended reading paths by role

### 2.1 I am a SWE/research agent newly taking over

1. Start with the three docs in §0 of this index.
2. Then read [AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) §3 (weak points) + §5 (GPU-free work list).
3. Pick a Milestone 1 sub-item and start on it. `docs/BLOCK_LEVEL_EVICTION_DESIGN_ZH.md` and `docs/D_TO_P_SYNC_CONTRACT_ZH.md` are the two engineering tracks that are already prepared.

### 2.2 I am a paper reviewer / doing a pre-review read

1. [KVC_ROUTER_ALGORITHM.md](KVC_ROUTER_ALGORITHM.md): algorithms + theorems.
2. [V2_DEEP_ANALYSIS_ZH.md](V2_DEEP_ANALYSIS_ZH.md): the core measured comparison + the limitations we identified ourselves.
3. [E1_E2_RESULTS_ZH.md](E1_E2_RESULTS_ZH.md): ablation on real hardware + RDMA (including the forensics on E2's 80% failure, showing that we can explain the failure).
4. [AUDIT_AND_ROADMAP_ZH.md](AUDIT_AND_ROADMAP_ZH.md) §3: the weak points and future work we list ourselves (no problems hidden).

### 2.3 I am a student reproducing the experiments

1. [H200_DRIVER570_SETUP_ZH.md](H200_DRIVER570_SETUP_ZH.md).
2. [EVALUATION_PROTOCOL_ZH.md](EVALUATION_PROTOCOL_ZH.md): which sweeps to run and which protocol to compare them under.
3. `scripts/sweep_ts1_migration_v2.sh`: the main v2 sweep; `scripts/sweep_e1_naive_1p3d.sh` / `scripts/sweep_e2_kvc_v2_rdma.sh`: the E1/E2 ablations.

### 2.4 I want to read about the control plane and admission

1. `src/agentic_pd_hybrid/policies.py`: `KvAwarePolicy.select` implements Algorithm 1.
2. `src/agentic_pd_hybrid/replay.py`: `_invoke_session_direct` / `_invoke_kvcache_seeded_router` orchestrate Algorithm 3.
3. `third_party/sglang/python/sglang/srt/managers/scheduler.py`: on the D side, `_admit_direct_append` implements Algorithm 2.

---

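## 3. Maintenance conventions for this index

- Every new design / experiment doc must get a row in one of the §1 tables of this index.
- When a doc is archived (moved to `docs/archive/`), remove its entry here at the same time or mark it "archived".
- This index carries no substantive content and only navigates; all in-depth explanation lives in the docs it points to.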
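
The first two conventions lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of the repository: it assumes the index lives at `docs/INDEX_ZH.md` (as in this commit), that only top-level `docs/*.md` files need a row, and that index entries are ordinary relative Markdown links. The names `linked_docs` and `check_index` are hypothetical.

```python
import re
from pathlib import Path

# Matches [text](TARGET.md) and [text](TARGET.md#anchor); captures TARGET.md.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def linked_docs(index_text: str) -> set[str]:
    """Return the relative .md targets of all Markdown links in the index text."""
    # Skip absolute URLs; only repository-relative links count as index rows.
    return {t for t in LINK_RE.findall(index_text) if "://" not in t}


def check_index(docs_dir: Path, index_name: str = "INDEX_ZH.md") -> tuple[set[str], set[str]]:
    """Compare the index against the top-level docs directory.

    Returns (unlisted, dangling):
      unlisted: .md files present in docs_dir but not linked from the index
      dangling: links in the index whose target is not in docs_dir
    """
    index_text = (docs_dir / index_name).read_text(encoding="utf-8")
    linked = linked_docs(index_text)
    # Non-recursive glob: files under docs/archive/ are deliberately ignored.
    present = {p.name for p in docs_dir.glob("*.md")} - {index_name}
    return present - linked, linked - present
```

Calling `check_index(Path("docs"))` from the repository root returns two sets: docs that have no row in the index, and index links whose target is no longer at the top level of `docs/` (for example after a move to `docs/archive/`). Under the conventions above both sets should be empty, so a CI step could fail when either is not.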