agentic-pd-hybrid/README.md
Gahow Wang 9a81c993ab docs(onboarding): link new audit / design / eval docs from
the root README + AGENTS.md

Without this, the four docs added on this branch
(AUDIT_AND_ROADMAP, INDEX, BLOCK_LEVEL_EVICTION_DESIGN,
D_TO_P_SYNC_CONTRACT, EVALUATION_PROTOCOL) are reachable
only by listing docs/. This wires them into the two entry
points an agent or collaborator hits first.

README.md changes:
  - top-of-page pointer to INDEX_ZH for new collaborators
  - pointer to AUDIT_AND_ROADMAP_ZH for project state
  - "单元测试 (无 GPU)" section ("Unit tests (no GPU)"): how to run pytest
  - "评测脚本" section ("Evaluation scripts"): invocations for the two new
    analysis scripts

AGENTS.md changes:
  - top section "For new collaborators / agents" before
    the existing "Environment" block, pointing at INDEX_ZH,
    AUDIT_AND_ROADMAP_ZH, the two ready-to-pick-up design
    docs, and EVALUATION_PROTOCOL_ZH
  - pytest invocation under Environment
2026-05-12 23:58:56 +08:00

# Agentic PD Hybrid
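This project is a minimal experimental framework built on SGLang xPyD. It exists to answer one question:
**Can session-aware / KV-cache-aware P/D routing reduce end-to-end latency for agentic coding workloads?**

For a fuller but still concise description, see [docs/PROJECT_OVERVIEW.md](docs/PROJECT_OVERVIEW.md).

New collaborators: start with [docs/INDEX_ZH.md](docs/INDEX_ZH.md) and pick the 3 must-read documents that match your role (the "who am I" section).

For an overview of current progress, weak points, and the roadmap, see [docs/AUDIT_AND_ROADMAP_ZH.md](docs/AUDIT_AND_ROADMAP_ZH.md).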
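## What it currently does
- Launches a single-node SGLang P/D stack.
- Replays the Ali coding agent trace and records request-level metrics.
- Supports the `default`, `sticky`, and `kv-aware` routing policies.
- Supports comparing the `pd-disaggregation`, `kvcache-centric`, and `pd-colo` mechanisms.
- Supports micro-benchmark traces with small appends and multi-turn sessions.
- Maintains a local patch set based on SGLang `v0.5.10`, kept in `third_party/sglang`.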
## Environment
Use `uv` for everything:
```bash
uv sync
```
Default model path:
```text
~/models/Qwen/Qwen3-Coder-30B-A3B-Instruct
```
The main test environment is currently a single node with 8 GPUs, so the constraint is `prefill + decode <= 8`.
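The GPU budget above can be checked before launching anything. The following is a minimal sketch and not part of the project: `check_gpu_budget` is a hypothetical helper, and it assumes one GPU per worker with no GPU shared between prefill and decode, which this README does not state explicitly.

```python
def check_gpu_budget(prefill_gpu_ids, decode_gpu_ids, total_gpus=8):
    """Return True if a prefill/decode layout fits on a single node.

    Assumptions (hypothetical, not taken from the project code):
    one GPU per worker, and prefill and decode never share a GPU.
    """
    prefill, decode = set(prefill_gpu_ids), set(decode_gpu_ids)
    if prefill & decode:
        return False  # the same GPU is assigned to both roles
    if any(g < 0 or g >= total_gpus for g in prefill | decode):
        return False  # GPU id outside the node's range
    # The constraint from the text: prefill + decode <= 8.
    return len(prefill) + len(decode) <= total_gpus
```

For example, `check_gpu_budget([0], [1])` accepts a 1P + 1D layout, while a layout that reuses GPU 0 for both roles is rejected.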
## Common commands
Generate a small-append trace:
```bash
uv run agentic-pd-hybrid make-small-append-trace \
--output outputs/smoke-hotcap-30k-1k-256.jsonl \
--session-count 4 \
--turns-per-session 3 \
--initial-input-length 30000 \
--append-input-length 1000 \
--output-length 256
```
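To get a feel for the trace this command describes, the arithmetic can be sketched as follows. This is an illustration only: `small_append_trace_shape` is a hypothetical function, and whether each turn's output tokens are carried into the next turn's input is an assumption exposed as the `carry_output` flag. The real generator may build its trace differently.

```python
def small_append_trace_shape(session_count, turns_per_session,
                             initial_input_length, append_input_length,
                             output_length, carry_output=True):
    """Estimate request count and per-turn input lengths for one session.

    carry_output is an assumption: if True, each turn's generated tokens
    are counted as part of the next turn's input context.
    """
    lengths = []
    context = initial_input_length
    for turn in range(turns_per_session):
        if turn > 0:
            context += append_input_length  # newly appended input
            if carry_output:
                context += output_length    # previous turn's output
        lengths.append(context)
    return {
        "requests": session_count * turns_per_session,
        "input_lengths": lengths,
    }


shape = small_append_trace_shape(4, 3, 30000, 1000, 256)
```

Under these assumptions the example command yields 12 requests, with per-session input lengths of 30000, 31256, and 32512 tokens.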
Run a live benchmark:
```bash
uv run agentic-pd-hybrid benchmark-live \
--trace outputs/micro-serveable-varturn-30k-1k-256-20260424T0756Z.jsonl \
--output-root outputs/live-serveable-varturn-30k-1k-256-hotcap \
--mechanism kvcache-centric \
--policy kv-aware \
--kvcache-admission-mode worker \
--prefill-workers 1 \
--decode-workers 1 \
--prefill-gpu-ids 0 \
--decode-gpu-ids 1 \
--transfer-backend mooncake \
--target-duration-s 2000 \
--session-sample-rate 1.0 \
--min-turns 2 \
--time-scale 1 \
--concurrency-limit 1000
```
Replay only and write metrics:
```bash
uv run agentic-pd-hybrid replay \
--trace path/to/trace.jsonl \
--policy kv-aware \
--mechanism pd-disaggregation \
--router-url http://127.0.0.1:8000 \
--output outputs/replay.jsonl
```
## Output
Each replay/benchmark run writes:
- request metrics: `request-metrics.jsonl`
- summary: `request-metrics.jsonl.summary.json`

Key fields to look at:
- E2E latency
- TTFT / TPOT
- execution mode
- cached tokens
- KV transfer blocks
- error
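A quick way to inspect a metrics file by hand is sketched below. The field names `e2e_latency_s` and `error` are placeholders chosen for illustration; the actual keys in `request-metrics.jsonl` are not listed in this README, so pass the real ones via the function arguments.

```python
import json
import statistics


def summarize_latency(lines, latency_field="e2e_latency_s", error_field="error"):
    """Summarize latency over JSONL records, skipping failed requests.

    latency_field and error_field are hypothetical key names.
    """
    ok, failed = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get(error_field):
            failed += 1
            continue
        ok.append(float(record[latency_field]))
    return {
        "requests": len(ok) + failed,
        "errors": failed,
        "mean": statistics.fmean(ok) if ok else None,
        "median": statistics.median(ok) if ok else None,
    }


# Tiny in-memory sample standing in for a real metrics file.
sample = [
    '{"e2e_latency_s": 1.0, "error": null}',
    '{"e2e_latency_s": 3.0, "error": null}',
    '{"e2e_latency_s": 9.0, "error": "timeout"}',
]
summary = summarize_latency(sample)
```

With a real run, pass an open file handle instead of the sample list, since iterating a file yields one line at a time.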
## Maintenance conventions
- Project code changes: `feat:` / `fix:` / `docs:`
- SGLang changes: `feat(sglang): ...` / `fix(sglang): ...`
- The baseline for `third_party/sglang` is a clean SGLang `v0.5.10` snapshot.
- Do not commit `outputs/`, logs, `__pycache__`, or virtual environments.
## Unit tests (no GPU)
The algorithm layer (policies, Algorithm 1 / Theorem 1) has pure-Python unit tests. Running them requires neither a GPU nor SGLang:
```bash
uv sync --group test
uv run pytest
```
See [tests/README.md](tests/README.md) for details.
## Evaluation scripts
After collecting data according to [docs/EVALUATION_PROTOCOL_ZH.md](docs/EVALUATION_PROTOCOL_ZH.md):
```bash
# M3: bucket by turn_id / input_length / overlap_ratio / append_tokens
scripts/analysis/stratified.py outputs/<run>/request-metrics.jsonl
# M2: paired-on-same-trial bootstrap 95% CI
scripts/analysis/paired_compare.py \
--baseline outputs/run-dp/request-metrics.jsonl \
--candidate outputs/run-kvc/request-metrics.jsonl
```
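For readers unfamiliar with a paired bootstrap confidence interval, the idea behind the M2 comparison can be sketched as below. This is a generic percentile bootstrap on the mean of per-pair differences, written as an illustration. It is not the code in `paired_compare.py`, and how that script matches baseline and candidate records is not described here, so the inputs are assumed to be already aligned pair by pair.

```python
import random


def paired_bootstrap_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of (candidate - baseline).

    baseline[i] and candidate[i] must come from the same trial.
    Returns (mean_difference, ci_low, ci_high).
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    means = []
    for _ in range(n_resamples):
        # Resample whole pairs with replacement, keeping the pairing intact.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, low, high
```

If the resulting interval lies entirely below zero, the candidate run has lower latency than the baseline at the chosen confidence level under this resampling scheme.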