
kzlin 74194e660a docs: v4 final results, error analysis, and updated journey

Add v4 sweep results and post-mortem analysis showing:

- direct-to-D path: 54.3% (1P7D) / 58.0% (2P6D) of requests now use
  KVC cleanly. P50=0.5s and TTFT P50=0.043s; this path beats baseline
  8DP across the board (P50 -24%, TTFT P50 -54%, TTFT P90 -79%).

- Overall vs baseline (errors+truncated excluded):
  v4 2P6D P50=0.85s vs baseline 0.66s (28% slower).
  Reason is not errors -- 35% of requests still hit
  fallback-large-append-session-cap, where capacity-based
  cap = usable_tokens / target_tokens evaluates to 1-2 (not 16)
  for large agentic inputs.

- 9-10% errors on KVC variants are mooncake TCP transfer timeouts,
  not SGLang logic bugs. Prefill log shows
  "Failed to send kv chunk ... 32s timeout ... session not alive".
  Errors concentrate in turns >= 31 (large inputs), once the run is
  more than 44.8% complete.
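
To illustrate the fallback-large-append-session-cap point above, here is a minimal sketch of how a capacity-based cap can collapse to 1-2 for large inputs. The function name, the floor division, the clamp to at least 1, and all token counts are assumptions made for illustration; only the ratio usable_tokens / target_tokens and the soft cap of 16 come from the notes above.

```python
def session_cap(usable_tokens: int, target_tokens: int, soft_cap: int = 16) -> int:
    """Capacity-based session cap, clamped to the range [1, soft_cap].

    Hypothetical helper: the real logic in the patched SGLang may differ.
    """
    if target_tokens <= 0:
        raise ValueError("target_tokens must be positive")
    return max(1, min(soft_cap, usable_tokens // target_tokens))


# Illustrative numbers only, not measured values.
small_input_cap = session_cap(usable_tokens=400_000, target_tokens=8_000)
large_input_cap = session_cap(usable_tokens=400_000, target_tokens=250_000)
print(small_input_cap, large_input_cap)
```

With a small per-session target the soft cap of 16 is the binding limit; with a large agentic input the capacity ratio becomes the binding limit, so most sessions fall back.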

Track:
- docs/KVC_DEBUG_JOURNEY_V1_TO_V4.md: append v4 results table,
  per-mode breakdown, and error root cause.
- scripts/analysis/{analyze_v3,analyze_v4,analyze_errors,compare_no_error}.py
- outputs/qwen3-30b-tp1-v{3,4}*/exp*_summary.json (force-added,
  small JSON; metrics.jsonl excluded due to size).
- outputs/qwen3-30b-tp1-v{3,4}*/sweep_results.txt

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 23:34:01 +08:00

docs

docs: v4 final results, error analysis, and updated journey

2026-04-28 23:34:01 +08:00

outputs

docs: v4 final results, error analysis, and updated journey

2026-04-28 23:34:01 +08:00

scripts

docs: v4 final results, error analysis, and updated journey

2026-04-28 23:34:01 +08:00

src/agentic_pd_hybrid

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

third_party/sglang

docs: KVC v1-v4 debug journey + raise session soft_cap to 16

2026-04-28 21:10:41 +08:00

.gitignore

chore: vendor sglang v0.5.10 snapshot

2026-04-24 12:29:36 +00:00

.python-version

chore: initialize repo hygiene

2026-04-24 12:17:40 +00:00

AGENTS.md

docs: document project design and status

2026-04-24 12:17:55 +00:00

pyproject.toml

feat: add agentic pd hybrid benchmark prototype

2026-04-24 12:17:46 +00:00

README.md

docs: rewrite project docs in concise chinese

2026-04-24 12:41:52 +00:00

uv.lock

feat: add agentic pd hybrid benchmark prototype

2026-04-24 12:17:46 +00:00

README.md

Agentic PD Hybrid

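This project is a minimal experiment framework built on SGLang xPyD to answer one question:

Can session-aware / KV-cache-aware P/D routing for agentic coding workloads reduce end-to-end latency?

A fuller but still concise description is in docs/PROJECT_OVERVIEW.md.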

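What it currently does

Launches a single-node SGLang P/D stack.
Replays the Ali coding agent trace and records request-level metrics.
Supports the default, sticky, and kv-aware routing policies.
Supports comparing the pd-disaggregation, kvcache-centric, and pd-colo mechanisms.
Supports micro-benchmark traces with small appends and multi-turn sessions.
Maintains a local patch based on SGLang v0.5.10 in third_party/sglang.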
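
As a rough illustration of what a sticky policy means in this setting, the sketch below pins every request of a session to the same worker by hashing the session ID. This is not the project's implementation: the function name and the hashing scheme are assumptions, and the kv-aware policy would need cache-state information that is not modeled here.

```python
import hashlib


def sticky_worker(session_id: str, worker_count: int) -> int:
    """Map a session ID to a stable worker index in [0, worker_count)."""
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    # A cryptographic digest is used only because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


# Every turn of the same session lands on the same decode worker.
first_turn = sticky_worker("session-42", worker_count=6)
later_turn = sticky_worker("session-42", worker_count=6)
print(first_turn == later_turn)
```

Keeping a session on one worker is what lets later turns reuse the KV cache built by earlier turns.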

Environment

Use uv for everything:

uv sync

Default model path:

~/models/Qwen/Qwen3-Coder-30B-A3B-Instruct

The main test environment is currently a single node with 8 GPUs, under the constraint prefill + decode <= 8.
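
The GPU constraint can be checked before launching a sweep. The sketch below parses split labels such as 1P7D or 2P6D and verifies that they fit on the node. It assumes one GPU per worker (TP=1); the helper names are hypothetical and not part of the CLI.

```python
import re


def parse_split(label: str) -> tuple[int, int]:
    """Parse a label such as '2P6D' into (prefill_workers, decode_workers)."""
    match = re.fullmatch(r"(\d+)P(\d+)D", label)
    if match is None:
        raise ValueError(f"not a valid xPyD label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def fits_on_node(label: str, total_gpus: int = 8) -> bool:
    """Check prefill + decode <= total_gpus, with at least one worker of each kind."""
    prefill, decode = parse_split(label)
    return prefill >= 1 and decode >= 1 and prefill + decode <= total_gpus


for label in ("1P7D", "2P6D", "4P6D"):
    print(label, fits_on_node(label))
```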

Common commands

Generate a small-append trace:

uv run agentic-pd-hybrid make-small-append-trace \
  --output outputs/smoke-hotcap-30k-1k-256.jsonl \
  --session-count 4 \
  --turns-per-session 3 \
  --initial-input-length 30000 \
  --append-input-length 1000 \
  --output-length 256

Run a live benchmark:

uv run agentic-pd-hybrid benchmark-live \
  --trace outputs/micro-serveable-varturn-30k-1k-256-20260424T0756Z.jsonl \
  --output-root outputs/live-serveable-varturn-30k-1k-256-hotcap \
  --mechanism kvcache-centric \
  --policy kv-aware \
  --kvcache-admission-mode worker \
  --prefill-workers 1 \
  --decode-workers 1 \
  --prefill-gpu-ids 0 \
  --decode-gpu-ids 1 \
  --transfer-backend mooncake \
  --target-duration-s 2000 \
  --session-sample-rate 1.0 \
  --min-turns 2 \
  --time-scale 1 \
  --concurrency-limit 1000

Replay only and write metrics:

uv run agentic-pd-hybrid replay \
  --trace path/to/trace.jsonl \
  --policy kv-aware \
  --mechanism pd-disaggregation \
  --router-url http://127.0.0.1:8000 \
  --output outputs/replay.jsonl

Output

Each replay/benchmark run writes:

Request metrics: request-metrics.jsonl
Summary: request-metrics.jsonl.summary.json

Key things to look at:

E2E latency
TTFT / TPOT
execution mode
cached tokens
KV transfer blocks
error
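
For a quick look at the request metrics without the bundled summary, a sketch like the following computes latency percentiles with failed requests excluded. The field names e2e_latency_s and error are assumptions; check an actual request-metrics.jsonl record for the real keys before using it.

```python
import json
import math

# Hypothetical field names; the real keys in request-metrics.jsonl may differ.
LATENCY_KEY = "e2e_latency_s"
ERROR_KEY = "error"


def nearest_rank(sorted_values, q):
    """Return the q-th percentile (0 < q <= 100) using the nearest-rank method."""
    rank = math.ceil(q / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarize(lines):
    """Summarize JSONL lines, keeping failed requests out of the latency stats."""
    latencies, errors = [], 0
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get(ERROR_KEY):
            errors += 1
        else:
            latencies.append(record[LATENCY_KEY])
    latencies.sort()
    summary = {"ok": len(latencies), "errors": errors}
    if latencies:
        summary["p50"] = nearest_rank(latencies, 50)
        summary["p90"] = nearest_rank(latencies, 90)
    return summary


sample = [
    json.dumps({LATENCY_KEY: 1.5, ERROR_KEY: None}),
    json.dumps({LATENCY_KEY: 0.5, ERROR_KEY: None}),
    json.dumps({LATENCY_KEY: 1.0, ERROR_KEY: None}),
    json.dumps({ERROR_KEY: "timeout"}),
]
result = summarize(sample)
print(result)
```

To run it on a real file, pass an open file handle, for example summarize(open("request-metrics.jsonl")). Reporting the error count alongside the percentiles keeps an error-excluded comparison honest.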

Maintenance conventions

Project code changes: feat: / fix: / docs:.
SGLang changes: feat(sglang): ... / fix(sglang): ....
The baseline for third_party/sglang is a clean SGLang v0.5.10 snapshot.
Do not commit outputs/, logs, __pycache__, or virtual environments.
