Gahow Wang
11e0154e4d
docs: Phase 18 pipeline parallelism — design + benchmark results
docs/18-pipeline-parallelism.md: PP design (layer split, NCCL P2P,
per-stage KV, engine/threading model).
docs/benchmarks/pp-sweep.md: measured on dash5 (8x RTX 5090, Qwen3-8B
BF16) — single-stream latency + per-GPU VRAM (~1/N), byte-exact
correctness (single x2 vs pp4 x2 control), and the full AIME-30 +
GSM8K-30 quality matrix (xserv & llama.cpp PP=1/2/4): GSM8K 29/30 in
every cell, TPOT flat across PP.
README: multi-card (TP/PP) section + roadmap to Phase 18.
gitignore: /.claude/ runtime state.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:57:09 +08:00
..
2026-05-29 18:57:09 +08:00
2026-05-22 17:53:28 +08:00
2026-05-21 20:59:45 +08:00
2026-05-21 20:59:45 +08:00
2026-05-21 20:59:45 +08:00
2026-05-21 21:07:24 +08:00
2026-05-21 21:17:23 +08:00
2026-05-21 22:04:00 +08:00
2026-05-21 22:04:00 +08:00
2026-05-21 22:04:00 +08:00
2026-05-21 23:39:41 +08:00
2026-05-22 17:53:28 +08:00
2026-05-22 18:51:29 +08:00
2026-05-22 18:51:29 +08:00
2026-05-22 13:15:27 +08:00
2026-05-22 18:51:29 +08:00
2026-05-23 00:39:27 +08:00
2026-05-28 21:32:14 +08:00
2026-05-29 11:10:03 +08:00
2026-05-29 18:57:09 +08:00
2026-05-23 14:13:43 +08:00