gahow/xserv

Commit Graph

Author	SHA1	Message	Date
Gahow Wang	76fffb3b68	docs: Phase 17 tensor parallelism design Megatron-style TP for Qwen3 on the 8x5090 (no-NVLink, PCIe) box: column/row split per layer, 2 AllReduces/layer, multi-thread one-rank-per-GPU model, NCCL, sharded weights, and the incremental implementation + verification plan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 11:10:03 +08:00

1 Commits