Files
xserv/tools
Gahow Wang a4a171d425 bench: TP sweep harness (xserv --tp, llama row-split, concurrent groups)
runner/servers gain --tp (xserv --tp N; llama.cpp --split-mode row) and
--llama-devices so llama can run on a disjoint GPU group. run_tp_parallel.sh
runs xserv (GPU 0..N-1) and llama.cpp (GPU 4..4+N-1) concurrently per TP,
matching the box's 0-3 / 4-7 PHB groups. summarize_tp.py tabulates the sweep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 11:10:43 +08:00
..