xserv/tools at 824cc58daa52c75b301e64beb6d0726fbd948827

gahow/xserv

Gahow Wang a4a171d425 bench: TP sweep harness (xserv --tp, llama row-split, concurrent groups)

runner/servers gain --tp (xserv --tp N; llama.cpp --split-mode row) and
--llama-devices so llama can run on a disjoint GPU group. run_tp_parallel.sh
runs xserv (GPU 0..N-1) and llama.cpp (GPU 4..4+N-1) concurrently per TP,
matching the box's 0-3 / 4-7 PHB groups. summarize_tp.py tabulates the sweep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-29 11:10:43 +08:00
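
The GPU assignment described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions, not the code in run_tp_parallel.sh: the function names `gpu_groups` and `device_list` are hypothetical, and the group size of 4 is inferred from the "0-3 / 4-7 PHB groups" wording.

```python
from typing import List, Tuple


def gpu_groups(tp: int, group_size: int = 4) -> Tuple[List[int], List[int]]:
    """Return (xserv_gpus, llama_gpus) for one tensor-parallel degree.

    Hypothetical helper: xserv takes GPUs 0..tp-1 from the first group and
    llama.cpp takes GPUs group_size..group_size+tp-1 from the second group,
    so the two servers never share a device.
    """
    if not 1 <= tp <= group_size:
        raise ValueError(f"tp must be between 1 and {group_size}, got {tp}")
    xserv_gpus = list(range(0, tp))
    llama_gpus = list(range(group_size, group_size + tp))
    return xserv_gpus, llama_gpus


def device_list(gpus: List[int]) -> str:
    """Format GPU indices as a comma-separated string, e.g. '4,5'."""
    return ",".join(str(g) for g in gpus)


if __name__ == "__main__":
    # Print the assignment for each TP degree that fits in one 4-GPU group.
    for tp in (1, 2, 4):
        xserv_gpus, llama_gpus = gpu_groups(tp)
        print(f"tp={tp} xserv={device_list(xserv_gpus)} llama={device_list(llama_gpus)}")
```

With tp=2 this yields GPUs 0,1 for xserv and 4,5 for llama.cpp, so both servers can run concurrently without sharing a device or crossing between the two groups.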

bench: TP sweep harness (xserv --tp, llama row-split, concurrent groups)

2026-05-29 11:10:43 +08:00

__init__.py

bench: run one server at a time, match thinking mode, fix tools package

2026-05-28 11:40:07 +08:00

analyze_divergence.py

phase 9: KV cache + autoregressive generation

2026-05-21 23:39:41 +08:00

bench_compare_qwen3.py

phase 10: add Qwen3-8B benchmark + performance fix

2026-05-22 10:25:33 +08:00

bench_compare.py

phase 8: add benchmark framework + baseline results

2026-05-21 23:29:41 +08:00

bench_server.py

tools: add correctness + performance test scripts for Qwen3-8B

2026-05-23 14:13:49 +08:00

bench_vs_hf.py

fix: comprehensive review + 14 bug fixes + Phase 12/14 overhaul

2026-05-22 17:53:28 +08:00

compare_logits.py

fix: comprehensive review + 14 bug fixes + Phase 12/14 overhaul

2026-05-22 17:53:28 +08:00

convert-to-gguf.sh

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

e2e_validate.py

fix: comprehensive review + 14 bug fixes + Phase 12/14 overhaul

2026-05-22 17:53:28 +08:00

setup-llama-cpp.sh

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

sync-and-build.sh

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

test_concurrent.py

fix: comprehensive review + 14 bug fixes + Phase 12/14 overhaul

2026-05-22 17:53:28 +08:00

test_correctness.py

tools: add correctness + performance test scripts for Qwen3-8B

2026-05-23 14:13:49 +08:00