xserv/bench at main - xserv - Local Gitea

gahow/xserv

Gahow Wang d5dcf1a5ab bench: PP harness (xserv --pp vs llama.cpp -sm layer)

runner/servers: add --pp for both engines (xserv --pp N; llama.cpp
-sm layer over N GPUs). New drivers: pp_final.sh (sequential latency +
per-GPU VRAM + byte-exact correctness), pp_diag.sh (single x2 vs pp4 x2
determinism control), pp_quality_full.sh / pp_llama_47.sh (AIME+GSM8K
matrix, xserv on 0-3 || llama on 4-7), summarize_pp/summarize_fullq,
pp_time.py latency probe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-29 18:45:59 +08:00

..

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

__init__.py

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

client.py

bench: run one server at a time, match thinking mode, fix tools package

2026-05-28 11:40:07 +08:00

config.py

bench: run one server at a time, match thinking mode, fix tools package

2026-05-28 11:40:07 +08:00

fetch_datasets.py

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

pp_clean_bench.sh

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

pp_time.py

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

quality.py

bench: run one server at a time, match thinking mode, fix tools package

2026-05-28 11:40:07 +08:00

report.py

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

requirements.txt

tools: add llama.cpp comparison baseline + standard benchmark suite

2026-05-28 11:18:52 +08:00

run_pp_parallel.sh

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

run_tp_parallel.sh

bench: TP sweep harness (xserv --tp, llama row-split, concurrent groups)

2026-05-29 11:10:43 +08:00

runner.py

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

servers.py

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

speed.py

bench: run one server at a time, match thinking mode, fix tools package

2026-05-28 11:40:07 +08:00

summarize_fullq.py

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

summarize_pp.py

bench: PP harness (xserv --pp vs llama.cpp -sm layer)

2026-05-29 18:45:59 +08:00

summarize_tp.py

bench: TP sweep harness (xserv --tp, llama row-split, concurrent groups)

2026-05-29 11:10:43 +08:00