xserv/tools/bench/summarize_tp.py
Gahow Wang a4a171d425 bench: TP sweep harness (xserv --tp, llama row-split, concurrent groups)
runner/servers gain --tp (xserv --tp N; llama.cpp --split-mode row) and
--llama-devices, so llama.cpp can run on a disjoint GPU group. run_tp_parallel.sh
runs xserv (GPUs 0..N-1) and llama.cpp (GPUs 4..4+N-1) concurrently for each TP size,
matching the box's two PHB groups (GPUs 0-3 and 4-7). summarize_tp.py tabulates the sweep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 11:10:43 +08:00
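The per-TP GPU split described in the commit message can be sketched as a small helper. This is a hypothetical illustration, not code from run_tp_parallel.sh (which is not shown here): gpu_groups and its parameters are invented names, and the sketch assumes an 8-GPU box whose GPUs 0-3 and 4-7 form the two PHB groups.

```python
"""Hypothetical sketch of the per-TP GPU split described in the commit message."""


def gpu_groups(tp: int, xserv_base: int = 0, llama_base: int = 4, group_size: int = 4):
    """Return (xserv_devices, llama_devices) as comma-separated GPU index lists.

    Assumes GPUs 0-3 and 4-7 form two PHB groups: xserv takes GPUs
    0..tp-1 and llama.cpp takes GPUs 4..4+tp-1, so each engine stays
    inside its own group.
    """
    if not 1 <= tp <= group_size:
        raise ValueError(f"tp={tp} does not fit in one {group_size}-GPU group")
    xserv = ",".join(str(xserv_base + i) for i in range(tp))
    llama = ",".join(str(llama_base + i) for i in range(tp))
    return xserv, llama


if __name__ == "__main__":
    for tp in (1, 2, 4):
        xserv, llama = gpu_groups(tp)
        print(f"tp={tp}: xserv GPUs {xserv} | llama.cpp GPUs {llama}")
```

Keeping each engine inside one PHB group presumably keeps its tensor-parallel traffic local to that group, so the two concurrent runs interfere less. How the device lists actually reach xserv and llama.cpp (for example via CUDA_VISIBLE_DEVICES or --llama-devices) is an assumption of this sketch.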


"""Summarize the concurrent TP sweep: bench-out/tp{1,2,4}-{xserv,llama}."""
import glob
import json
import os
import sys

# Results root; defaults to ./bench-out.
base = sys.argv[1] if len(sys.argv) > 1 else "bench-out"
rows = []
for tp in (1, 2, 4):
    for sysname in ("xserv", "llama"):
        files = sorted(glob.glob(os.path.join(base, f"tp{tp}-{sysname}", "comparison-*.json")))
        if not files:
            continue  # this (TP, engine) run is missing; skip it
        # Take the lexicographically last file (the latest run, if names sort by time).
        with open(files[-1]) as f:
            d = json.load(f)
        for r in d["quality"]["summary"]:
            rows.append((tp, sysname, r["task"], r["n_correct"], r["n_total"],
                         r["accuracy"] * 100, r["mean_completion_tokens"],
                         r["mean_ttft_ms"], r["mean_tpot_ms"], r["wall_s"]))

print("%-3s %-7s %-9s %-9s %7s %9s %9s %10s %9s" %
      ("TP", "engine", "task", "correct", "acc%", "mean_tok", "TTFT_ms", "TPOT_ms", "wall_s"))
for (tp, s, task, nc, nt, acc, tok, ttft, tpot, wall) in rows:
    print("%-3d %-7s %-9s %-9s %6.1f%% %9.0f %9.1f %10.2f %9.0f" %
          (tp, s, task, f"{nc}/{nt}", acc, tok, ttft, tpot, wall))
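The script expects each comparison-*.json to carry a quality.summary list with the keys read above. A minimal smoke-test sketch wraps the same collection loop in a function (collect_rows is a hypothetical name) and feeds it one synthetic file written to a temporary directory. The file name comparison-0001.json and every value in SYNTHETIC are placeholders, not real benchmark output; only the key names come from the script.

```python
"""Smoke test: build a synthetic bench-out tree and collect rows like summarize_tp.py."""
import glob
import json
import os
import tempfile


def collect_rows(base):
    """Mirror summarize_tp.py: last comparison-*.json per (tp, engine), one row per task."""
    rows = []
    for tp in (1, 2, 4):
        for sysname in ("xserv", "llama"):
            pattern = os.path.join(base, f"tp{tp}-{sysname}", "comparison-*.json")
            files = sorted(glob.glob(pattern))
            if not files:
                continue  # this (tp, engine) pair was not run
            with open(files[-1]) as f:
                d = json.load(f)
            for r in d["quality"]["summary"]:
                rows.append((tp, sysname, r["task"], r["n_correct"], r["n_total"],
                             r["accuracy"] * 100, r["mean_completion_tokens"],
                             r["mean_ttft_ms"], r["mean_tpot_ms"], r["wall_s"]))
    return rows


# Placeholder values only; these are not benchmark results.
SYNTHETIC = {"quality": {"summary": [{
    "task": "demo", "n_correct": 3, "n_total": 4, "accuracy": 0.75,
    "mean_completion_tokens": 100.0, "mean_ttft_ms": 10.0,
    "mean_tpot_ms": 1.0, "wall_s": 5.0,
}]}}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        run_dir = os.path.join(base, "tp2-xserv")
        os.makedirs(run_dir)
        with open(os.path.join(run_dir, "comparison-0001.json"), "w") as f:
            json.dump(SYNTHETIC, f)
        for row in collect_rows(base):
            print(row)  # (2, 'xserv', 'demo', 3, 4, 75.0, 100.0, 10.0, 1.0, 5.0)
```

Because glob.glob returns paths in arbitrary order, the sorted(...)[-1] pick only means "latest run" if the real comparison file names embed a sortable timestamp; that naming is assumed here, not confirmed by the source.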