Gahow Wang
d5dcf1a5ab
bench: PP harness (xserv --pp vs llama.cpp -sm layer)

runner/servers: add --pp for both engines (xserv --pp N; llama.cpp
-sm layer over N GPUs). New drivers: pp_final.sh (sequential latency +
per-GPU VRAM + byte-exact correctness), pp_diag.sh (single x2 vs pp4 x2
determinism control), pp_quality_full.sh / pp_llama_47.sh (AIME+GSM8K
matrix, xserv on 0-3 || llama on 4-7), summarize_pp/summarize_fullq,
pp_time.py latency probe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 18:45:59 +08:00