xserv

Gahow Wang 7b8b520cda docs: TP=1/2/4 xserv vs llama.cpp benchmark results

AIME 2025 + GSM8K at TP=1/2/4. Quality on par across engines/TP. Opposite
perf scaling: xserv TPOT improves with TP (21->17->15ms) while llama.cpp
row-split regresses over PCIe (10->19->20ms), crossing over so xserv is faster
at TP=4. Includes the clean same-path bench-tp scaling (58/76/86 tok/s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-29 11:10:52 +08:00
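The TPOT figures in the commit message above can be turned into per-stream decode rates and TP scaling factors with simple arithmetic. The sketch below assumes one output token per TPOT interval for a single stream (so rate = 1000 / TPOT in ms). It is an illustration only, not part of the xserv benchmark code, and the helper names (`decode_rate`, `scaling`, `TPOT_MS`) are hypothetical. The bench-tp figures (58/76/86 tok/s) come from a separate same-path run and are not expected to match these derived numbers.

```python
# TPOT (time per output token, ms) as quoted in the commit message.
# Hypothetical data layout: engine -> {TP size: TPOT in ms}.
TPOT_MS = {
    "xserv":     {1: 21, 2: 17, 4: 15},
    "llama.cpp": {1: 10, 2: 19, 4: 20},  # row-split over PCIe
}


def decode_rate(tpot_ms: float) -> float:
    """Single-stream decode rate in tokens/s, assuming one token per TPOT interval."""
    return 1000.0 / tpot_ms


def scaling(tpots: dict) -> dict:
    """Speedup of each TP size relative to TP=1 (>1 means faster, <1 means slower)."""
    base = tpots[1]
    return {tp: base / t for tp, t in tpots.items()}


for engine, tpots in TPOT_MS.items():
    speedup = scaling(tpots)
    for tp in sorted(tpots):
        t = tpots[tp]
        print(f"{engine:9s} TP={tp}: {t:2d} ms -> {decode_rate(t):5.1f} tok/s, "
              f"{speedup[tp]:.2f}x vs TP=1")

# At each TP size, the engine with the lower TPOT decodes faster.
for tp in (1, 2, 4):
    winner = min(TPOT_MS, key=lambda e: TPOT_MS[e][tp])
    print(f"TP={tp}: lower TPOT -> {winner}")
```

Running it prints one line per engine and TP size, then the lower-TPOT engine at each TP size; the ratios make the opposite scaling directions explicit (xserv speeds up with TP, llama.cpp row-split slows down).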

llama-cpp-comparison.md

docs: update llama.cpp comparison with 8192 results (OOM fixed)

2026-05-28 21:32:14 +08:00

phase8-gpt2-baseline.md

phase 8: add benchmark framework + baseline results

2026-05-21 23:29:41 +08:00

phase9-kv-cache.md

phase 9: KV cache + autoregressive generation

2026-05-21 23:39:41 +08:00

phase10-qwen3.md

phase 10: add Qwen3-8B benchmark + performance fix