xserv

Gahow Wang 3f1c3d429a docs: llama.cpp vs xserv benchmark results + summary

Record what the new baseline adds (llama.cpp pinned b9371, same BF16 weights,
AIME 2025 + GSM8K) and the measured results: performance (xserv ~0.45-0.61x
llama.cpp throughput) and quality parity (GSM8K 94% vs 96%, AIME 23.3% vs 20%
after the context fix), plus the findings the bench surfaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-28 15:06:21 +08:00

crates

fix: 12 bug fixes from comprehensive review — 51 tok/s verified on RTX 5090

2026-05-23 14:13:43 +08:00

csrc

fix: 12 bug fixes from comprehensive review — 51 tok/s verified on RTX 5090

2026-05-23 14:13:43 +08:00

docs

docs: llama.cpp vs xserv benchmark results + summary

2026-05-28 15:06:21 +08:00

third_party

tools: add llama.cpp comparison baseline + standard benchmark suite