tools: add llama.cpp comparison baseline + standard benchmark suite

Vendor llama.cpp as a submodule pinned to b9371 and add a one-click
benchmark driver that compares xserv against it on identical workloads:

- setup-llama-cpp.sh: network-optional CUDA build (SM120);
  convert-to-gguf.sh converts the same safetensors to BF16 GGUF for an
  apples-to-apples baseline.
- tools/bench/: black-box OpenAI-API driver measuring
  TTFT/TPOT/throughput (single-stream + concurrent) and response
  quality on AIME 2025 + GSM8K.
- fetch_datasets.py pulls datasets to local JSON (GPU host has no
  network); task loaders prefer the local JSON.
- sync-and-build.sh: `bench` subcommand transfers source + datasets to
  the GPU host via tar-over-ssh (no rsync there), builds, and runs the
  suite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
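
For readers unfamiliar with the latency metrics named above, here is a minimal sketch of how TTFT, TPOT, and throughput can be derived from per-token arrival timestamps on a streaming response. It is not taken from tools/bench/; the function name `latency_metrics` and its parameters are hypothetical, and the driver in this commit may define the metrics differently.

```python
from typing import Dict, List


def latency_metrics(request_start: float, token_times: List[float]) -> Dict[str, float]:
    """Hypothetical helper: derive latency metrics for one streamed request.

    request_start: wall-clock time (seconds) when the request was sent.
    token_times:   wall-clock arrival time (seconds) of each streamed token.
    """
    if not token_times:
        raise ValueError("no tokens were received")

    n = len(token_times)
    first, last = token_times[0], token_times[-1]

    # TTFT: delay between sending the request and the first token.
    ttft = first - request_start

    # TPOT: mean gap between consecutive tokens after the first one.
    tpot = (last - first) / (n - 1) if n > 1 else 0.0

    # Throughput: tokens per second over the whole request.
    total = last - request_start
    throughput = n / total if total > 0 else 0.0

    return {"ttft_s": ttft, "tpot_s": tpot, "tokens_per_s": throughput}


if __name__ == "__main__":
    print(latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25]))
```

Timestamps would typically come from a monotonic clock such as `time.perf_counter()` sampled as each streamed chunk arrives.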
.gitignore (vendored): 12 changes
@@ -7,3 +7,15 @@
 **/*.rs.bk
 .env
 *.npy
+
+# llama.cpp baseline (cloned/submoduled by tools/setup-llama-cpp.sh)
+/third_party/llama.cpp/build/
+/third_party/llama.cpp/models/
+*.gguf
+
+# Benchmark output + fetched datasets (transferred to GPU host, not committed)
+/bench-out/
+/tools/bench/data/
+/tools/bench/__pycache__/
+/tools/bench/**/__pycache__/
+