Vendor llama.cpp as a submodule pinned to b9371 and add a one-click benchmark driver that compares xserv against it on identical workloads:

- setup-llama-cpp.sh: network-optional CUDA build (SM120); convert-to-gguf.sh converts the same safetensors to BF16 GGUF for an apples-to-apples baseline.
- tools/bench/: black-box OpenAI-API driver measuring TTFT/TPOT/throughput (single-stream + concurrent) and response quality on AIME 2025 + GSM8K.
- fetch_datasets.py pulls datasets to local JSON (GPU host has no network); task loaders prefer the local JSON.
- sync-and-build.sh: `bench` subcommand transfers source + datasets to the GPU host via tar-over-ssh (no rsync there), builds, and runs the suite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
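The tar-over-ssh transfer used by the `bench` subcommand can be sketched as below. This is a minimal illustration, not the code in sync-and-build.sh: `push_tree` and the `SSH_CMD` override are hypothetical names, and the sketch assumes the remote host has `tar` with gzip support and a bash-compatible shell (the destination path is quoted with bash's `printf %q`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a tar-over-ssh push for hosts without rsync.
# Not the actual sync-and-build.sh implementation; names are illustrative.
set -euo pipefail

# Transport command. Overridable (assumption) so the helper can be
# exercised locally without a real remote host.
SSH_CMD="${SSH_CMD:-ssh}"

# push_tree SRC_DIR HOST DEST_DIR [TAR_OPTIONS...]
#   Streams SRC_DIR as a gzip'd tar archive over ssh and unpacks it into
#   DEST_DIR on HOST. Extra args (e.g. --exclude=.git) go to the sending tar.
push_tree() {
  local src="$1" host="$2" dest="$3"
  shift 3
  local qdest
  # Quote the destination for the remote shell (assumes bash-compatible quoting).
  qdest="$(printf '%q' "$dest")"
  # -C makes archive paths relative to SRC; "-f -" writes the archive to stdout.
  # Exclude options are placed before the member "." so they apply to it.
  tar -C "$src" -czf - "$@" . \
    | "$SSH_CMD" "$host" "mkdir -p $qdest && tar -C $qdest -xzf -"
}
```

A call such as `push_tree . gpu-host /path/on/gpu-host --exclude=.git` (host and path are placeholders) streams the working tree in one pass with no temporary archive on either side. Unlike rsync, it resends every file on each run, which is the trade-off for needing only `tar` and `ssh` on the remote.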
56 lines
1.4 KiB
Bash
Executable File
#!/usr/bin/env bash
# Convert a HuggingFace safetensors model dir into a BF16 GGUF for llama.cpp.
#
# Why BF16: we run xserv in BF16, so the baseline must run BF16 too. If we
# compared xserv-BF16 against llama.cpp-Q4_K_M, the speed delta would be
# dominated by quantization, not by our kernels; that is not an
# apples-to-apples comparison.
#
# Usage:
#   tools/convert-to-gguf.sh <hf-model-dir> [out.gguf]
#
# Example:
#   tools/convert-to-gguf.sh /opt/wjh/models/qwen3-8b
#   # → /opt/wjh/models/qwen3-8b/qwen3-8b-bf16.gguf

set -euo pipefail

if [ "$#" -lt 1 ]; then
  echo "Usage: $0 <hf-model-dir> [out.gguf]" >&2
  exit 1
fi

SRC="$(realpath "$1")"
ROOT_DIR="$(cd "$(dirname "$0")/.." && pwd)"
CONVERT_PY="$ROOT_DIR/third_party/llama.cpp/convert_hf_to_gguf.py"

if [ ! -f "$CONVERT_PY" ]; then
  echo "convert script not found: $CONVERT_PY" >&2
  echo "Run tools/setup-llama-cpp.sh first." >&2
  exit 1
fi

if [ ! -d "$SRC" ]; then
  echo "source model dir not found: $SRC" >&2
  exit 1
fi

if [ "$#" -ge 2 ]; then
  OUT="$2"
else
  BASENAME="$(basename "$SRC")"
  OUT="$SRC/${BASENAME}-bf16.gguf"
fi

if [ -f "$OUT" ]; then
  echo "==> already exists: $OUT (skipping; remove to force re-convert)"
  echo "$OUT"
  exit 0
fi

echo "==> converting $SRC -> $OUT (BF16)"
python3 "$CONVERT_PY" "$SRC" --outfile "$OUT" --outtype bf16

echo "=== done ==="
echo "$OUT"
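The skip-if-exists check in convert-to-gguf.sh treats any file at `$OUT` as a finished conversion. Assuming convert_hf_to_gguf.py writes to `--outfile` in place, a run interrupted partway (Ctrl-C, an out-of-memory kill) could leave a truncated GGUF that later runs silently reuse. One common remedy is to convert into a temporary file and rename it into place only on success. A minimal sketch, where `atomic_convert` and the `.partial` suffix are hypothetical names rather than part of the committed script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write-to-temp-then-rename around a converter command.
# atomic_convert and the ".partial" suffix are illustrative names only.
set -euo pipefail

# atomic_convert OUT CMD [ARGS...]
#   Skips if OUT already exists. Otherwise runs CMD ARGS... --outfile OUT.partial;
#   on success renames OUT.partial to OUT, on failure deletes it and returns
#   the converter's exit status.
atomic_convert() {
  local out="$1"
  shift
  local tmp="${out}.partial"
  if [ -f "$out" ]; then
    echo "==> already exists: $out (skipping)" >&2
    return 0
  fi
  rm -f "$tmp"                      # clear leftovers from an earlier interrupted run
  local status=0
  "$@" --outfile "$tmp" || status=$?
  if [ "$status" -ne 0 ]; then
    rm -f "$tmp"
    return "$status"
  fi
  # rename(2) within one directory is atomic, so OUT is either absent or complete.
  mv -f "$tmp" "$out"
}
```

Used from the script, this would read `atomic_convert "$OUT" python3 "$CONVERT_PY" "$SRC" --outtype bf16`; argparse accepts `--outfile` after `--outtype`, so appending it at the end is harmless.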