Record what the new baseline adds (llama.cpp pinned b9371, same BF16 weights, AIME 2025 + GSM8K) and the measured results: performance (xserv ~0.45-0.61x llama.cpp throughput) and quality parity (GSM8K 94% vs 96%, AIME 23.3% vs 20% after the context fix), plus the findings the bench surfaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
23 lines
382 B
Plaintext
/target
*.o
*.so
*.a
*.ptx
*.cubin
**/*.rs.bk
.env
*.npy

# llama.cpp baseline (cloned/submoduled by tools/setup-llama-cpp.sh)
/third_party/llama.cpp/build/
/third_party/llama.cpp/models/
*.gguf

# Benchmark output + fetched datasets (transferred to GPU host, not committed)
/bench-out/
/tools/bench/data/
/tools/__pycache__/
/tools/bench/__pycache__/
/tools/bench/**/__pycache__/