3f1c3d429a6aaa554a692ecef14b817cf3b239fa
Record what the new baseline adds (llama.cpp pinned at b9371, same BF16 weights, AIME 2025 + GSM8K) and the measured results: performance (xserv at ~0.45-0.61x llama.cpp throughput) and quality parity (GSM8K 94% vs 96%, AIME 23.3% vs 20% after the context fix), plus the findings the bench surfaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Languages
Rust 67.5%
Python 15.1%
Cuda 13.5%
Shell 3.9%