Files
xserv/docs
Gahow Wang 80157e614a docs: update llama.cpp comparison with 8192 results (OOM fixed)
Re-ran the full comparison at --max-seq-len 8192 now that xserv handles it:
- OOM finding resolved — pool sized to available VRAM + vLLM-style host swap;
  8192 runs with 0 swap events (swap is the overload safety net).
- Quality at parity with equal context: AIME 20.0% vs 20.0%, GSM8K 98% vs 96%.
- Speed unchanged relative to llama.cpp (~0.42-0.60x); TPOT is bandwidth-bound.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 21:32:14 +08:00
..