Files
xserv/docs
Gahow Wang 24c49c31c2 tools: warm-server FP8 vs BF16 benchmark + results doc
fp8_compare.py launches one xserv-server per model (same GPUs / TP for a
fair comparison), gates readiness on a real generation (not /health),
and streams GSM8K through /v1/chat/completions measuring per-request
TTFT (time to first token) and TPOT (mean inter-token latency) plus
exact-match accuracy. docs/benchmarks/fp8-quantization.md records the
quantization scheme, the perf-bug fix, and the dash5 results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 00:58:46 +08:00
..