fp8_compare.py launches one xserv-server per model (same GPUs / TP for a
fair comparison), gates readiness on a real generation (not /health),
and streams GSM8K through /v1/chat/completions measuring per-request
TTFT (time to first token) and TPOT (mean inter-token latency) plus
exact-match accuracy. docs/benchmarks/fp8-quantization.md records the
quantization scheme, the perf-bug fix, and the dash5 results.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>