xserv

Files

Gahow Wang cb12250ef0 phase 8: add benchmark framework + baseline results

- bench-gpt2 binary: runs 50 prompts, measures TTFT/TBT per prompt, outputs JSON
- bench_compare.py: compares xserv vs transformers token-by-token + timing
- Baseline results: 50/50 correctness, 400ms TTFT / 407ms TBT (100x slower than PyTorch)
- Bottlenecks documented: no KV cache, CPU round-trips, cuBLAS handle churn

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-21 23:29:41 +08:00

phase8-gpt2-baseline.md

phase 8: add benchmark framework + baseline results

2026-05-21 23:29:41 +08:00