xserv

Gahow Wang cb12250ef0 phase 8: add benchmark framework + baseline results

- bench-gpt2 binary: runs 50 prompts, measures TTFT/TBT per prompt, outputs JSON
- bench_compare.py: compares xserv vs transformers token-by-token + timing
- Baseline results: 50/50 prompts correct, 400 ms TTFT / 407 ms TBT (100x slower than PyTorch)
- Bottlenecks documented: no KV cache, CPU round-trips, cuBLAS handle churn
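A minimal sketch of the two latency metrics and the token-by-token check described above. It assumes each prompt's run records a start time, the wall-clock time at which each generated token was emitted, and the generated token IDs. The record type `PromptRun`, its fields, and the helper functions are hypothetical. They are not the actual JSON schema written by bench-gpt2 or the code in bench_compare.py. Here TTFT (time to first token) is the delay from request start to the first emitted token, and TBT (time between tokens) is the mean gap between consecutive tokens after the first.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class PromptRun:
    # Hypothetical per-prompt record; all times are in milliseconds.
    start: float
    token_times: List[float]  # wall-clock time at which each generated token was emitted
    tokens: List[int]         # generated token IDs


def ttft_ms(run: PromptRun) -> float:
    # Time to first token: request start -> first emitted token.
    return run.token_times[0] - run.start


def tbt_ms(run: PromptRun) -> float:
    # Time between tokens: mean gap between consecutive tokens after the first.
    gaps = [b - a for a, b in zip(run.token_times, run.token_times[1:])]
    return mean(gaps) if gaps else 0.0


def first_mismatch(ours: List[int], reference: List[int]) -> Optional[int]:
    # Index of the first differing token, or None if both sequences are identical.
    for i, (a, b) in enumerate(zip(ours, reference)):
        if a != b:
            return i
    if len(ours) != len(reference):
        # One sequence is a strict prefix of the other.
        return min(len(ours), len(reference))
    return None


def summarize(ours: List[PromptRun], reference: List[PromptRun]) -> dict:
    # Correctness counts prompts whose token IDs match the reference exactly;
    # timing is averaged over the runs under test only.
    correct = sum(
        first_mismatch(o.tokens, r.tokens) is None for o, r in zip(ours, reference)
    )
    return {
        "correct": correct,
        "total": len(ours),
        "mean_ttft_ms": mean(ttft_ms(r) for r in ours),
        "mean_tbt_ms": mean(tbt_ms(r) for r in ours),
    }
```

For example, a run with `start=20.0` and `token_times=[100.0, 150.0, 210.0]` has a TTFT of 80 ms and a TBT of 55 ms (the mean of the 50 ms and 60 ms gaps). Exact token-ID equality is a strict criterion. It fits a greedy-decoding comparison against transformers, but it would need loosening if either side sampled.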

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-21 23:29:41 +08:00

crates

phase 8: add benchmark framework + baseline results

2026-05-21 23:29:41 +08:00

csrc

phase 5: naive multi-head attention

2026-05-21 21:17:23 +08:00

docs

phase 8: add benchmark framework + baseline results

2026-05-21 23:29:41 +08:00

tools

phase 8: add benchmark framework + baseline results