|
|
cb12250ef0
|
phase 8: add benchmark framework + baseline results
- bench-gpt2 binary: runs 50 prompts, measures TTFT (time to first token) and TBT (time between tokens) per prompt, outputs JSON
- bench_compare.py: compares xserv against transformers token by token, plus timing
- Baseline results: 50/50 prompts correct, 400 ms TTFT / 407 ms TBT (100x slower than PyTorch)
- Bottlenecks documented: no KV cache, CPU round-trips, cuBLAS handle churn
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
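
The two measurements named above can be sketched as follows. This is a minimal illustration, not the code in bench-gpt2 or bench_compare.py: the function names `latency_stats` and `first_mismatch`, the JSON keys, and the choice of the mean as the TBT aggregate are all assumptions made for the example.

```python
import json
from statistics import mean


def latency_stats(request_start, token_times):
    """Return TTFT and mean TBT in milliseconds (hypothetical helper).

    request_start: timestamp in seconds when the prompt was submitted.
    token_times: timestamps in seconds at which each output token arrived.
    """
    if not token_times:
        raise ValueError("need at least one generated token")
    # TTFT: delay between submitting the prompt and receiving the first token.
    ttft_ms = (token_times[0] - request_start) * 1000.0
    # TBT: gaps between consecutive tokens, averaged here (an assumption;
    # a median or percentile would be an equally valid aggregate).
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    tbt_ms = mean(gaps) * 1000.0 if gaps else 0.0
    return {"ttft_ms": ttft_ms, "tbt_ms": tbt_ms}


def first_mismatch(ours, reference):
    """Return the index of the first differing token id, or None if identical."""
    for i, (a, b) in enumerate(zip(ours, reference)):
        if a != b:
            return i
    # One sequence is a strict prefix of the other: they diverge where it ends.
    if len(ours) != len(reference):
        return min(len(ours), len(reference))
    return None


if __name__ == "__main__":
    # Synthetic timestamps and token ids, not measured data.
    result = latency_stats(0.0, [0.5, 1.0, 1.5])
    result["mismatch_at"] = first_mismatch([1, 2, 3], [1, 2, 3])
    print(json.dumps(result))
```

A prompt would count as correct when `first_mismatch` returns `None` for the two token sequences; reporting the index otherwise makes it easier to see where the outputs diverge.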
|
2026-05-21 23:29:41 +08:00 |
|