Gahow Wang
a67e724119
docs: Phase 15 design doc + benchmark report

Design document (docs/15-performance.md):
- Roofline analysis: 112 tok/s theoretical at 1.79 TB/s
- Bottleneck quantification: cuBLAS M=1 GEMV at 8% bandwidth → 77% of step time
- Six optimizations with rationale, implementation details, and expected impact
- Ablation table with per-optimization delta measurements
- Remaining 55% roofline gap breakdown with next-step priorities

Benchmark report (docs/benchmarks/phase15-performance.md):
- Full ablation: 12.9 → 50.3 tok/s across 6 optimizations
- Per-prompt detail (8 prompts, 46-51 tok/s range)
- Concurrent throughput analysis (batch=4 vs serial)
- Phase-over-phase tracking from Phase 8 to Phase 15 (2.5 → 50.3 tok/s)
- Correctness verification (9/10 top-1 match, 52/52 API pass)
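
The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from either document. It assumes decode is purely memory-bandwidth-bound, so the roofline is bandwidth divided by bytes read per token. The bytes-per-token value is inferred from the quoted 112 tok/s and 1.79 TB/s, not stated in the commit, and all function and variable names are hypothetical.

```python
# Figures quoted in the commit message.
PEAK_BANDWIDTH_TBPS = 1.79   # memory bandwidth used for the roofline, TB/s
ROOFLINE_TOK_S = 112.0       # theoretical decode ceiling, tok/s
BASELINE_TOK_S = 12.9        # start of the Phase 15 ablation, tok/s
OPTIMIZED_TOK_S = 50.3       # end of the Phase 15 ablation, tok/s
PHASE8_TOK_S = 2.5           # Phase 8 throughput, tok/s


def implied_bytes_per_token(bandwidth_tbps: float, roofline_tok_s: float) -> float:
    """Bytes read per decoded token under a bandwidth-bound model (assumption)."""
    return bandwidth_tbps * 1e12 / roofline_tok_s


def roofline_gap(measured_tok_s: float, roofline_tok_s: float) -> float:
    """Fraction of the roofline that the measured throughput does not reach."""
    return 1.0 - measured_tok_s / roofline_tok_s


bytes_per_token = implied_bytes_per_token(PEAK_BANDWIDTH_TBPS, ROOFLINE_TOK_S)
gap = roofline_gap(OPTIMIZED_TOK_S, ROOFLINE_TOK_S)
ablation_speedup = OPTIMIZED_TOK_S / BASELINE_TOK_S
phase_speedup = OPTIMIZED_TOK_S / PHASE8_TOK_S

print(f"implied traffic per token: {bytes_per_token / 1e9:.2f} GB")
print(f"remaining roofline gap:    {gap:.0%}")
print(f"Phase 15 ablation speedup: {ablation_speedup:.1f}x")
print(f"Phase 8 to 15 speedup:     {phase_speedup:.1f}x")
```

Under this model the measured 50.3 tok/s sits at roughly 45% of the roofline, which agrees with the 55% gap listed in the design document summary.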
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 00:39:27 +08:00