xserv

Files

Gahow Wang 246ae1c590 phase 10: Qwen3-8B support (Milestone ②)

Qwen3 model (qwen3.rs):
- RMSNorm + QK normalization (per-head q_norm/k_norm)
- GQA: 32 Q heads, 8 KV heads, repeat_kv for attention
- SwiGLU FFN: gate_proj → SiLU → * up_proj → down_proj
- RoPE with transpose for [1,H,S,D] ↔ [S,H,D] layout
- BF16 forward pass, [out,in] weight layout via linear_t
- No attention bias (attention_bias=false)
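
For readers unfamiliar with the building blocks listed above, here is a minimal CPU sketch in F32 of RMSNorm, the SwiGLU gate, and repeat_kv. It is not the code in qwen3.rs. The function names, the `Vec<f32>` representation, and the `eps` parameter are assumptions for illustration, and the real forward pass runs in BF16 on the GPU.

```rust
/// RMSNorm over one vector: x * weight / sqrt(mean(x^2) + eps).
/// The same formula applies per head when used as q_norm / k_norm.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// SiLU activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// The gating half of SwiGLU: silu(gate_proj(x)) * up_proj(x), elementwise.
/// The projections themselves (and down_proj) are omitted here.
fn swiglu_gate(gate: &[f32], up: &[f32]) -> Vec<f32> {
    gate.iter().zip(up).map(|(g, u)| silu(*g) * u).collect()
}

/// repeat_kv for GQA: each KV head is repeated n_rep times in a row,
/// so 8 KV heads with n_rep = 4 line up with 32 query heads.
fn repeat_kv(kv_heads: &[Vec<f32>], n_rep: usize) -> Vec<Vec<f32>> {
    kv_heads
        .iter()
        .flat_map(|head| std::iter::repeat(head.clone()).take(n_rep))
        .collect()
}
```

With 32 Q heads and 8 KV heads, `n_rep` is 4, so query heads 0 to 3 attend against KV head 0, heads 4 to 7 against KV head 1, and so on.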

Tokenizer fixes:
- Fixed unicode_to_byte: shifted bytes now use correct inverse lookup table
- MergeEntry supports both string and array formats
- Both GPT-2 and Qwen3 tokenizers work correctly (English + Chinese)
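
As background for the unicode_to_byte fix, the sketch below shows the standard GPT-2 byte-level BPE mapping and its inverse. It is not the project's tokenizer code, and the helper names `is_printable` and `byte_to_unicode` are hypothetical. Printable bytes map to themselves. Every other byte is shifted to code point 256 + n, where n counts the non-printable bytes seen so far, so decoding needs an explicit inverse table rather than a simple cast.

```rust
use std::collections::HashMap;

/// Bytes that GPT-2's byte-level BPE maps to themselves.
fn is_printable(b: u8) -> bool {
    matches!(b, 33..=126 | 161..=172 | 174..=255)
}

/// Forward table: index = byte value, entry = the char used in the vocab.
fn byte_to_unicode() -> Vec<char> {
    let mut table = Vec::with_capacity(256);
    let mut shift = 0u32;
    for b in 0u32..256 {
        if is_printable(b as u8) {
            table.push(char::from_u32(b).unwrap());
        } else {
            // Non-printable bytes are shifted past the Latin-1 range.
            table.push(char::from_u32(256 + shift).unwrap());
            shift += 1;
        }
    }
    table
}

/// Inverse table, built by inverting the forward one so that shifted
/// chars resolve to their original byte.
fn unicode_to_byte() -> HashMap<char, u8> {
    byte_to_unicode()
        .into_iter()
        .enumerate()
        .map(|(b, c)| (c, b as u8))
        .collect()
}
```

For example, the space byte (0x20) is the 33rd non-printable byte, so it maps to U+0120 (`Ġ`), and the inverse table maps `Ġ` back to 0x20.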

KVCache refactored:
- Dtype-agnostic: stores raw bytes per head, so it works for both F32 and BF16
- append_kv_tensor/get_kv_tensors use Tensor directly
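
One way to picture a dtype-agnostic cache is the sketch below. It is only an illustration: `DType`, `HeadCache`, and their methods are hypothetical names, not the project's KVCache or Tensor API. It stores each head's K and V rows as raw bytes and uses the dtype only to compute the row size.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    F32,
    BF16,
}

impl DType {
    fn size_in_bytes(self) -> usize {
        match self {
            DType::F32 => 4,
            DType::BF16 => 2,
        }
    }
}

/// Cache for a single attention head: one K row and one V row per token,
/// kept as untyped bytes so the same code serves F32 and BF16.
struct HeadCache {
    dtype: DType,
    head_dim: usize,
    k: Vec<u8>,
    v: Vec<u8>,
}

impl HeadCache {
    fn new(dtype: DType, head_dim: usize) -> Self {
        assert!(head_dim > 0, "head_dim must be non-zero");
        Self { dtype, head_dim, k: Vec::new(), v: Vec::new() }
    }

    /// Bytes occupied by one token's K (or V) row.
    fn row_bytes(&self) -> usize {
        self.head_dim * self.dtype.size_in_bytes()
    }

    /// Append one token's K and V rows; rejects rows of the wrong size.
    fn append(&mut self, k_row: &[u8], v_row: &[u8]) -> Result<(), String> {
        let expected = self.row_bytes();
        if k_row.len() != expected || v_row.len() != expected {
            return Err(format!("expected {expected} bytes per row"));
        }
        self.k.extend_from_slice(k_row);
        self.v.extend_from_slice(v_row);
        Ok(())
    }

    /// Number of cached tokens.
    fn seq_len(&self) -> usize {
        self.k.len() / self.row_bytes()
    }
}
```

Because the element type only affects `row_bytes`, adding another dtype in this sketch would mean adding one enum variant and its size.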

CLI updated:
- Auto-detects model type from config.json (gpt2 vs qwen3)
- Supports both GPT-2 (F32) and Qwen3 (BF16)
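
The detection step could look roughly like the sketch below. It assumes the CLI keys off the `model_type` field that Hugging Face config.json files carry, which the commit message does not state. `ModelKind` and `detect_model_kind` are hypothetical names, and the naive string scan stands in for a real JSON parser only to keep the example free of dependencies.

```rust
#[derive(Debug, PartialEq)]
enum ModelKind {
    Gpt2,
    Qwen3,
}

/// Extract the value of "model_type" from config.json text.
/// Returns None if the key is missing or the value is not recognized.
fn detect_model_kind(config_json: &str) -> Option<ModelKind> {
    let key = "\"model_type\"";
    let start = config_json.find(key)? + key.len();
    // Skip whitespace, the colon, and the opening quote of the value.
    let rest = config_json[start..]
        .trim_start()
        .strip_prefix(':')?
        .trim_start()
        .strip_prefix('"')?;
    let value = &rest[..rest.find('"')?];
    match value {
        "gpt2" => Some(ModelKind::Gpt2),
        "qwen3" => Some(ModelKind::Qwen3),
        _ => None,
    }
}
```

A caller would then pick the dtype and model implementation from the result, for example F32 for `Gpt2` and BF16 for `Qwen3`, matching the pairing listed above.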

Verified: Qwen3-8B generates coherent English and Chinese on a single RTX 5090.
61/61 tests pass; no GPT-2 performance regression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-22 00:46:37 +08:00

benchmarks

phase 9: KV cache + autoregressive generation

2026-05-21 23:39:41 +08:00

00-roadmap.md

phase 0+1: project scaffold + xserv-cuda crate

2026-05-21 18:40:22 +08:00

01-cuda-ffi.md

docs: add design docs + takeaways for Phase 2 and Phase 3

2026-05-21 20:59:45 +08:00

02-tensor.md

docs: add design docs + takeaways for Phase 2 and Phase 3