246ae1c59045f500550a0af53143fab8805cfc2f
Qwen3 model (qwen3.rs): - RMSNorm + QK normalization (per-head q_norm/k_norm) - GQA: 32 Q heads, 8 KV heads, repeat_kv for attention - SwiGLU FFN: gate_proj → SiLU → * up_proj → down_proj - RoPE with transpose for [1,H,S,D] ↔ [S,H,D] layout - BF16 forward pass, [out,in] weight layout via linear_t - No attention bias (attention_bias=false) Tokenizer fixes: - Fixed unicode_to_byte: shifted bytes now use correct inverse lookup table - MergeEntry supports both string and array formats - Both GPT-2 and Qwen3 tokenizers work correctly (English + Chinese) KVCache refactored: - Dtype-agnostic: stores raw bytes per-head, works for F32 and BF16 - append_kv_tensor/get_kv_tensors use Tensor directly CLI updated: - Auto-detects model type from config.json (gpt2 vs qwen3) - Supports both GPT-2 (F32) and Qwen3 (BF16) Verified: Qwen3-8B generates coherent English and Chinese on single RTX 5090. 61/61 tests pass, GPT-2 performance no regression. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
Rust
67.5%
Python
15.1%
Cuda
13.5%
Shell
3.9%