xserv

Go to file

Gahow Wang e1e75fc7f6 phase 6+7+8: model loading, BPE tokenizer, GPT-2 inference (Milestone ①)

Phase 6 — Model Loading (xserv-model):
- safetensors parser with single/sharded file support
- ModelConfig with dual naming (GPT-2 n_embd/n_head + modern HF naming)
- Weight loading flow: safetensors → mmap → CPU Tensor → GPU

Phase 7 — BPE Tokenizer (xserv-tokenizer):
- Full BPE encode/decode from tokenizer.json
- GPT-2 byte-to-unicode mapping (printable ASCII identity + shifted bytes)
- Pre-tokenization regex, special token handling
- Chat template support structure

Phase 8 — GPT-2 Complete Inference:
- GPT-2 model definition: wte, wpe, 12 transformer blocks, ln_f
- Forward pass: embedding → (LayerNorm → MHA → residual → LayerNorm → MLP → residual) × 12 → LN → logits
- QKV split with correct [batch, heads, seq, dim] layout (fixed reshape bug)
- Greedy sampling from last-position logits
- Interactive CLI: xserv-cli <model-dir> [--max-tokens N]

Verified: GPT-2 124M generates coherent English text on RTX 5090.
"The future of AI is uncertain. The future of AI is uncertain..."
"Once upon a time, the world was a place of great beauty..."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-21 22:04:00 +08:00

crates

phase 6+7+8: model loading, BPE tokenizer, GPT-2 inference (Milestone ①)

2026-05-21 22:04:00 +08:00

csrc

phase 5: naive multi-head attention

2026-05-21 21:17:23 +08:00

docs

phase 6+7+8: model loading, BPE tokenizer, GPT-2 inference (Milestone ①)

2026-05-21 22:04:00 +08:00

tools

phase 0+1: project scaffold + xserv-cuda crate