# Phase 6: Model Loading - Design Document

## Goal

Load model weights from HuggingFace safetensors files into GPU Tensors, and parse `config.json` to obtain the model's structural parameters.

## Crate: `xserv-model`

```
crates/xserv-model/src/
├── lib.rs
├── config.rs    # ModelConfig from config.json
├── loader.rs    # safetensors weight loading
└── gpt2.rs      # (Phase 8) GPT-2 model definition
```

## Dependencies

- `safetensors` crate: parse safetensors format
- `serde` + `serde_json`: deserialize config.json
- `memmap2`: mmap for zero-copy file access (the mapped bytes are passed to `safetensors` for parsing)

## Weight Loading Flow

```
safetensors file (disk)
  → safetensors crate parses header (tensor names, shapes, dtypes, offsets)
  → mmap raw data
  → for each tensor:
      → read bytes at offset
      → create CPU Tensor from raw bytes
      → .to_device(Cuda(0)) → GPU Tensor
  → return HashMap<String, Tensor>
```

## Config Parsing

```rust
use serde::Deserialize;

/// Model structure parameters deserialized from config.json.
#[derive(Deserialize)]
pub struct ModelConfig {
    pub architectures: Option<Vec<String>>,
    pub model_type: Option<String>,
    pub hidden_size: usize,
    pub intermediate_size: Option<usize>,
    pub num_attention_heads: usize,
    pub num_key_value_heads: Option<usize>,
    pub num_hidden_layers: usize,
    pub vocab_size: usize,
    pub max_position_embeddings: Option<usize>,
    pub layer_norm_eps: Option<f64>,
    pub rms_norm_eps: Option<f64>,
    pub rope_theta: Option<f64>,
    pub tie_word_embeddings: Option<bool>,
}
```

## Test Plan

- [x] Load GPT-2 124M: 160 tensors loaded successfully
- [x] Parse GPT-2 config.json: hidden=768, layers=12, heads=12, vocab=50257
- [x] Sharded loading path implemented (for larger models)

## Takeaways

1. **GPT-2 vs modern HF config naming**: GPT-2 uses `n_embd`/`n_head`/`n_layer`/`n_positions` rather than `hidden_size`/`num_attention_heads` and so on. ModelConfig has to support both naming schemes and expose unified accessor methods (`hidden()`, `num_heads()`, etc.).

2. **Zero-copy reads with safetensors**: The file is mmapped, the `safetensors` crate parses the header to get each tensor's offset and shape, and the raw bytes are then read zero-copy. For GPT-2's 500MB weight file, loading is fast.

3. **Network issues when downloading models**: HuggingFace is unreachable from networks in China. Use modelscope.cn or hf-mirror.com as alternatives. Redirects to the CDN for large files (>100MB) may also fail; modelscope's `snapshot_download` is more reliable.
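
The dual-naming issue from Takeaway 1 could be handled roughly as follows. This is a minimal sketch, not the crate's actual `ModelConfig`: the `RawConfig` struct and the `num_layers()`, `max_positions()`, and `head_dim()` accessors are hypothetical names introduced for illustration, and every field is optional so that either naming scheme can populate it. In the real crate the struct would presumably also derive `Deserialize`; serde's `#[serde(alias = "n_embd")]` field attribute is another way to accept both names.

```rust
/// Hypothetical, trimmed-down config that can hold either naming scheme.
#[derive(Debug, Default, Clone)]
pub struct RawConfig {
    // Modern HF names.
    pub hidden_size: Option<usize>,
    pub num_attention_heads: Option<usize>,
    pub num_hidden_layers: Option<usize>,
    pub max_position_embeddings: Option<usize>,
    // GPT-2 names.
    pub n_embd: Option<usize>,
    pub n_head: Option<usize>,
    pub n_layer: Option<usize>,
    pub n_positions: Option<usize>,
}

impl RawConfig {
    /// Hidden width: prefer the modern name, fall back to GPT-2's.
    pub fn hidden(&self) -> Option<usize> {
        self.hidden_size.or(self.n_embd)
    }

    /// Number of attention heads under either naming scheme.
    pub fn num_heads(&self) -> Option<usize> {
        self.num_attention_heads.or(self.n_head)
    }

    /// Number of transformer layers under either naming scheme.
    pub fn num_layers(&self) -> Option<usize> {
        self.num_hidden_layers.or(self.n_layer)
    }

    /// Maximum sequence length under either naming scheme.
    pub fn max_positions(&self) -> Option<usize> {
        self.max_position_embeddings.or(self.n_positions)
    }

    /// Per-head width; None if a value is missing or the head count is zero.
    pub fn head_dim(&self) -> Option<usize> {
        self.hidden()?.checked_div(self.num_heads()?)
    }
}
```

Callers then go through `hidden()` and `num_heads()` only, so model code never needs to know which naming scheme the checkpoint used.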
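
To make the "parse header, then read bytes at offset" steps of the weight loading flow concrete, here is a hand-rolled, std-only sketch of the safetensors on-disk layout: an 8-byte little-endian header length, a JSON header, then the tensor data region that `data_offsets` index into. It is not the `safetensors` crate's API (the crate does this parsing for you), and the `split_safetensors` and `read_f32` names are hypothetical. Only F32 is handled, and the JSON header is returned as a string rather than parsed.

```rust
/// Splits a safetensors buffer into (header JSON, tensor data region).
/// Hypothetical helper for illustration only.
pub fn split_safetensors(buf: &[u8]) -> Result<(&str, &[u8]), String> {
    // The first 8 bytes hold the header length as a little-endian u64.
    let prefix = buf.get(..8).ok_or("file shorter than 8-byte length prefix")?;
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(prefix);
    let header_len = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|e| e.to_string())?;

    // The JSON header follows the prefix; reject lengths past end of file.
    let header_end = 8usize
        .checked_add(header_len)
        .filter(|&end| end <= buf.len())
        .ok_or("header length exceeds file size")?;
    let header = std::str::from_utf8(&buf[8..header_end]).map_err(|e| e.to_string())?;

    // Everything after the header is the data region; each tensor's
    // `data_offsets` are relative to the start of this slice.
    Ok((header, &buf[header_end..]))
}

/// Decodes little-endian F32 values from `data[start..end]`.
/// Hypothetical helper; other dtypes would need their own decoding.
pub fn read_f32(data: &[u8], start: usize, end: usize) -> Result<Vec<f32>, String> {
    let bytes = data.get(start..end).ok_or("data_offsets out of range")?;
    if bytes.len() % 4 != 0 {
        return Err("F32 span is not a multiple of 4 bytes".to_string());
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

When `buf` is an mmapped file, `split_safetensors` only borrows from the mapping and copies nothing. `read_f32` does allocate a `Vec`, which stands in for the CPU Tensor creation step before the upload to the GPU.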