
# Phase 6: Model Loading — Design Document

## Goal

Load model weights from HuggingFace safetensors files into GPU Tensors, and parse `config.json` to obtain the model's architecture parameters.

## Crate: `xserv-model`

```
crates/xserv-model/src/
├── lib.rs
├── config.rs       # ModelConfig from config.json
├── loader.rs       # safetensors weight loading
└── gpt2.rs         # (Phase 8) GPT-2 model definition
```

## Dependencies

- `safetensors` crate: parse safetensors format
- `serde` + `serde_json`: deserialize config.json
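- `memmap2`: memory-map the weight file for zero-copy access (the `safetensors` crate deserializes from a borrowed `&[u8]`, so the loader creates the mapping and hands it to the crate)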

## Weight Loading Flow

```
safetensors file (disk)
  → mmap the file (memmap2)
  → safetensors crate parses the header from the mapped bytes (tensor names, shapes, dtypes, offsets)
  → for each tensor:
      → read bytes at offset
      → create CPU Tensor from raw bytes
      → .to_device(Cuda(0)) → GPU Tensor
  → return HashMap<String, Tensor>
```
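In the loader itself the `safetensors` crate does the parsing. The sketch below shows the byte layout it works from, using only `std`. It assumes the published safetensors format: an 8-byte little-endian `u64` header length, that many bytes of JSON header, then a data section whose `data_offsets` are relative to its start. It handles only `F32` tensors and skips JSON parsing of the header. `TensorInfo`, `split_header`, and `read_f32` are hypothetical names, not xserv's or the crate's API.

```rust
/// Hypothetical mirror of one header entry: shape plus `data_offsets`,
/// a byte range relative to the start of the data section. F32 only.
pub struct TensorInfo {
    pub shape: Vec<usize>,
    pub data_offsets: (usize, usize),
}

/// Split a safetensors buffer into (JSON header bytes, data section).
/// Layout: 8-byte little-endian u64 header length N, N bytes of JSON, then raw data.
pub fn split_header(file: &[u8]) -> Result<(&[u8], &[u8]), String> {
    if file.len() < 8 {
        return Err("buffer shorter than the 8-byte length prefix".to_string());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&file[..8]);
    let n = u64::from_le_bytes(len_bytes);
    // Reject header lengths that overflow or run past the end of the buffer.
    let header_end = 8u64
        .checked_add(n)
        .filter(|&end| end <= file.len() as u64)
        .ok_or("header length exceeds buffer")? as usize;
    Ok((&file[8..header_end], &file[header_end..]))
}

/// Decode one F32 tensor from the data section into a host Vec<f32>.
/// Values are stored little-endian; this step copies out of the buffer.
pub fn read_f32(data: &[u8], info: &TensorInfo) -> Result<Vec<f32>, String> {
    let (start, end) = info.data_offsets;
    // Borrowing this slice is the zero-copy part; `get` also rejects start > end.
    let bytes = data.get(start..end).ok_or("data_offsets out of range")?;
    // Element count; an empty shape is a scalar (product of nothing = 1).
    let numel: usize = info.shape.iter().product();
    if numel.checked_mul(4) != Some(bytes.len()) {
        return Err(format!("shape {:?} does not match {} bytes", info.shape, bytes.len()));
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Decoding byte-wise with `f32::from_le_bytes` makes no assumption about the alignment of the mapped data section. Under these assumptions, `read_f32` corresponds to the "create CPU Tensor from raw bytes" step, while the borrowed slice returned by `data.get(start..end)` is what stays zero-copy.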

## Config Parsing

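```rust
use serde::Deserialize;

/// Fields read from a HuggingFace config.json. GPT-2 and newer models spell the
/// core dimensions differently, so both spellings are optional here; unified
/// accessors such as `hidden()` and `num_heads()` pick whichever one is present.
#[derive(Debug, Deserialize)]
pub struct ModelConfig {
    pub architectures: Option<Vec<String>>,
    pub model_type: Option<String>,
    pub vocab_size: usize, // same key in both naming schemes
    pub tie_word_embeddings: Option<bool>,
    // Modern HF naming
    pub hidden_size: Option<usize>,
    pub intermediate_size: Option<usize>,
    pub num_attention_heads: Option<usize>,
    pub num_key_value_heads: Option<usize>,
    pub num_hidden_layers: Option<usize>,
    pub max_position_embeddings: Option<usize>,
    pub layer_norm_eps: Option<f64>,
    pub rms_norm_eps: Option<f64>,
    pub rope_theta: Option<f64>,
    // GPT-2 naming
    pub n_embd: Option<usize>,
    pub n_head: Option<usize>,
    pub n_layer: Option<usize>,
    pub n_positions: Option<usize>,
    pub layer_norm_epsilon: Option<f64>,
}
```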

## Test Plan

- [x] Load GPT-2 124M: 160 tensors loaded successfully
- [x] Parse GPT-2 config.json: hidden=768, layers=12, heads=12, vocab=50257
- [x] Sharded loading path implemented (for larger models)
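Both loading checks can be made concrete with small std-only helpers; this is a sketch under stated assumptions, not xserv's test code. `gpt2_expected_names` reproduces the 160 count as 12 blocks × 13 tensors + 4 top-level tensors. That arithmetic assumes the checkpoint stores each block's `attn.bias` causal-mask buffer next to the weights and uses un-prefixed keys such as `h.0.attn.c_attn.weight` and `wte.weight`. `group_by_shard` assumes the HF sharded layout, where `model.safetensors.index.json` carries a `weight_map` from tensor name to shard file, already deserialized into a `HashMap`. Both function names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical list of tensor names expected in a GPT-2 checkpoint.
fn gpt2_expected_names(n_layer: usize) -> Vec<String> {
    // Per-block tensors, including the `attn.bias` causal-mask buffer (assumed present).
    const PER_BLOCK: [&str; 13] = [
        "ln_1.weight", "ln_1.bias",
        "attn.bias", "attn.c_attn.weight", "attn.c_attn.bias",
        "attn.c_proj.weight", "attn.c_proj.bias",
        "ln_2.weight", "ln_2.bias",
        "mlp.c_fc.weight", "mlp.c_fc.bias",
        "mlp.c_proj.weight", "mlp.c_proj.bias",
    ];
    // Top-level tensors: token/position embeddings and the final LayerNorm.
    let mut names: Vec<String> = ["wte.weight", "wpe.weight", "ln_f.weight", "ln_f.bias"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    for i in 0..n_layer {
        names.extend(PER_BLOCK.iter().map(|s| format!("h.{i}.{s}")));
    }
    names
}

/// Group tensor names by shard file so each shard is opened (and mmapped) once.
fn group_by_shard(weight_map: &HashMap<String, String>) -> BTreeMap<&str, Vec<&str>> {
    let mut shards: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (tensor, file) in weight_map {
        shards.entry(file.as_str()).or_default().push(tensor.as_str());
    }
    // HashMap iteration order is arbitrary; sort so the load order is deterministic.
    for names in shards.values_mut() {
        names.sort_unstable();
    }
    shards
}
```

Grouping by file rather than iterating `weight_map` directly avoids re-opening a shard for every tensor it contains.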

## Takeaways

1. **GPT-2 vs. modern HF config naming**: GPT-2 uses `n_embd`/`n_head`/`n_layer`/`n_positions` instead of `hidden_size`/`num_attention_heads`/etc. `ModelConfig` must accept both naming schemes and expose unified accessor methods (`hidden()`, `num_heads()`, etc.).

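2. **Zero-copy reads with safetensors**: the weight file is mmapped with `memmap2`, and the `safetensors` crate parses the header directly from the mapped bytes to get each tensor's offset and shape. The raw tensor bytes are then borrowed from the mapping rather than read into a separate buffer, so GPT-2's ~500MB weight file loads quickly.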

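3. **Network issues when downloading models**: HuggingFace is unreachable from networks in mainland China; use modelscope.cn or hf-mirror.com instead. Redirects of large files (>100MB) to the CDN can also fail, so ModelScope's `snapshot_download` is more reliable.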
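The unified accessors from takeaway 1 can be sketched as `Option` fallbacks from the modern key to the GPT-2 key. This assumes a trimmed-down, serde-free copy of `ModelConfig` holding only the relevant fields. `num_layers()`, `max_positions()`, `num_kv_heads()`, and `head_dim()` are hypothetical names added next to `hidden()` and `num_heads()`. The KV-head fallback assumes plain multi-head attention when `num_key_value_heads` is absent, which is the usual HF convention.

```rust
/// Trimmed-down, hypothetical copy of `ModelConfig` with only the fields the
/// accessors need (no serde, so the sketch compiles with std alone).
#[derive(Debug, Default)]
pub struct ModelConfig {
    pub hidden_size: Option<usize>,
    pub num_attention_heads: Option<usize>,
    pub num_key_value_heads: Option<usize>,
    pub num_hidden_layers: Option<usize>,
    pub max_position_embeddings: Option<usize>,
    pub n_embd: Option<usize>,
    pub n_head: Option<usize>,
    pub n_layer: Option<usize>,
    pub n_positions: Option<usize>,
}

impl ModelConfig {
    /// Model width: `hidden_size`, falling back to GPT-2's `n_embd`.
    pub fn hidden(&self) -> Option<usize> {
        self.hidden_size.or(self.n_embd)
    }

    /// Attention heads: `num_attention_heads`, falling back to `n_head`.
    pub fn num_heads(&self) -> Option<usize> {
        self.num_attention_heads.or(self.n_head)
    }

    /// Transformer blocks: `num_hidden_layers`, falling back to `n_layer`.
    pub fn num_layers(&self) -> Option<usize> {
        self.num_hidden_layers.or(self.n_layer)
    }

    /// Context length: `max_position_embeddings`, falling back to `n_positions`.
    pub fn max_positions(&self) -> Option<usize> {
        self.max_position_embeddings.or(self.n_positions)
    }

    /// KV heads for grouped-query attention; without the key, assume plain MHA.
    pub fn num_kv_heads(&self) -> Option<usize> {
        self.num_key_value_heads.or_else(|| self.num_heads())
    }

    /// Per-head dimension; None if either value is missing or heads == 0.
    pub fn head_dim(&self) -> Option<usize> {
        self.hidden()?.checked_div(self.num_heads()?)
    }
}
```

For a GPT-2 124M config (`n_embd` = 768, `n_head` = 12), `hidden()` returns `Some(768)` and `head_dim()` returns `Some(64)`, while a config using only the modern keys resolves through the first branch of each fallback.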