New crate: xserv-server
- Engine thread: loads Qwen3-8B, processes requests sequentially
- axum HTTP server: /health, /v1/models, /v1/chat/completions
- tokio::sync::mpsc channel between API and engine threads
- Non-streaming JSON response (streaming SSE to be added later)
API is OpenAI-compatible:
POST /v1/chat/completions {"messages": [...], "max_tokens": N}
→ {"choices": [{"message": {"content": "..."}}]}
Verified: "Hi" → ", I'm" (3 tokens), model runs correctly via HTTP.
Key learnings:
- std::sync::mpsc::SyncSender is Send but NOT Sync → wrap in Mutex for Arc<AppState>
- MutexGuard must not live across await points (scope carefully)
- axum 0.8 Extension<Arc<T>> requires T: Send + Sync
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>