phase 12+13: HTTP API server with OpenAI-compatible endpoint (Milestone ③)

New crate: xserv-server
- Engine thread: loads Qwen3-8B, processes requests sequentially
- axum HTTP server: /health, /v1/models, /v1/chat/completions
- tokio::sync::mpsc channel between API and engine threads
- Non-streaming JSON response (streaming SSE to be added later)

API is OpenAI-compatible:
  POST /v1/chat/completions
  {"messages": [...], "max_tokens": N}
  → {"choices": [{"message": {"content": "..."}}]}

Verified: "Hi" → ", I'm" (3 tokens), model runs correctly via HTTP.

Key learnings:
- std::sync::mpsc::SyncSender is Send but NOT Sync → wrap in Mutex for Arc<AppState>
- MutexGuard must not live across await points (scope carefully)
- axum 0.8 Extension<Arc<T>> requires T: Send + Sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
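
Illustrative sketch (not the code in this commit): the handoff between the API side and the engine thread described above might look roughly like the following. It uses only the standard library so that it runs on its own, with no axum or tokio. The names EngineRequest, AppState, spawn_engine and submit are hypothetical, and the "engine" simply truncates the prompt in place of running the model. The point is the shape: a sender behind a Mutex inside Arc-shared state, with the guard confined to a small scope so it is released before the caller waits for the reply.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical request type: a prompt plus a channel for the reply.
struct EngineRequest {
    prompt: String,
    max_tokens: usize,
    reply: SyncSender<String>,
}

// Hypothetical shared state: the sender is kept behind a Mutex,
// following the approach the commit message describes.
struct AppState {
    engine_tx: Mutex<SyncSender<EngineRequest>>,
}

// Stand-in for the engine thread: handles requests one at a time.
fn spawn_engine() -> Arc<AppState> {
    let (tx, rx) = sync_channel::<EngineRequest>(8);
    thread::spawn(move || {
        for req in rx {
            // Placeholder for inference: keep the first `max_tokens` chars.
            let out: String = req.prompt.chars().take(req.max_tokens).collect();
            // Ignore the error if the caller has already gone away.
            let _ = req.reply.send(out);
        }
    });
    Arc::new(AppState { engine_tx: Mutex::new(tx) })
}

// Stand-in for a request handler: enqueue, release the lock, then wait.
fn submit(state: &AppState, prompt: &str, max_tokens: usize) -> String {
    let (reply_tx, reply_rx) = sync_channel(1);
    {
        // The guard is dropped at the end of this block, before we wait.
        let tx = state.engine_tx.lock().unwrap();
        tx.send(EngineRequest {
            prompt: prompt.to_string(),
            max_tokens,
            reply: reply_tx,
        })
        .unwrap();
    }
    reply_rx.recv().unwrap()
}

fn main() {
    let state = spawn_engine();
    println!("{}", submit(&state, "hello world", 5));
}
```

In an async handler the same inner block would end before the first `.await`, which is what keeps the MutexGuard from living across an await point.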
2026-05-22 12:55:19 +08:00
parent 2be27d6d94
commit da043554ba
6 changed files with 376 additions and 0 deletions
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -6,6 +6,7 @@ members = [
    "crates/xserv-kernels",
    "crates/xserv-model",
    "crates/xserv-tokenizer",
+    "crates/xserv-server",
 ]

 [workspace.package]
@@ -20,3 +21,6 @@ serde = { version = "1", features = ["derive"] }
 serde_json = "1"
 safetensors = "0.5"
 regex = "1"
+tokio = { version = "1", features = ["full"] }
+axum = "0.8"
+uuid = { version = "1", features = ["v4"] }