phase 12+13: HTTP API server with OpenAI-compatible endpoint (Milestone ③)

New crate: xserv-server
- Engine thread: loads Qwen3-8B, processes requests sequentially
- axum HTTP server: /health, /v1/models, /v1/chat/completions
- tokio::sync::mpsc channel between API and engine threads
- Non-streaming JSON response (streaming SSE to be added later)

API is OpenAI-compatible:
  POST /v1/chat/completions {"messages": [...], "max_tokens": N}
  → {"choices": [{"message": {"content": "..."}}]}

Verified: "Hi" → ", I'm" (3 tokens), model runs correctly via HTTP.

Key learnings:
- std::sync::mpsc::SyncSender is Send but NOT Sync → wrap in Mutex for Arc<AppState>
- MutexGuard must not live across await points (scope carefully)
- axum 0.8 Extension<Arc<T>> requires T: Send + Sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-22 12:55:19 +08:00
parent 2be27d6d94
commit da043554ba
6 changed files with 376 additions and 0 deletions

View File

@@ -6,6 +6,7 @@ members = [
"crates/xserv-kernels",
"crates/xserv-model",
"crates/xserv-tokenizer",
"crates/xserv-server",
]
[workspace.package]
@@ -20,3 +21,6 @@ serde = { version = "1", features = ["derive"] }
serde_json = "1"
safetensors = "0.5"
regex = "1"
tokio = { version = "1", features = ["full"] }
axum = "0.8"
uuid = { version = "1", features = ["v4"] }