Speed up 27B TP A/B: request_timeout 180s, search.high 0.125

The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes,
and the 900s request timeout made overloaded probes drain hung requests for 15min
each. Cap drain at 180s and bound the search to where the boundaries actually are.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-15 22:40:42 +08:00
parent 7678c7d5e8
commit 3541065675

View File

@@ -20,7 +20,7 @@
"port": 18082,
"healthcheck_path": "/v1/models",
"ready_timeout_s": 900,
"request_timeout_s": 900,
"request_timeout_s": 180,
"launch_args": [
"serve",
"/home/admin/resource/model/464482ce/qwen3.5-27b/256k-0223-internal"
@@ -156,9 +156,9 @@
},
"search": {
"low": 0.0,
"high": 0.5,
"high": 0.125,
"tolerance": 0.001,
"max_probes": 7,
"max_probes": 6,
"sample_seed": 20260325
},
"llm": {