Speed up 27B TP A/B: request_timeout 180s, search.high 0.125
The wide 0.5 range made TP1 (low-capacity) waste many infeasible high-theta probes, and the 900s request timeout made overloaded probes drain hung requests for 15min each. Cap drain at 180s and bound the search to where the boundaries actually are. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -20,7 +20,7 @@
|
||||
"port": 18082,
|
||||
"healthcheck_path": "/v1/models",
|
||||
"ready_timeout_s": 900,
|
||||
"request_timeout_s": 900,
|
||||
"request_timeout_s": 180,
|
||||
"launch_args": [
|
||||
"serve",
|
||||
"/home/admin/resource/model/464482ce/qwen3.5-27b/256k-0223-internal"
|
||||
@@ -156,9 +156,9 @@
|
||||
},
|
||||
"search": {
|
||||
"low": 0.0,
|
||||
"high": 0.5,
|
||||
"high": 0.125,
|
||||
"tolerance": 0.001,
|
||||
"max_probes": 7,
|
||||
"max_probes": 6,
|
||||
"sample_seed": 20260325
|
||||
},
|
||||
"llm": {
|
||||
|
||||
Reference in New Issue
Block a user