Make harness stop conservative for ablation
@@ -38,14 +38,14 @@ The experiment reuses the 0-8k chat window that has already been used for qwen27
 | window | `chat_w20260311_1000` |
 | source rows | 32606 |
 | input filter | 0 to 8192 tokens |
-| max requests per probe | 2048 |
+| max requests per probe | 512 |
 | target pass rate | 0.95 |
 | TTFT SLO | 2s up to 4k, 4s up to 32k, 6s above |
 | TPOT SLO | 50ms |
 | search high | 0.125 sampling_u |
 | max probes per trial | 6 |

-The `max_requests_per_probe=2048` cap keeps the fresh community-vLLM ablation practical while preserving a real trace-shaped replay, SLO scoring, and binary-search threshold probe.
+The `max_requests_per_probe=512` cap keeps the fresh community-vLLM ablation practical while preserving a real trace-shaped replay, SLO scoring, and binary-search threshold probe. A trace-only count check gives 31 to 65 selected requests across the six binary-search thresholds, avoiding the invalid low-cap case where early thresholds can select zero requests.
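The SLO scoring configured in the table above can be sketched roughly as follows. This is a minimal illustration, not the tuner's actual code: `RequestResult`, `ttft_slo_s`, `passes`, `pass_rate`, and `probe_is_feasible` are hypothetical names, and the sketch assumes that "4k" and "32k" mean 4096 and 32768 input tokens, that tier boundaries are inclusive, and that a probe counts as feasible when the share of requests meeting both SLOs reaches the target pass rate.

```python
from dataclasses import dataclass
from typing import Sequence

TPOT_SLO_S = 0.050        # "TPOT SLO | 50ms"
TARGET_PASS_RATE = 0.95   # "target pass rate | 0.95"


@dataclass(frozen=True)
class RequestResult:
    """Hypothetical per-request measurement from one probe."""
    input_tokens: int
    ttft_s: float   # time to first token, seconds
    tpot_s: float   # time per output token, seconds


def ttft_slo_s(input_tokens: int) -> float:
    """Tiered TTFT limit: 2s up to 4k, 4s up to 32k, 6s above (assumed inclusive)."""
    if input_tokens <= 4096:
        return 2.0
    if input_tokens <= 32768:
        return 4.0
    return 6.0


def passes(result: RequestResult) -> bool:
    """Assumption: a request passes only if it meets both the TTFT and TPOT SLO."""
    return (result.ttft_s <= ttft_slo_s(result.input_tokens)
            and result.tpot_s <= TPOT_SLO_S)


def pass_rate(results: Sequence[RequestResult]) -> float:
    """Fraction of passing requests; an empty probe is rejected, not scored."""
    if not results:
        raise ValueError("probe selected zero requests")
    return sum(passes(r) for r in results) / len(results)


def probe_is_feasible(results: Sequence[RequestResult]) -> bool:
    return pass_rate(results) >= TARGET_PASS_RATE
```

Raising on an empty probe mirrors the concern about thresholds that select zero requests: such a probe carries no evidence either way, so the sketch refuses to score it.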

 ## Harness Update Under Test

@@ -59,6 +59,7 @@ This run tests a stricter early-stop harness:
 - those validation trials did not produce a feasible per-GPU improvement,
 - the validation covered topology and runtime families, or accumulated at least three post-incumbent validation attempts.
 - If the stop guard fires, `study tune` writes `harness-stop-XXXX` and exits without spending another GPU trial or asking the LLM for another proposal.
+- A single-family all-infeasible plateau is not enough to stop deterministically. It only blocks repeating that family; the LLM must either justify a different family or later satisfy the validation/convergence stop rule.

 This is a generic harness rule, not a testcase-specific threshold. It does not depend on qwen27b, qwen235b, qwen30b, a fixed TP/DP value, or a hardcoded SLO number.
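One possible reading of the visible stop conditions is sketched below. The earlier items of the list fall outside this hunk, so the sketch covers only the conditions shown here. All names (`ValidationTrial`, `should_stop`, `blocked_families`) and the family labels `"topology"` and `"runtime"` are assumptions for illustration, as is the interpretation that validation confined to a single family never triggers the deterministic stop.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set

# Hypothetical labels; the text only says "topology and runtime families".
REQUIRED_FAMILIES = frozenset({"topology", "runtime"})
MIN_VALIDATION_ATTEMPTS = 3  # "at least three post-incumbent validation attempts"


@dataclass(frozen=True)
class ValidationTrial:
    """Hypothetical record of one post-incumbent validation trial."""
    family: str
    feasible: bool
    request_rate_per_gpu: Optional[float] = None  # None when infeasible


def improves(trial: ValidationTrial, incumbent_rate: float) -> bool:
    """A trial improves only if it is feasible and beats the incumbent per GPU."""
    return (trial.feasible
            and trial.request_rate_per_gpu is not None
            and trial.request_rate_per_gpu > incumbent_rate)


def should_stop(incumbent_rate: float, trials: Sequence[ValidationTrial]) -> bool:
    """Deterministic stop, under the assumptions stated in the lead-in."""
    if not trials:
        return False
    if any(improves(t, incumbent_rate) for t in trials):
        return False  # validation found a better feasible point
    families = {t.family for t in trials}
    if len(families) < 2:
        return False  # single-family plateau: block the family, do not stop
    covered = REQUIRED_FAMILIES <= families
    return covered or len(trials) >= MIN_VALIDATION_ATTEMPTS


def blocked_families(trials: Sequence[ValidationTrial]) -> Set[str]:
    """Families whose validation trials were all infeasible."""
    seen_feasible = {}
    for t in trials:
        seen_feasible[t.family] = seen_feasible.get(t.family, False) or t.feasible
    return {family for family, any_ok in seen_feasible.items() if not any_ok}
```

Nothing in the sketch refers to a model name, a TP/DP value, or an SLO number, which is the sense in which the rule stays generic.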
@@ -87,8 +88,8 @@ Pending dash0 runs:

 | Variant | tmux session | Log | Study root |
 | --- | --- | --- | --- |
-| no-harness | `qwen30b_vllm020_noharness_20260502` | `logs/qwen30b_vllm020_noharness_20260502.log` | `.aituner-community-vllm020/studies/dash0-qwen30b-a3b-community-vllm020-chat-0-8k-noharness` |
-| harness | `qwen30b_vllm020_harness_20260502` | `logs/qwen30b_vllm020_harness_20260502.log` | `.aituner-community-vllm020/studies/dash0-qwen30b-a3b-community-vllm020-chat-0-8k-harness` |
+| no-harness | `qwen30b_vllm020_noharness_probe512_20260502` | `logs/qwen30b_vllm020_noharness_probe512_20260502.log` | `.aituner-community-vllm020/dash0-qwen30b-a3b-community-vllm020-chat-0-8k-probe512-noharness` |
+| harness | `qwen30b_vllm020_harness_probe512_20260502` | `logs/qwen30b_vllm020_harness_probe512_20260502.log` | `.aituner-community-vllm020/dash0-qwen30b-a3b-community-vllm020-chat-0-8k-probe512-harness` |

 The harness run should be judged by best-so-far `request_rate_per_gpu` per tuning iteration, plus whether it stops only after validation evidence. The no-harness run should use the same trial budget so the ablation exposes whether the early-stop harness saves iterations without hiding a later better point.
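The best-so-far comparison described above could be computed along these lines. This is only a sketch: `best_so_far` and `pad_to_budget` are hypothetical helpers, infeasible iterations are assumed to be recorded as `None`, and the rates in the usage note are made-up illustrative numbers, not results from these runs.

```python
from typing import List, Optional, Sequence


def best_so_far(rates: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Running maximum of request_rate_per_gpu; None marks an infeasible iteration."""
    best: Optional[float] = None
    curve: List[Optional[float]] = []
    for rate in rates:
        if rate is not None and (best is None or rate > best):
            best = rate
        curve.append(best)
    return curve


def pad_to_budget(curve: Sequence[Optional[float]], budget: int) -> List[Optional[float]]:
    """Carry the last best value forward so an early-stopped run spans the full budget."""
    if not curve:
        return [None] * budget
    padded = list(curve[:budget])
    padded.extend([curve[-1]] * (budget - len(padded)))
    return padded


def hides_better_point(harness: Sequence[Optional[float]],
                       no_harness: Sequence[Optional[float]]) -> bool:
    """True if the no-harness run ends with a strictly better best-so-far value."""
    h_final = best_so_far(harness)[-1] if harness else None
    n_final = best_so_far(no_harness)[-1] if no_harness else None
    if n_final is None:
        return False
    return h_final is None or n_final > h_final
```

For example, with illustrative per-iteration rates `[None, 1.2, 1.1]` for a harness run that stops after three trials and `[None, 1.2, 1.1, 1.0, 1.3, 1.25]` for a six-trial no-harness run, the padded harness curve stays at 1.2 while the no-harness curve reaches 1.3, so `hides_better_point` returns `True` and the early stop would have cost a better configuration.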