Make harness verification portable
@@ -48,12 +48,23 @@ Improve AITuner convergence for the `dash0` internal vLLM + Qwen3.5-27B 0-8k cha
 ## Remote Experiment Log
 
-Pending. Next steps:
+### 2026-04-25 16:30-16:45 CST
 
-1. Commit and push the harness implementation.
-2. Pull on `dash0` in `/home/admin/cpfs/wjh/aituner/aituner`.
-3. Start a real harness-guided Qwen3.5-27B 0-8k chat tuning run from `configs/examples/dash0_qwen27b_tight_slo_run4_0_8k.json`.
-4. Compare the first few iterations against the prior 12-iteration behavior:
+- Pushed commit `2c5e9af` to `origin/main` and pulled it on `dash0`.
+- Remote prompt check command:
+  `PYTHONPATH=src python3 -m aituner.cli study prompt --study-root /tmp/aituner-harness-prompt-check/dash0-qwen27b-tight-slo-10min-run4-chat-0-8k --store-root /tmp/aituner-harness-prompt-check --prompt-name harness-check`
+- Harness profile for `chat_w20260311_1000`, after applying the 0-8k filter:
+  - L: p50 1992, p95 7628, p99 8102, tail ratio 3.83, regime `moderate_tail_prefill_sensitive`.
+  - C: repeated token ratio estimate 0.191, repeated block ratio 0.189, multi-turn ratio 0.160, regime `low_prefix_reuse`.
+  - A: request rate 29.52 req/s, p95 1s QPS 40, burst ratio 1.36, regime `smooth`.
+- Active harnesses: `tensor-parallel-size` and `max-num-batched-tokens`, which matches a TTFT/prefill-sensitive 0-8k chat workload.
+- Remote `compileall` passed.
+- Remote `unittest discover` initially exposed two pre-existing path-sensitive tests that hardcoded `/home/gahow/phd/aituner`; fixed them to derive `REPO_ROOT` from the test file path.
+
+Remaining next steps:
+
+1. Start a real harness-guided Qwen3.5-27B 0-8k chat tuning run from `configs/examples/dash0_qwen27b_tight_slo_run4_0_8k.json`.
+2. Compare the first few iterations against the prior 12-iteration behavior:
    - best request rate per GPU should improve or reach the known good region in fewer trials;
    - proposals should follow the active bottleneck harness;
    - if the incumbent has converged, the LLM should emit `should_stop=true` instead of proposing a weak exploratory config.
 
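The logged profile numbers are internally consistent under simple derived definitions: 7628 / 1992 ≈ 3.83 and 40 / 29.52 ≈ 1.36. A minimal sketch of that arithmetic, assuming tail ratio = p95/p50 and burst ratio = p95 1s QPS / mean rate (inferred from the numbers above, not confirmed AITuner internals):

```python
# Sanity check of the profile arithmetic in the log entry above.
# The definitions are assumptions that reproduce the logged values.

def tail_ratio(p50: float, p95: float) -> float:
    # Tail heaviness of the L profile: p95 relative to the median.
    return p95 / p50

def burst_ratio(p95_1s_qps: float, mean_rate_rps: float) -> float:
    # Burstiness of the A profile: peak one-second QPS over the mean rate.
    return p95_1s_qps / mean_rate_rps

assert round(tail_ratio(1992, 7628), 2) == 3.83   # L profile
assert round(burst_ratio(40, 29.52), 2) == 1.36   # A profile
```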
@@ -36,6 +36,9 @@ from aituner.worker import (
 from aituner.trace import TraceRequest
 
 
+REPO_ROOT = Path(__file__).resolve().parents[1]
+
+
 def _write_study_assets(
     tmp_path: Path,
     *,
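For context on the `REPO_ROOT` line added above: `parents[1]` walks two levels up from the resolved test file, so it lands on the checkout root whenever the test module sits one directory below it. A small sketch of the resolution; the `tests/test_core_flow.py` layout is a hypothetical example, not taken from the repo:

```python
from pathlib import Path

# Hypothetical location of the test module on dash0.
test_file = Path("/home/admin/cpfs/wjh/aituner/aituner/tests/test_core_flow.py")
# parents[0] -> .../aituner/aituner/tests
# parents[1] -> .../aituner/aituner  (the checkout root, wherever it lives)
repo_root = test_file.resolve().parents[1]
print(repo_root)  # /home/admin/cpfs/wjh/aituner/aituner
```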
@@ -842,7 +845,7 @@ class CoreFlowTests(unittest.TestCase):
                 "--overwrite",
             ],
             check=True,
-            cwd="/home/gahow/phd/aituner",
+            cwd=str(REPO_ROOT),
         )
 
         windows_payload = json.loads((output_root / "windows.json").read_text(encoding="utf-8"))
@@ -907,7 +910,7 @@ class CoreFlowTests(unittest.TestCase):
                 "chat",
                 "--overwrite",
             ],
-            cwd="/home/gahow/phd/aituner",
+            cwd=str(REPO_ROOT),
             capture_output=True,
             text=True,
         )
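Both fixed call sites now share the same shape: run the subprocess from the derived repo root instead of a hardcoded checkout path. A condensed sketch of that pattern, with a placeholder argv (the `compileall` command is illustrative, not the tests' actual command):

```python
import subprocess
import sys
from pathlib import Path

# Same derivation the diff adds at module scope: two levels up from this file.
REPO_ROOT = Path(__file__).resolve().parents[1]

result = subprocess.run(
    [sys.executable, "-m", "compileall", "src"],  # illustrative argv only
    cwd=str(REPO_ROOT),  # portable across local and dash0 checkouts
    capture_output=True,
    text=True,
    check=True,
)
```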