Document eight-GPU harness rerun

2026-05-13 09:04:14 +08:00
parent 5c2958e6c1
commit f18765b235


@@ -130,3 +130,26 @@ Fix:
- parse `engine.base_envs.CUDA_VISIBLE_DEVICES`;
- compute effective GPU count as `min(hardware.gpu_count, visible_device_count)`;
- filter topology candidates and adjacent TP frontier candidates by the effective GPU count.
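
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names, the treatment of an unset variable as "no restriction", and the representation of candidates as plain TP sizes are all assumptions made for the example.

```python
from typing import Iterable, List, Optional


def visible_device_count(cuda_visible_devices: Optional[str]) -> Optional[int]:
    """Count comma-separated entries in a CUDA_VISIBLE_DEVICES string.

    Returns None when the variable is unset (assumed to mean "no restriction").
    This is a plain count; it does not validate that the IDs exist.
    """
    if cuda_visible_devices is None:
        return None
    entries = [entry.strip() for entry in cuda_visible_devices.split(",")]
    return len([entry for entry in entries if entry])


def effective_gpu_count(hardware_gpu_count: int,
                        cuda_visible_devices: Optional[str]) -> int:
    """min(hardware.gpu_count, visible_device_count), as in the fix above."""
    visible = visible_device_count(cuda_visible_devices)
    if visible is None:
        return hardware_gpu_count
    return min(hardware_gpu_count, visible)


def filter_tp_candidates(tp_sizes: Iterable[int], gpu_limit: int) -> List[int]:
    """Keep only candidates that fit in the effective GPU count.

    Candidates are modelled as bare TP sizes (hypothetical simplification).
    """
    return [tp for tp in tp_sizes if tp <= gpu_limit]
```

With `hardware.gpu_count = 8` and a seven-entry device list such as `0,1,2,4,5,6,7`, `effective_gpu_count` returns 7, so `filter_tp_candidates([1, 2, 4, 8], 7)` drops the TP=8 candidate; with all eight devices listed, TP=8 stays in the candidate set.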

## GPU Visibility Correction

On 2026-05-13 we corrected the intended experiment setup: `CUDA_VISIBLE_DEVICES` should be `0,1,2,3,4,5,6,7`, not the previous `0,1,2,4,5,6,7`, which omitted GPU 3.

This invalidates direct comparison between the old `gpu3skip` runs and the new 8-GPU runs. The old v2 failure was real under the old visible-device profile, but that profile was not the intended 8-card H20 setup.

New comparable studies:

| Variant | Study ID | Status |
| --- | --- | --- |
| no-harness baseline | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-noharness-minprompt-gpt54-20260513` | running first |
| harness | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-harness-profileplanner-20260513` | queued to run after baseline |

Both specs set:
- `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- model endpoint: `gpt-5.4`
- workload: qwen3.5-27b chat 0-8k
- SLO: TTFT p95 <= 4000 ms, TPOT p95 <= 25 ms, target pass rate 0.95
- search: full range, `inherit_incumbent_floor=false`
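
To make the SLO line concrete, here is a hedged Python sketch of how such a check could be evaluated over per-request latencies. It is one possible reading, not the harness's actual scoring: the nearest-rank percentile, the definition of "pass rate" as the fraction of requests meeting both per-request limits, and all names below are assumptions for illustration.

```python
from typing import Dict, Sequence, Union

TTFT_P95_LIMIT_MS = 4000.0   # from the spec: TTFT p95 <= 4000 ms
TPOT_P95_LIMIT_MS = 25.0     # from the spec: TPOT p95 <= 25 ms
TARGET_PASS_RATE = 0.95      # from the spec: target pass rate 0.95


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (assumed method; others exist)."""
    if not values:
        raise ValueError("p95 of an empty sample is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def slo_report(ttft_ms: Sequence[float],
               tpot_ms: Sequence[float]) -> Dict[str, Union[float, bool]]:
    """Summarise p95 latencies and the per-request pass rate."""
    if len(ttft_ms) != len(tpot_ms):
        raise ValueError("ttft_ms and tpot_ms must have the same length")
    ttft_p95 = p95(ttft_ms)
    tpot_p95 = p95(tpot_ms)
    # Assumption: a request passes when it meets both latency limits.
    passed = sum(
        1 for ttft, tpot in zip(ttft_ms, tpot_ms)
        if ttft <= TTFT_P95_LIMIT_MS and tpot <= TPOT_P95_LIMIT_MS
    )
    pass_rate = passed / len(ttft_ms)
    meets = (ttft_p95 <= TTFT_P95_LIMIT_MS
             and tpot_p95 <= TPOT_P95_LIMIT_MS
             and pass_rate >= TARGET_PASS_RATE)
    return {"ttft_p95_ms": ttft_p95, "tpot_p95_ms": tpot_p95,
            "pass_rate": pass_rate, "meets_slo": meets}
```

Applying the same function to both the no-harness and harness studies keeps the comparison on identical criteria, whatever the exact definitions turn out to be.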

The no-harness baseline is running in tmux session `qwen27b-gpu8-noharness-20260513`. Start the harness run only after the baseline finishes or reaches a sufficient early comparison point: both runs need the full GPU host and should not run concurrently.
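
One way to enforce that ordering is to poll for the baseline's tmux session and block until it is gone. The sketch below assumes the session exits when the baseline run ends, which the notes above do not state; the helper names and polling interval are hypothetical, and the "early comparison point" case still needs a manual decision.

```python
import subprocess
import time
from typing import Callable

BASELINE_SESSION = "qwen27b-gpu8-noharness-20260513"


def tmux_session_exists(name: str) -> bool:
    """Return True if `tmux has-session -t <name>` exits with status 0."""
    result = subprocess.run(
        ["tmux", "has-session", "-t", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_session_end(
    name: str,
    session_exists: Callable[[str], bool] = tmux_session_exists,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the named session no longer exists.

    Returns the number of polls during which the session was still alive.
    `session_exists` and `sleep` are injectable so the loop can be tested
    without tmux.
    """
    polls = 0
    while session_exists(name):
        polls += 1
        sleep(poll_seconds)
    return polls
```

Calling `wait_for_session_end(BASELINE_SESSION)` before launching the harness study keeps the two runs from overlapping on the GPU host. A vanished session only shows that the process ended, not that the baseline completed successfully, so the study status should still be checked before starting the harness run.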