From f18765b23513910671255cd08c0cd34b06d7356d Mon Sep 17 00:00:00 2001
From: Gahow Wang
Date: Wed, 13 May 2026 09:04:14 +0800
Subject: [PATCH] Document eight-GPU harness rerun

---
 ...-driven-harness-implementation-20260512.md | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/harness-ablation/profile-driven-harness-implementation-20260512.md b/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
index 3dbc416..7a3ad55 100644
--- a/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
+++ b/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
@@ -130,3 +130,26 @@ Fix:
 - parse `engine.base_envs.CUDA_VISIBLE_DEVICES`;
 - compute effective GPU count as `min(hardware.gpu_count, visible_device_count)`;
 - filter topology candidates and adjacent TP frontier candidates by the effective GPU count.
+
+## GPU Visibility Correction
+
+On 2026-05-13 we corrected the intended experiment setup: `CUDA_VISIBLE_DEVICES` should be `0,1,2,3,4,5,6,7`, not the previous `0,1,2,4,5,6,7`.
+
+This invalidates direct comparison between the old `gpu3skip` runs and the new 8-GPU runs. The old v2 failure was real under the old visible-device profile, but it was not the intended 8-card H20 setup.
+
+New comparable studies:
+
+| Variant | Study ID | Status |
+| --- | --- | --- |
+| no-harness baseline | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-noharness-minprompt-gpt54-20260513` | running first |
+| harness | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-harness-profileplanner-20260513` | queued to run after baseline |
+
+Both specs set:
+
+- `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+- model endpoint: `gpt-5.4`
+- workload: qwen3.5-27b chat 0-8k
+- SLO: TTFT p95 <= 4000ms, TPOT p95 <= 25ms, target pass rate 0.95
+- search: full range, `inherit_incumbent_floor=false`
+
+The no-harness baseline is running in tmux session `qwen27b-gpu8-noharness-20260513`. The harness run should only be started after the no-harness baseline finishes or reaches a sufficient early comparison point, because both need the full GPU host and should not run concurrently.
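The effective-GPU-count rule referenced in the hunk context (`min(hardware.gpu_count, visible_device_count)`) can be sketched as follows. This is a minimal illustration, not the harness's actual code; the function name `effective_gpu_count` and the dict-shaped `base_envs` argument are assumptions for the example.

```python
def effective_gpu_count(hardware_gpu_count: int, base_envs: dict) -> int:
    """Compute min(hardware.gpu_count, visible_device_count) from a spec's env map.

    If CUDA_VISIBLE_DEVICES is unset or empty, all hardware GPUs are assumed visible.
    """
    cvd = base_envs.get("CUDA_VISIBLE_DEVICES")
    if not cvd or not cvd.strip():
        return hardware_gpu_count
    # Count the comma-separated device entries, ignoring empty fragments.
    visible = [d for d in cvd.split(",") if d.strip()]
    return min(hardware_gpu_count, len(visible))

# Old (unintended) profile: GPU 3 skipped, so only 7 devices are visible.
print(effective_gpu_count(8, {"CUDA_VISIBLE_DEVICES": "0,1,2,4,5,6,7"}))  # 7
# Corrected profile: all eight cards visible.
print(effective_gpu_count(8, {"CUDA_VISIBLE_DEVICES": "0,1,2,3,4,5,6,7"}))  # 8
```

Under the old profile this caps topology and TP-frontier candidates at 7 GPUs, which is why the `gpu3skip` runs are not comparable to the corrected 8-GPU runs.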