From f18765b23513910671255cd08c0cd34b06d7356d Mon Sep 17 00:00:00 2001
From: Gahow Wang
Date: Wed, 13 May 2026 09:04:14 +0800
Subject: [PATCH] Document eight-GPU harness rerun

---
 ...-driven-harness-implementation-20260512.md | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/harness-ablation/profile-driven-harness-implementation-20260512.md b/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
index 3dbc416..7a3ad55 100644
--- a/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
+++ b/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
@@ -130,3 +130,26 @@ Fix:
 - parse `engine.base_envs.CUDA_VISIBLE_DEVICES`;
 - compute effective GPU count as `min(hardware.gpu_count, visible_device_count)`;
 - filter topology candidates and adjacent TP frontier candidates by the effective GPU count.
+
+## GPU Visibility Correction
+
+On 2026-05-13 we corrected the intended experiment setup: `CUDA_VISIBLE_DEVICES` should be `0,1,2,3,4,5,6,7`, not the previous `0,1,2,4,5,6,7`.
+
+This invalidates direct comparison between the old `gpu3skip` runs and the new 8-GPU runs. The old v2 failure was real under the old visible-device profile, but it was not the intended 8-card H20 setup.
+
+New comparable studies:
+
+| Variant | Study ID | Status |
+| --- | --- | --- |
+| no-harness baseline | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-noharness-minprompt-gpt54-20260513` | running first |
+| harness | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-harness-profileplanner-20260513` | queued to run after baseline |
+
+Both specs set:
+
+- `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+- model endpoint: `gpt-5.4`
+- workload: qwen3.5-27b chat 0-8k
+- SLO: TTFT p95 <= 4000ms, TPOT p95 <= 25ms, target pass rate 0.95
+- search: full range, `inherit_incumbent_floor=false`
+
+The no-harness baseline is running in tmux session `qwen27b-gpu8-noharness-20260513`. The harness run should only be started after the no-harness baseline finishes or reaches a sufficient early comparison point, because both need the full GPU host and should not run concurrently.
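The effective-GPU-count rule referenced in the hunk context (`min(hardware.gpu_count, visible_device_count)`) can be sketched as follows. This is a minimal illustration, not the harness's actual code; the function name `effective_gpu_count` and the dict-shaped `base_envs` argument are assumptions for the example.

```python
def effective_gpu_count(hardware_gpu_count: int, base_envs: dict) -> int:
    """Compute min(hardware.gpu_count, visible_device_count) from a spec's env map.

    If CUDA_VISIBLE_DEVICES is unset or empty, all hardware GPUs are assumed visible.
    """
    cvd = base_envs.get("CUDA_VISIBLE_DEVICES")
    if not cvd or not cvd.strip():
        return hardware_gpu_count
    # Count the comma-separated device entries, ignoring empty fragments.
    visible = [d for d in cvd.split(",") if d.strip()]
    return min(hardware_gpu_count, len(visible))

# Old (unintended) profile: GPU 3 skipped, so only 7 devices are visible.
print(effective_gpu_count(8, {"CUDA_VISIBLE_DEVICES": "0,1,2,4,5,6,7"}))  # 7
# Corrected profile: all eight cards visible.
print(effective_gpu_count(8, {"CUDA_VISIBLE_DEVICES": "0,1,2,3,4,5,6,7"}))  # 8
```

Under the old profile this caps topology and TP-frontier candidates at 7 GPUs, which is why the `gpu3skip` runs are not comparable to the corrected 8-GPU runs.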