Document eight-GPU harness rerun

2026-05-13 09:04:14 +08:00
parent 5c2958e6c1
commit f18765b235
1 changed files with 23 additions and 0 deletions
--- a/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
+++ b/docs/harness-ablation/profile-driven-harness-implementation-20260512.md
@@ -130,3 +130,26 @@ Fix:
 - parse `engine.base_envs.CUDA_VISIBLE_DEVICES`;
 - compute effective GPU count as `min(hardware.gpu_count, visible_device_count)`;
 - filter topology candidates and adjacent TP frontier candidates by the effective GPU count.
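The three fix steps above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the function names are hypothetical, candidates are assumed to be plain tensor-parallel sizes, and the parsing only counts comma-separated entries rather than reproducing every CUDA rule for invalid or duplicate IDs.

```python
from typing import List, Optional


def visible_device_count(cuda_visible_devices: Optional[str]) -> Optional[int]:
    """Count entries in a CUDA_VISIBLE_DEVICES value; None means the variable is unset."""
    if cuda_visible_devices is None:
        return None
    # Split on commas and ignore empty entries such as a trailing comma.
    entries = [e.strip() for e in cuda_visible_devices.split(",") if e.strip()]
    return len(entries)


def effective_gpu_count(hardware_gpu_count: int,
                        cuda_visible_devices: Optional[str]) -> int:
    """min(hardware.gpu_count, visible_device_count), falling back to hardware when unset."""
    visible = visible_device_count(cuda_visible_devices)
    if visible is None:
        return hardware_gpu_count
    return min(hardware_gpu_count, visible)


def filter_tp_candidates(candidates: List[int], gpu_count: int) -> List[int]:
    """Keep only candidates (assumed here to be TP sizes) that fit the effective GPU count."""
    return [tp for tp in candidates if tp <= gpu_count]


# Seven visible devices on an 8-GPU host: the effective count is 7.
old_count = effective_gpu_count(8, "0,1,2,4,5,6,7")
# All eight devices visible: the effective count is 8.
new_count = effective_gpu_count(8, "0,1,2,3,4,5,6,7")
```

Under these assumptions, a candidate list of `[1, 2, 4, 8]` filtered with the seven-device count keeps `[1, 2, 4]`, while the eight-device count keeps all four.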
 ## GPU Visibility Correction
 On 2026-05-13 we corrected the experiment setup: `CUDA_VISIBLE_DEVICES` should be `0,1,2,3,4,5,6,7`, not the previous `0,1,2,4,5,6,7`, which omitted GPU 3.
 This invalidates direct comparison between the old `gpu3skip` runs and the new 8-GPU runs. The old v2 failure was real under the old visible-device profile, but that profile was not the intended 8-card H20 setup.
 New comparable studies:
 | Variant | Study ID | Status |
 | --- | --- | --- |
 | no-harness baseline | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-noharness-minprompt-gpt54-20260513` | running first |
 | harness | `dash0-qwen27b-chat-0-8k-ttft4s-tpot25-gpu8-12iter-harness-profileplanner-20260513` | queued to run after baseline |
 Both specs set:
 - `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
 - model endpoint: `gpt-5.4`
 - workload: qwen3.5-27b chat 0-8k
 - SLO: TTFT p95 <= 4000ms, TPOT p95 <= 25ms, target pass rate 0.95
 - search: full range, `inherit_incumbent_floor=false`
 The no-harness baseline is running in tmux session `qwen27b-gpu8-noharness-20260513`. Start the harness run only after the baseline finishes or reaches a sufficient early comparison point: each run needs the full GPU host, so the two must not run concurrently.
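One way to enforce this ordering is to poll for the baseline's tmux session and launch the harness run only once it is gone. The sketch below is an assumption-laden illustration: it assumes the tmux session exits when the baseline run finishes (a session left open at a shell prompt would never end), the helper names are hypothetical, and it does not cover the "sufficient early comparison point" case, which still needs a manual decision.

```python
import subprocess
import time
from typing import Callable

# Session name taken from the note above.
BASELINE_SESSION = "qwen27b-gpu8-noharness-20260513"


def tmux_session_exists(name: str) -> bool:
    """Return True if `tmux has-session -t <name>` exits with status 0."""
    result = subprocess.run(
        ["tmux", "has-session", "-t", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_session_ends(
    name: str,
    exists: Callable[[str], bool] = tmux_session_exists,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the session no longer exists; return how many polls saw it alive."""
    polls = 0
    while exists(name):
        polls += 1
        sleep(poll_seconds)
    return polls
```

Calling `wait_until_session_ends(BASELINE_SESSION)` before starting the harness study keeps the two runs from sharing the host. The `exists` and `sleep` parameters are injectable only so the loop can be exercised without tmux.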