Document bad-start validation results

2026-06-26 20:50:20 +08:00
parent c8a0f9870e
commit 7f50b8b8ea
2 changed files with 89 additions and 0 deletions
--- a/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
+++ b/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
@@ -194,3 +194,90 @@ promising substrate, but not production-proven
 ```text
 Implement coverage-relative stop authority first, then rerun the bad-start distribution.
 ```
+
+## 2026-06-26 implementation validation
+
+Commit `c8a0f98` implements the first slice of the production fix:
+
+- `search.auto_high` schema, disabled by default and compatible with existing configs;
+- at trial materialization, the effective `search.high` is resolved within the existing trace sampling space;
+- `trial_spec.json` and `result.json` record auto-high / measurement evidence;
+- `search_high_saturated_by_incumbent` is downgraded to measurement evidence;
+- for studies that optimize `req/s/GPU` with a variable topology, high saturation cannot directly authorize a stop;
+- a fixed GPU product with adjustable TP/DP redistribution still counts as variable topology;
+- when the auto-high ceiling falls below `search.low`, no invalid search interval is generated.
+
+Local validation:
+
+```text
+PYTHONPATH=src python3 -m unittest discover -s tests
+Ran 143 tests OK
+```
+
+dash1 validation:
+
+```text
+run label = adversarial-badstart-autohigh-c8a0f98-20260626T122622Z
+git sha   = c8a0f9870eac5438fb19be8edf1534a893723ab9
+machine   = dash1, 8x H20
+```
+
+The spec still uses the bad start:
+
+```text
+tensor-parallel-size = 8
+data-parallel-size = 1
+gpu-memory-utilization = 0.5
+max-num-seqs = 8
+search.auto_high.enabled = true
+```
+
+Auto-high resolution:
+
+```text
+original_high        = 1.0
+effective_high       = 0.9979913161468553
+trace_max_sampling_u = 0.9979913161468553
+reason               = search_high_lowered_to_trace_ceiling
+```
+
+Results:
+
+| trial | config patch | best sampling_u | request_rate | req/s/GPU | pass |
+| --- | --- | ---: | ---: | ---: | ---: |
+| trial-0001 | baseline TP8, DP1, gmu0.5, mns8 | 0.935616858887 | 8.00 | 1.0000 | 1.0000 |
+| trial-0002 | `tensor-parallel-size=4` | 0.810867944369 | 6.95 | 1.7375 | 0.9784 |
+| trial-0003 | `tensor-parallel-size=8` | 0.935616858887 | 8.00 | 1.0000 | 1.0000 |
+
+Key conclusions:
+
+```text
+The old failure is fixed:
+after the baseline, harness-stop-0002/search_high_saturated_by_incumbent is no longer produced.
+
+The new implementation produces harness-proposal-0002 and tests the TP4 topology contrast.
+TP4 raises the best req/s/GPU from 1.0000 to 1.7375.
+```
+
+This shows that the first slice fixes the problem of measurement saturation bypassing topology coverage.
+
+However, trial-0003 exposes a new blocker:
+
+```text
+The current no-repeat check is still keyed on the patch signature, not on a normalized full-config signature.
+```
+
+`tensor-parallel-size=8` is a no-op against this study's base config and is equivalent to the baseline TP8,
+yet the system still executed it as a new proposal. The next P0 slice must therefore implement:
+
+1. a normalized full-config signature;
+2. a CandidateSet snapshot that includes both eligible and blocked candidates;
+3. a blocked reason, for example `blocked_noop_equivalent_to_tested_full_config`;
+4. Stop/report output that shows both `measurement_ceiling_*` and `eligible_candidates_remain`.
+
+The current verdict is therefore updated to:
+
+```text
+P0 measurement/stop-order slice: passed.
+P0 full coverage-relative harness: not yet passed.
+```
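
The no-repeat blocker described in the diff above (a patch that is a no-op against the base config still runs as a new trial) can be illustrated with a minimal sketch. It assumes the config is a flat dict of option names to values and that a patch is a dict of overrides. `full_config_signature`, `classify_candidates`, and the snapshot shape are hypothetical names for illustration, not the harness's actual API; only the blocked-reason string comes from the document.

```python
import hashlib
import json

# Blocked reason named in the document; everything else here is hypothetical.
BLOCKED_NOOP = "blocked_noop_equivalent_to_tested_full_config"


def full_config_signature(base_config, patch):
    """Hash the full config obtained by applying `patch` to `base_config`."""
    merged = {**base_config, **patch}
    # Canonical form: sorted keys, values stringified so 8 and "8" match.
    canonical = json.dumps({k: str(v) for k, v in merged.items()}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_candidates(base_config, tested_patches, candidate_patches):
    """Split candidates into eligible and blocked by full-config signature."""
    tested = {full_config_signature(base_config, p) for p in tested_patches}
    eligible, blocked = [], []
    for patch in candidate_patches:
        if full_config_signature(base_config, patch) in tested:
            blocked.append((patch, BLOCKED_NOOP))
        else:
            eligible.append(patch)
    return {"eligible": eligible, "blocked": blocked}


# Base config mirrors the bad-start spec from the validation run.
base = {
    "tensor-parallel-size": 8,
    "data-parallel-size": 1,
    "gpu-memory-utilization": 0.5,
    "max-num-seqs": 8,
}
tested = [{}, {"tensor-parallel-size": 4}]  # baseline and trial-0002
candidates = [{"tensor-parallel-size": 8}, {"tensor-parallel-size": 2}]
snapshot = classify_candidates(base, tested, candidates)
print(snapshot)
```

Under these assumptions the `tensor-parallel-size=8` patch hashes to the same signature as the baseline and lands in `blocked`, while a patch signature alone would treat it as untested.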