From 42f75553a6439f7561a0cecbe8dbc24baa69ab1b Mon Sep 17 00:00:00 2001
From: Gahow Wang <gahow.wang@gmail.com>
Date: Fri, 26 Jun 2026 21:52:18 +0800
Subject: [PATCH] Document full config signature validation

---
 docs/aituner-roadmap.md                       |  5 +-
 .../bad-start-stop-counterexample-20260626.md | 89 +++++++++++++++++++
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/docs/aituner-roadmap.md b/docs/aituner-roadmap.md
index de14030..0dd0f00 100644
--- a/docs/aituner-roadmap.md
+++ b/docs/aituner-roadmap.md
@@ -83,7 +83,7 @@ kernel、KV cache、通信和排队的闭式性能模型。更稳妥也更强的
 | C5. AITuner 找到 near-optimal region，而不是只找到一个可行 config | Qwen30B 有解释性信号 | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | 选 1-2 个 case 做局部 grid 或专家配置对照 |
 | C6. AITuner 能随 SLO tightness 移动到合适 frontier | Qwen30B 已完成 | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | 再选一个非同质 case 做 SLO sweep；同时画 SLO tightness -> frontier/regime transition |
 | C7. Engine adapter 让 intervention grammar 可迁移到其他 serving engine | 设计上可行，暂不作为主实验 claim | `EngineLaunchSpec` / launch recipe / tunable schema | vLLM 主线完成后，再做 SGLang adapter 和一个低成本验证 case |
-| C8. Harness 对坏初始点有恢复能力，不只依赖可信 base config | 当前发现反例，不能 claim | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | 重构为 grammar/operator + coverage-relative stop 后跑 random/adversarial start distribution |
+| C8. The harness can recover from bad starting points rather than relying only on a trusted base config | A single adversarial bad-start has passed first repair; distribution-level robustness cannot be claimed | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring to grammar/operator + coverage-relative stop, run a random/adversarial start distribution |
 
 ## 最高优先级实验
 
@@ -103,7 +103,8 @@ declarative intervention grammar + coverage-relative validator。
   `search.high=1.0` 仍然不足，必须报告 `measurement_ceiling_insufficient` 并等待人类
   确认，不得静默重复窗口或合成 arrivals；
 - normalized full-config signature：no-repeat 不能只看 patch signature；base config 与
-  no-op patch 必须被识别为同一 full config；
+  no-op patch 必须被识别为同一 full config；`48911b6` 已实现并在 dash1 bad-start
+  validation 中通过；
 - Failure invalidation 有保守 region predicate 和 retry/unblock 条件；
 - grammar/policy/capability 都有 version 和 anti-overfitting static checks；
 - LLM/BO 只能选择合法 candidate，不能绕过 validator。
diff --git a/docs/harness-ablation/bad-start-stop-counterexample-20260626.md b/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
index 861efa2..6183fb2 100644
--- a/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
+++ b/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
@@ -281,3 +281,92 @@ TP4 将 best req/s/GPU 从 1.0000 提高到 1.7375。
 P0 measurement/stop-order slice: passed.
 P0 full coverage-relative harness: not yet passed.
 ```
+
+## 2026-06-26 normalized full-config validation
+
+Commit `48911b6` fixes the new blocker exposed in the previous section: no-repeat no longer
+compares only the patch signature; it compares the normalized effective full config.
+
+Implementation semantics:
+
+```text
+effective_config =
+  normalize(base_envs + env_patch,
+            base_flags + flag_patch)
+
+no_repeat_signature = stable_json(effective_config)
+```
+
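+As a result, the validator treats the following two proposals as the same full config, because the base config already sets `tensor-parallel-size=8`: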
+
+```text
+baseline patch: {}
+noop patch:    {"tensor-parallel-size": 8}
+```
+
+Local validation:
+
+```text
+PYTHONPATH=src python3 -m unittest discover -s tests
+Ran 145 tests OK
+```
+
+dash1 validation:
+
+```text
+run label = adversarial-badstart-fullsig-48911b6-20260626T133112Z
+git sha   = 48911b658bbf052d70d952d1cdf55ad6b50ba7a5
+machine   = dash1, 8x H20
+```
+
+The spec still uses the same adversarial bad-start:
+
+```text
+tensor-parallel-size = 8
+data-parallel-size = 1
+gpu-memory-utilization = 0.5
+max-num-seqs = 8
+search.auto_high.enabled = true
+LLM endpoint disabled
+```
+
+Results:
+
+| trial | proposal | best sampling_u | request_rate | req/s/GPU | pass rate |
+| --- | --- | ---: | ---: | ---: | ---: |
+| trial-0001 | baseline TP8, DP1, gmu0.5, mns8 | 0.935616858887 | 8.00 | 1.0000 | 1.0000 |
+| trial-0002 | `tensor-parallel-size=4` | 0.810867944369 | 6.95 | 1.7375 | 0.9832 |
+| trial-0003 | `tensor-parallel-size=4`, `gpu-memory-utilization=0.9` | 0.935616858887 | 8.00 | 2.0000 | 1.0000 |
+
+Key observation:
+
+```text
+old trial-0003:
+  {"tensor-parallel-size": 8}
+  -> equivalent to baseline, but still executed
+
+new trial-0003:
+  {"tensor-parallel-size": 4, "gpu-memory-utilization": 0.9}
+  -> continues testing KV/cache headroom on the validated TP4 topology
+```
+
+This shows that, in this run, the normalized full-config signature blocked the patch-level no-op retest.
+
+Mechanism:
+
+1. The baseline TP8 run saturating the search ceiling is recorded only as measurement evidence;
+2. because the objective is `req/s/GPU`, the topology/resource-efficiency contrast is still
+   uncovered, so the validator does not allow a stop;
+3. the harness first tests the adjacent lower-TP topology, and TP4 raises `req/s/GPU` from `1.0` to `1.7375`;
+4. no-repeat uses the full config signature to block the equivalent TP8 patch;
+5. the harness continues testing runtime headroom on the settled TP4 topology, and `gmu=0.9`
+   raises `req/s/GPU` to `2.0`.
+
+The current verdict is updated to:
+
+```text
+P0 measurement/stop-order slice: passed.
+P0 normalized full-config no-repeat slice: passed.
+P0 single adversarial bad-start recovery: passed for this case.
+P0 distribution-level bad-start robustness: not yet proven.
+```