Document full config signature validation

2026-06-26 21:52:18 +08:00
parent 48911b658b
commit 42f75553a6
2 changed files with 92 additions and 2 deletions
--- a/docs/aituner-roadmap.md
+++ b/docs/aituner-roadmap.md
@@ -83,7 +83,7 @@ closed-form performance model of kernels, KV cache, communication, and queueing. The safer and stronger
 | C5. AITuner finds a near-optimal region, not just a single feasible config | Qwen30B shows explanatory signal | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | Pick 1-2 cases for a local grid or an expert-config comparison |
 | C6. AITuner moves to the appropriate frontier as SLO tightness changes | Done for Qwen30B | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | Pick one more non-homogeneous case for an SLO sweep; also plot SLO tightness -> frontier/regime transition |
 | C7. The engine adapter makes the intervention grammar portable to other serving engines | Feasible by design; not a main-experiment claim for now | `EngineLaunchSpec` / launch recipe / tunable schema | After the vLLM mainline is done, build an SGLang adapter and one low-cost validation case |
-| C8. The harness can recover from bad starting points and does not rely solely on a trusted base config | Counterexample currently found; cannot claim | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring to grammar/operator + coverage-relative stop, run a random/adversarial start distribution |
+| C8. The harness can recover from bad starting points and does not rely solely on a trusted base config | A single adversarial bad-start has passed first repair; distribution-level robustness cannot be claimed | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring to grammar/operator + coverage-relative stop, run a random/adversarial start distribution |
 ## Highest-priority experiments
@@ -103,7 +103,8 @@ declarative intervention grammar + coverage-relative validator.
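  `search.high=1.0` is still insufficient, `measurement_ceiling_insufficient` must be reported, pending human
  confirmation; windows must not be silently repeated and arrivals must not be synthesized;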
 - normalized full-config signature: no-repeat must not look only at the patch signature; the base config and a
-  no-op patch must be recognized as the same full config;
+  no-op patch must be recognized as the same full config; `48911b6` implements this and passed the dash1 bad-start
  validation;
 - failure invalidation has a conservative region predicate and retry/unblock conditions;
 - grammar/policy/capability each carry a version and anti-overfitting static checks;
 - LLM/BO may only select legal candidates and cannot bypass the validator.
--- a/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
+++ b/docs/harness-ablation/bad-start-stop-counterexample-20260626.md
@@ -281,3 +281,92 @@ TP4 raises best req/s/GPU from 1.0000 to 1.7375.
 P0 measurement/stop-order slice: passed.
 P0 full coverage-relative harness: not yet passed.
 ```
 ## 2026-06-26 normalized full-config validation
 Commit `48911b6` fixes the new blocker exposed in the previous section: no-repeat no longer compares only the patch
 signature; it compares the normalized effective full config.
 Implementation semantics:
 ```text
 effective_config =
  normalize(base_envs + env_patch,
            base_flags + flag_patch)
 no_repeat_signature = stable_json(effective_config)
 ```
 As a result, the validator treats the following two proposals as the same full config:
 ```text
 baseline patch: {}
 noop patch:    {"tensor-parallel-size": 8}
 ```
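The signature semantics above can be sketched in Python. This is a minimal illustration, not the code in `48911b6`: it assumes `+` means "the patch overrides the base", models `stable_json` with `json.dumps(..., sort_keys=True)`, and the helper names `normalize` and `no_repeat_signature` are hypothetical.

```python
import json

def normalize(envs: dict, flags: dict) -> dict:
    """Canonical form of an effective config (assumed: key order is the only noise)."""
    return {
        "envs": {k: envs[k] for k in sorted(envs)},
        "flags": {k: flags[k] for k in sorted(flags)},
    }

def no_repeat_signature(base_envs, base_flags, env_patch=None, flag_patch=None) -> str:
    # Assumption: patch values override base values; the signature only
    # sees the merged result, never the patch on its own.
    envs = {**base_envs, **(env_patch or {})}
    flags = {**base_flags, **(flag_patch or {})}
    return json.dumps(normalize(envs, flags), sort_keys=True, separators=(",", ":"))

# Base flags taken from the adversarial bad-start spec in this document.
base_flags = {
    "tensor-parallel-size": 8,
    "data-parallel-size": 1,
    "gpu-memory-utilization": 0.5,
    "max-num-seqs": 8,
}

baseline = no_repeat_signature({}, base_flags, flag_patch={})
noop = no_repeat_signature({}, base_flags, flag_patch={"tensor-parallel-size": 8})
tp4 = no_repeat_signature({}, base_flags, flag_patch={"tensor-parallel-size": 4})

print(baseline == noop)  # the no-op patch collapses onto the baseline
print(baseline == tp4)   # a real topology change does not
```

Comparing serialized effective configs rather than patches means two textually different patches cannot both be scheduled when they describe the same launch.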
 Local validation:
 ```text
 PYTHONPATH=src python3 -m unittest discover -s tests
 Ran 145 tests OK
 ```
 dash1 validation:
 ```text
 run label = adversarial-badstart-fullsig-48911b6-20260626T133112Z
 git sha   = 48911b658bbf052d70d952d1cdf55ad6b50ba7a5
 machine   = dash1, 8x H20
 ```
 The spec still uses the same adversarial bad-start:
 ```text
 tensor-parallel-size = 8
 data-parallel-size = 1
 gpu-memory-utilization = 0.5
 max-num-seqs = 8
 search.auto_high.enabled = true
 LLM endpoint disabled
 ```
 Results:
 | trial | proposal | best sampling_u | request_rate | req/s/GPU | pass |
 | --- | --- | ---: | ---: | ---: | ---: |
 | trial-0001 | baseline TP8, DP1, gmu0.5, mns8 | 0.935616858887 | 8.00 | 1.0000 | 1.0000 |
 | trial-0002 | `tensor-parallel-size=4` | 0.810867944369 | 6.95 | 1.7375 | 0.9832 |
 | trial-0003 | `tensor-parallel-size=4`, `gpu-memory-utilization=0.9` | 0.935616858887 | 8.00 | 2.0000 | 1.0000 |
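
The `req/s/GPU` column is consistent with dividing `request_rate` by the number of GPUs the topology occupies. The sketch below assumes that GPU count is `tensor-parallel-size * data-parallel-size`; this is inferred from the table values, not stated by the harness, and `req_per_s_per_gpu` is a hypothetical helper.

```python
def req_per_s_per_gpu(request_rate: float, tp: int, dp: int = 1) -> float:
    # Assumption: a config occupies tp * dp GPUs.
    return request_rate / (tp * dp)

# (trial, request_rate, tensor-parallel-size) copied from the results table; DP is 1 throughout.
trials = [
    ("trial-0001", 8.00, 8),
    ("trial-0002", 6.95, 4),
    ("trial-0003", 8.00, 4),
]

per_gpu = {name: round(req_per_s_per_gpu(rate, tp), 4) for name, rate, tp in trials}
for name, value in per_gpu.items():
    print(name, value)
```

Under this reading, trial-0002 serves a lower absolute request rate than the baseline yet scores higher on the objective, because it uses half the GPUs.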
 Key observation:
 ```text
 old trial-0003:
  {"tensor-parallel-size": 8}
  -> equivalent to baseline, but still executed
 new trial-0003:
  {"tensor-parallel-size": 4, "gpu-memory-utilization": 0.9}
  -> continues testing KV/cache headroom on the validated TP4 topology
 ```
 This shows that the normalized full-config signature now blocks re-measurement of patch-level no-ops.
 Mechanism:
 1. baseline TP8 saturating the search ceiling is recorded only as measurement evidence;
 2. because the objective is `req/s/GPU`, the topology/resource-efficiency contrast is still uncovered, so
    the validator does not allow stop;
 3. the harness first tests the adjacent lower-TP topology; TP4 raises `req/s/GPU` from `1.0` to `1.7375`;
 4. no-repeat uses the full config signature to block the equivalent TP8 patch;
 5. the harness continues testing runtime headroom on the settled TP4 topology; `gmu=0.9` raises
    `req/s/GPU` to `2.0`.
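Steps 3 to 5 can be replayed with a toy scheduler that contrasts the old patch-level check with the full-config check. This is only a sketch: the candidate order, `run_order`, and both signature helpers are hypothetical stand-ins and do not reproduce how the harness generates proposals.

```python
import json

# Base flags from the adversarial bad-start spec.
BASE = {
    "tensor-parallel-size": 8,
    "data-parallel-size": 1,
    "gpu-memory-utilization": 0.5,
    "max-num-seqs": 8,
}

def patch_signature(base: dict, patch: dict) -> str:
    # Old behavior: only the patch itself is compared.
    return json.dumps(patch, sort_keys=True)

def full_signature(base: dict, patch: dict) -> str:
    # New behavior: the merged effective config is compared.
    return json.dumps({**base, **patch}, sort_keys=True)

def run_order(candidates: list, signature) -> list:
    """Patches that would actually run, in order, under a no-repeat rule."""
    seen, executed = set(), []
    for patch in [{}] + candidates:  # the baseline (empty patch) runs first
        sig = signature(BASE, patch)
        if sig in seen:
            continue  # blocked as a repeat
        seen.add(sig)
        executed.append(patch)
    return executed

# Hypothetical proposal queue: lower TP, an equivalent TP8 patch, then runtime headroom.
candidates = [
    {"tensor-parallel-size": 4},
    {"tensor-parallel-size": 8},
    {"tensor-parallel-size": 4, "gpu-memory-utilization": 0.9},
]

old = run_order(candidates, patch_signature)
new = run_order(candidates, full_signature)
print("old third trial:", old[2])
print("new third trial:", new[2])
```

With the patch-level check the third executed trial is the TP8 no-op; with the full-config check that slot goes to the TP4 plus `gpu-memory-utilization=0.9` proposal, matching the old and new trial-0003 above.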
 The current verdict is updated to:
 ```text
 P0 measurement/stop-order slice: passed.
 P0 normalized full-config no-repeat slice: passed.
 P0 single adversarial bad-start recovery: passed for this case.
 P0 distribution-level bad-start robustness: not yet proven.
 ```