Document full config signature validation

2026-06-26 21:52:18 +08:00
parent 48911b658b
commit 42f75553a6
2 changed files with 92 additions and 2 deletions


@@ -83,7 +83,7 @@ closed-form performance model of kernels, KV cache, communication, and queueing. A safer and stronger
 | C5. AITuner finds a near-optimal region, not just a single feasible config | Qwen30B shows an explanatory signal | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | Pick 1-2 cases for a local grid or an expert-config comparison |
 | C6. AITuner moves to the appropriate frontier as SLO tightness changes | Done for Qwen30B | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | Pick one more non-homogeneous case for an SLO sweep, and plot SLO tightness -> frontier/regime transition |
 | C7. The engine adapter makes the intervention grammar portable to other serving engines | Feasible by design; not a main experimental claim for now | `EngineLaunchSpec` / launch recipe / tunable schema | After the vLLM mainline is done, build an SGLang adapter and one low-cost validation case |
-| C8. The harness can recover from a bad starting point and does not depend only on a trusted base config | Counterexample currently found; cannot claim | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring into grammar/operator + coverage-relative stop, run a random/adversarial start distribution |
+| C8. The harness can recover from a bad starting point and does not depend only on a trusted base config | A single adversarial bad-start has passed first repair; distribution-level robustness cannot be claimed | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring into grammar/operator + coverage-relative stop, run a random/adversarial start distribution |
 ## Highest-priority experiments
@@ -103,7 +103,8 @@ declarative intervention grammar + coverage-relative validator.
   `search.high=1.0` is still insufficient, `measurement_ceiling_insufficient` must be reported and human
   confirmation awaited; windows must not be silently repeated and arrivals must not be synthesized;
 - normalized full-config signature: no-repeat must not compare only the patch signature; the base config and a
-  no-op patch must be recognized as the same full config;
+  no-op patch must be recognized as the same full config; `48911b6` implements this and it passed the dash1
+  bad-start validation;
 - Failure invalidation has a conservative region predicate and retry/unblock conditions;
 - grammar/policy/capability each carry a version and anti-overfitting static checks;
 - LLM/BO may only select legal candidates and cannot bypass the validator.


@@ -281,3 +281,92 @@ TP4 raises best req/s/GPU from 1.0000 to 1.7375.
P0 measurement/stop-order slice: passed.
P0 full coverage-relative harness: not yet passed.
```
## 2026-06-26 normalized full-config validation
Commit `48911b6` fixes the new blocker exposed in the previous section: no-repeat no longer compares
only the patch signature; it compares the normalized effective full config.

Implementation semantics:
```text
effective_config =
  normalize(base_envs + env_patch,
            base_flags + flag_patch)
no_repeat_signature = stable_json(effective_config)
```
As a result, the validator treats the following two proposals as the same full config:
```text
baseline patch: {}
noop patch: {"tensor-parallel-size": 8}
```
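
The equivalence above can be illustrated with a minimal Python sketch. It is not the code from `48911b6`: the helper names (`canon_flags`, `effective_config`, `no_repeat_signature`), the key-normalization rule, and the use of `json.dumps` as the stable serializer are all assumptions made for illustration. The base flags are taken from the adversarial bad-start spec in this section.

```python
import json

def canon_flags(flags):
    # Hypothetical key normalization: drop leading dashes, unify "_" and "-", lowercase.
    return {k.lstrip("-").replace("_", "-").lower(): v for k, v in flags.items()}

def effective_config(base_envs, base_flags, env_patch=None, flag_patch=None):
    # A patch entry overrides the base entry that has the same key.
    envs = {**base_envs, **(env_patch or {})}
    flags = {**canon_flags(base_flags), **canon_flags(flag_patch or {})}
    return {"envs": envs, "flags": flags}

def no_repeat_signature(config):
    # sort_keys plus fixed separators keeps the JSON independent of dict insertion order.
    return json.dumps(config, sort_keys=True, separators=(",", ":"))

# Base flags copied from the adversarial bad-start spec.
BASE_FLAGS = {
    "tensor-parallel-size": 8,
    "data-parallel-size": 1,
    "gpu-memory-utilization": 0.5,
    "max-num-seqs": 8,
}

baseline = no_repeat_signature(effective_config({}, BASE_FLAGS, flag_patch={}))
noop = no_repeat_signature(
    effective_config({}, BASE_FLAGS, flag_patch={"tensor-parallel-size": 8})
)
tp4 = no_repeat_signature(
    effective_config({}, BASE_FLAGS, flag_patch={"tensor-parallel-size": 4})
)

print(baseline == noop)  # the no-op patch collapses onto the baseline
print(baseline == tp4)   # a real topology change yields a different signature
```

Comparing serialized effective configs, rather than patches, is what lets a patch that restates a base value be recognized as a repeat.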
Local verification:
```text
PYTHONPATH=src python3 -m unittest discover -s tests
Ran 145 tests OK
```
dash1 validation:
```text
run label = adversarial-badstart-fullsig-48911b6-20260626T133112Z
git sha = 48911b658bbf052d70d952d1cdf55ad6b50ba7a5
machine = dash1, 8x H20
```
The spec still uses the same adversarial bad-start:
```text
tensor-parallel-size = 8
data-parallel-size = 1
gpu-memory-utilization = 0.5
max-num-seqs = 8
search.auto_high.enabled = true
LLM endpoint disabled
```
Results:
| trial | proposal | best sampling_u | request_rate | req/s/GPU | pass |
| --- | --- | ---: | ---: | ---: | ---: |
| trial-0001 | baseline TP8, DP1, gmu0.5, mns8 | 0.935616858887 | 8.00 | 1.0000 | 1.0000 |
| trial-0002 | `tensor-parallel-size=4` | 0.810867944369 | 6.95 | 1.7375 | 0.9832 |
| trial-0003 | `tensor-parallel-size=4`, `gpu-memory-utilization=0.9` | 0.935616858887 | 8.00 | 2.0000 | 1.0000 |
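
The `req/s/GPU` column is consistent with dividing `request_rate` by the number of GPUs a trial occupies. The sketch below checks that arithmetic against the three rows. It assumes the GPU count equals `tensor-parallel-size * data-parallel-size`, which matches the reported values but is not stated in the notes; `req_per_gpu` is a hypothetical helper name.

```python
import math

def req_per_gpu(request_rate, tp, dp=1):
    # Assumption: a trial occupies tp * dp GPUs.
    return request_rate / (tp * dp)

# (trial, request_rate, tensor-parallel-size, reported req/s/GPU) from the table above.
TRIALS = [
    ("trial-0001", 8.00, 8, 1.0000),
    ("trial-0002", 6.95, 4, 1.7375),
    ("trial-0003", 8.00, 4, 2.0000),
]

for name, rate, tp, reported in TRIALS:
    value = req_per_gpu(rate, tp)
    # Compare with a tolerance because 6.95 is not exactly representable as a float.
    assert math.isclose(value, reported, abs_tol=1e-4), name
    print(f"{name}: {value:.4f} req/s/GPU")
```

Under this assumption, TP4 at a lower absolute request rate (6.95) still beats TP8 at 8.00 on the per-GPU objective, because it uses half as many GPUs.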
Key observation:
```text
Old trial-0003:
  {"tensor-parallel-size": 8}
  -> equivalent to the baseline, but still executed
New trial-0003:
  {"tensor-parallel-size": 4, "gpu-memory-utilization": 0.9}
  -> continues testing KV/cache headroom on the validated TP4 topology
```
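This shows that, in this run, the normalized full-config signature blocked the re-measurement of a
patch-level no-op.

Mechanism:

1. The baseline TP8 saturating the search ceiling is recorded only as measurement evidence;
2. Because the objective is `req/s/GPU`, the topology/resource-efficiency contrast is still uncovered, so
   the validator does not allow a stop;
3. The harness first tests the adjacent lower-TP topology; TP4 raises `req/s/GPU` from `1.0` to `1.7375`;
4. no-repeat uses the full-config signature to block the equivalent TP8 patch;
5. The harness continues testing runtime headroom on the settled TP4 topology; `gmu=0.9` raises
   `req/s/GPU` to `2.0`.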
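
Step 4 can be sketched as a filter over proposed patches. This is an illustrative sketch only: `signature` and `run_order` are hypothetical names, and the proposal order is an assumption chosen to mirror the trials in this section. It does not model the validator's coverage or stop logic.

```python
import json

# Base flags copied from the adversarial bad-start spec.
BASE = {
    "tensor-parallel-size": 8,
    "data-parallel-size": 1,
    "gpu-memory-utilization": 0.5,
    "max-num-seqs": 8,
}

def signature(patch):
    # Stable JSON of the effective config: base values overridden by the patch.
    return json.dumps({**BASE, **patch}, sort_keys=True)

def run_order(proposals):
    # Execute a proposal only if its full-config signature has not been seen yet.
    seen = set()
    executed, blocked = [], []
    for patch in proposals:
        sig = signature(patch)
        if sig in seen:
            blocked.append(patch)
        else:
            seen.add(sig)
            executed.append(patch)
    return executed, blocked

# Hypothetical proposal order mirroring the trials above.
proposals = [
    {},                                                          # baseline TP8
    {"tensor-parallel-size": 4},                                 # adjacent lower TP
    {"tensor-parallel-size": 8},                                 # no-op relative to base
    {"tensor-parallel-size": 4, "gpu-memory-utilization": 0.9},  # runtime headroom
]

executed, blocked = run_order(proposals)
print("executed:", executed)
print("blocked:", blocked)
```

With a patch-only signature, `{"tensor-parallel-size": 8}` would differ from `{}` and be executed again, which is the behavior the old trial-0003 exhibited.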
The current verdict is updated to:
```text
P0 measurement/stop-order slice: passed.
P0 normalized full-config no-repeat slice: passed.
P0 single adversarial bad-start recovery: passed for this case.
P0 distribution-level bad-start robustness: not yet proven.
```