Document full config signature validation
@@ -83,7 +83,7 @@ closed-form performance model of kernels, KV cache, communication, and queueing. A safer and also stronger
| C5. AITuner finds a near-optimal region, not just one feasible config | Qwen30B shows an explanatory signal | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | Pick 1-2 cases for a local grid or an expert-config comparison |
| C6. AITuner moves to the appropriate frontier as SLO tightness changes | Done for Qwen30B | [Qwen30B SLO robustness](harness-ablation/qwen30b-slo-robustness-20260624.md) | Pick one more non-homogeneous case for an SLO sweep; also plot SLO tightness -> frontier/regime transition |
| C7. The engine adapter makes the intervention grammar portable to other serving engines | Feasible by design; not a main-experiment claim for now | `EngineLaunchSpec` / launch recipe / tunable schema | After the vLLM mainline is done, build an SGLang adapter and one low-cost validation case |
| C8. The harness can recover from bad starting points and does not depend only on a trusted base config | Counterexample found; cannot be claimed | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring to grammar/operator + coverage-relative stop, run a random/adversarial start distribution |
| C8. The harness can recover from bad starting points and does not depend only on a trusted base config | A single adversarial bad-start has passed first repair; distribution-level robustness cannot be claimed | [Declarative intervention harness design](harness-ablation/declarative-intervention-harness-design-20260626.md), [Bad-start stop counterexample](harness-ablation/bad-start-stop-counterexample-20260626.md), [No-LLM harness mechanism](harness-ablation/no-llm-harness-mechanism-20260625.md) | After refactoring to grammar/operator + coverage-relative stop, run a random/adversarial start distribution |

## Highest-priority experiments

@@ -103,7 +103,8 @@ declarative intervention grammar + coverage-relative validator.
`search.high=1.0` is still insufficient, `measurement_ceiling_insufficient` must be reported and the run must wait for human
confirmation; it must not silently repeat windows or synthesize arrivals;
- normalized full-config signature: no-repeat must not look at the patch signature alone; the base config and a
no-op patch must be recognized as the same full config;
no-op patch must be recognized as the same full config; `48911b6` implements this and passed the dash1 bad-start
validation;
- failure invalidation has a conservative region predicate and retry/unblock conditions;
- grammar/policy/capability are all versioned and have anti-overfitting static checks;
- LLM/BO may only select legal candidates and cannot bypass the validator.

@@ -281,3 +281,92 @@ TP4 raised best req/s/GPU from 1.0000 to 1.7375.
P0 measurement/stop-order slice: passed.
P0 full coverage-relative harness: not yet passed.
```

## 2026-06-26 normalized full-config validation

Commit `48911b6` fixes the new blocker exposed in the previous section: no-repeat no longer compares only the patch
signature; it compares the normalized effective full config.

Implementation semantics:

```text
effective_config =
  normalize(base_envs + env_patch,
            base_flags + flag_patch)

no_repeat_signature = stable_json(effective_config)
```

Therefore the validator treats the following two proposals as the same full config:

```text
baseline patch: {}
noop patch: {"tensor-parallel-size": 8}
```

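The semantics above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `48911b6`: it assumes that `+` means "patch values override base values", that `normalize` only canonicalizes key order, and that `stable_json` is a key-sorted JSON dump. The function bodies and the `base_flags` dictionary (taken from the bad-start spec in this section) are illustrative assumptions.

```python
import json


def normalize(envs: dict, flags: dict) -> dict:
    """Canonicalize the merged config (assumed: string keys in sorted order)."""
    return {
        "envs": {str(k): envs[k] for k in sorted(envs)},
        "flags": {str(k): flags[k] for k in sorted(flags)},
    }


def no_repeat_signature(base_envs, base_flags, env_patch=None, flag_patch=None) -> str:
    # Assumed merge rule: a patch entry overrides the base entry with the same key.
    effective = normalize(
        {**base_envs, **(env_patch or {})},
        {**base_flags, **(flag_patch or {})},
    )
    # sort_keys plus fixed separators yields a byte-stable encoding.
    return json.dumps(effective, sort_keys=True, separators=(",", ":"))


# Base config of the adversarial bad-start used in this section.
base_flags = {
    "tensor-parallel-size": 8,
    "data-parallel-size": 1,
    "gpu-memory-utilization": 0.5,
    "max-num-seqs": 8,
}

baseline_sig = no_repeat_signature({}, base_flags, flag_patch={})
noop_sig = no_repeat_signature({}, base_flags, flag_patch={"tensor-parallel-size": 8})
tp4_sig = no_repeat_signature({}, base_flags, flag_patch={"tensor-parallel-size": 4})

print(baseline_sig == noop_sig)  # the no-op patch collapses onto the baseline
print(baseline_sig == tp4_sig)   # a real topology change gets its own signature
```

Comparing signatures of the effective config rather than of the patch is what lets a validator reject `{"tensor-parallel-size": 8}` as a repeat even though the patch text differs from `{}`.
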
Local validation:

```text
PYTHONPATH=src python3 -m unittest discover -s tests
Ran 145 tests OK
```

dash1 validation:

```text
run label = adversarial-badstart-fullsig-48911b6-20260626T133112Z
git sha = 48911b658bbf052d70d952d1cdf55ad6b50ba7a5
machine = dash1, 8x H20
```

The spec still uses the same adversarial bad-start:

```text
tensor-parallel-size = 8
data-parallel-size = 1
gpu-memory-utilization = 0.5
max-num-seqs = 8
search.auto_high.enabled = true
LLM endpoint disabled
```

Results:

| trial | proposal | best sampling_u | request_rate | req/s/GPU | pass |
| --- | --- | ---: | ---: | ---: | ---: |
| trial-0001 | baseline TP8, DP1, gmu0.5, mns8 | 0.935616858887 | 8.00 | 1.0000 | 1.0000 |
| trial-0002 | `tensor-parallel-size=4` | 0.810867944369 | 6.95 | 1.7375 | 0.9832 |
| trial-0003 | `tensor-parallel-size=4`, `gpu-memory-utilization=0.9` | 0.935616858887 | 8.00 | 2.0000 | 1.0000 |

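The `req/s/GPU` column is consistent with dividing `request_rate` by the number of GPUs one deployment occupies. The sketch below reproduces that arithmetic. It assumes the GPU count is `tensor-parallel-size * data-parallel-size`, which matches all three rows but is inferred from the table, not taken from the harness code. The helper name is hypothetical.

```python
def req_per_s_per_gpu(request_rate: float, tp: int, dp: int = 1) -> float:
    # Assumption: one deployment occupies tp * dp GPUs.
    return request_rate / (tp * dp)


# (trial, request_rate, tensor-parallel-size), with data-parallel-size = 1 throughout.
trials = [
    ("trial-0001", 8.00, 8),
    ("trial-0002", 6.95, 4),
    ("trial-0003", 8.00, 4),
]

for name, rate, tp in trials:
    print(f"{name}: {req_per_s_per_gpu(rate, tp):.4f} req/s/GPU")
```

Under this reading, trial-0002 sustains a lower absolute request rate than the baseline but still wins on the objective because it uses half the GPUs.
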
Key observation:

```text
old trial-0003:
  {"tensor-parallel-size": 8}
  -> equivalent to baseline, but still executed

new trial-0003:
  {"tensor-parallel-size": 4, "gpu-memory-utilization": 0.9}
  -> continues testing KV/cache headroom on the already-validated TP4 topology
```

For this run, this demonstrates that the normalized full-config signature now prevents patch-level no-op re-measurement.

Mechanism:

1. The baseline TP8 run saturating the search ceiling is recorded only as measurement evidence;
2. because the objective is `req/s/GPU`, the topology/resource-efficiency contrast is still uncovered, so
   the validator does not allow a stop;
3. the harness first tests the adjacent lower-TP topology; TP4 raises `req/s/GPU` from `1.0` to `1.7375`;
4. no-repeat uses the full config signature to block the equivalent TP8 patch;
5. the harness continues testing runtime headroom on the settled TP4 topology; `gmu=0.9` raises
   `req/s/GPU` to `2.0`.

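Steps 1 and 2 describe a stop rule that depends on coverage rather than on a single good measurement. A toy model of that rule, under stated assumptions, might look like the following. The class, its method names, and the axis labels (`measurement`, `topology`, `runtime-headroom`) are hypothetical, chosen only to mirror the wording of the list. The real validator's axes and stop conditions are not shown in this commit.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageState:
    # Contrasts the objective requires before a stop is legal (hypothetical labels).
    required_axes: frozenset
    covered_axes: set = field(default_factory=set)

    def record(self, axis: str) -> None:
        """Mark one contrast axis as covered by a measured trial."""
        self.covered_axes.add(axis)

    def may_stop(self) -> bool:
        # Stop is allowed only when every required axis has been covered.
        return self.required_axes <= self.covered_axes


state = CoverageState(frozenset({"measurement", "topology", "runtime-headroom"}))

state.record("measurement")  # baseline saturates the search ceiling
print(state.may_stop())      # topology contrast is still uncovered

state.record("topology")          # adjacent lower-TP topology measured
state.record("runtime-headroom")  # headroom tested on the settled topology
print(state.may_stop())
```

In this toy model a saturated baseline alone never permits a stop, which is the behaviour step 2 attributes to the validator. Whether the real harness would stop after the third trial is not something this sketch claims.
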
The current verdict is updated to:

```text
P0 measurement/stop-order slice: passed.
P0 normalized full-config no-repeat slice: passed.
P0 single adversarial bad-start recovery: passed for this case.
P0 distribution-level bad-start robustness: not yet proven.
```
