Document hardened harness feedback
@@ -299,3 +299,36 @@ trial-0005's `gpu-memory-utilization=0.92` still tied the baseline; the old run then
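success as climb success, without requiring request_rate_per_gpu to improve. The latest local implementation has been corrected to a
measurement-gated GMU climb; the next round should rerun with the new commit to verify whether, after a GMU tie, the search moves on to
admission pressure, topology/DP, or another family.

## Hardened Run Feedback
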
Restarted on dash1 using commit `6b25d56`:

```text
run = .aituner/badstart-prefill-hardened-6b25d56-20260628T180104Z
case = badstart-expanded-9accf25-20260626T184911Z-runtime_tp2_dp1_gmu070_mns8
session = aituner-prefill-hardened-6b25d56
```

As of roughly 02:27 UTC+8 on 2026-06-29, the trial sequence within this run is:

| trial | patch | request_rate_per_gpu | observation |
| --- | --- | ---: | --- |
| 0001 | baseline bad-start | 2.983 | Incumbent within this run; clearly higher than the old run's baseline, which shows that numbers from different runs cannot be mixed directly |
| 0002 | `tensor-parallel-size=4` | 1.629 | Topology TP4 is falsified |
| 0003 | `enable-chunked-prefill=true, max-num-batched-tokens=8192` | 2.025 | Standalone scheduler seed is falsified |
| 0004 | `gpu-memory-utilization=0.9` | 3.258 | GMU=0.9 is the current best and reaches the known no-harness level |
| 0005 | GMU=0.9 + scheduler seed | 2.025 | The combination of GMU and scheduler seed is falsified |
| 0006 | `gpu-memory-utilization=0.92` | 3.258 | Ties GMU=0.9; no further improvement |

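To make the comparison against the in-run incumbent explicit, the following minimal sketch computes each trial's relative change against trial 0001. The numbers are copied from the table; the `trials` dict and the `relative_change` helper are illustrative names, not part of the harness.

```python
# request_rate_per_gpu per trial, copied from the table above.
# The dict layout is an assumption made for this example only.
trials = {
    "0001": 2.983,  # baseline bad-start (in-run incumbent)
    "0002": 1.629,  # tensor-parallel-size=4
    "0003": 2.025,  # standalone scheduler seed
    "0004": 3.258,  # gpu-memory-utilization=0.9
    "0005": 2.025,  # GMU=0.9 + scheduler seed
    "0006": 3.258,  # gpu-memory-utilization=0.92
}


def relative_change(value: float, baseline: float) -> float:
    """Fractional change of `value` relative to `baseline`."""
    return (value - baseline) / baseline


baseline = trials["0001"]

# Compare only against the baseline measured in the same run,
# since numbers from different runs are not directly comparable.
for trial_id, rate in trials.items():
    print(f"{trial_id}: {rate:.3f} ({relative_change(rate, baseline):+.1%})")

# On a tie, max() keeps the first trial that reached the best value.
best = max(trials, key=trials.get)
print("best:", best)
```

Because `max` keeps the earliest entry on a tie, trial 0004 stays the reported best even though 0006 matches it, which mirrors how the table treats GMU=0.92 as a tie rather than a new incumbent.
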
candidate-set-0007 did not go on to propose `gpu-memory-utilization=0.94`; it switched to a
`tensor-parallel-size=4, data-parallel-size=2` topology probe instead. This validates the
measurement-gated GMU climb: when GMU=0.92 only ties, the search no longer climbs blindly toward a higher GMU.

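The difference between the old and the corrected gating can be sketched as below. This is a hypothetical illustration, not the harness's actual code: the function names, the `step` of 0.02 (inferred from the 0.9, 0.92, 0.94 sequence), and the `max_gmu` cap are all assumptions.

```python
from typing import Optional


def launch_gated_next(launched_ok: bool, current_gmu: float,
                      step: float = 0.02, max_gmu: float = 0.96) -> Optional[float]:
    """Old behavior (sketch): keep climbing whenever the trial merely launched."""
    if not launched_ok:
        return None
    candidate = round(current_gmu + step, 2)
    return candidate if candidate <= max_gmu else None


def improvement_gated_next(incumbent_rate: float, trial_rate: float,
                           current_gmu: float, step: float = 0.02,
                           max_gmu: float = 0.96,
                           min_gain: float = 0.0) -> Optional[float]:
    """Corrected behavior (sketch): climb only if the measured
    request_rate_per_gpu strictly beats the incumbent by more than min_gain."""
    if trial_rate - incumbent_rate <= min_gain:
        return None  # tie or regression: stop climbing, try another family
    candidate = round(current_gmu + step, 2)
    return candidate if candidate <= max_gmu else None


# Trial 0004 (GMU=0.9) beat the baseline, so a further step is proposed.
after_0004 = improvement_gated_next(2.983, 3.258, 0.90)

# Trial 0006 (GMU=0.92) only tied 0004, so the climb stops.
after_0006 = improvement_gated_next(3.258, 3.258, 0.92)

# Under launch gating the same tie would still have produced a next step.
old_after_0006 = launch_gated_next(True, 0.92)
```

With these assumed defaults, `after_0006` is `None`, which corresponds to candidate-set-0007 moving to a topology probe, while `old_after_0006` shows the step the launch-gated version would still have proposed.
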
The most important mechanism conclusions so far:

- the scheduler seed's priority and no-repeat behavior both work as designed;
- the scheduler seed is not a standalone winner in this case, and it took measurement to falsify it;
- GMU=0.9 is the only mechanism dimension that has actually helped so far;
- the follow-up GMU climb has been corrected from launch-gated to improvement-gated;
- next, check whether topology/DP, MNS, or the allocator/layout family can push beyond 3.258.