Refine harness design flow overview

2026-06-29 20:41:54 +08:00
parent 00ba573631
commit 08429e5da8
1 changed file with 44 additions and 18 deletions
--- a/docs/aituner-harness-design-contract.md
+++ b/docs/aituner-harness-design-contract.md
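@@ -15,29 +15,55 @@ measurement-grounded, mechanism-guided, validator-controlled experiments.
 In other words, the planner can be an LLM, BO, a bandit, a deterministic heuristic, or manual selection.
 The harness converts observations into auditable mechanism hypotheses, generates legal candidates, and uses real measurements to confirm or refute those hypotheses.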

-## Core flowchart
+## Core state machine

 ```mermaid
-flowchart TD
-    A[StudySpec<br/>engine schema, tunable knobs, hardware, SLO] --> B[Run trial / probe]
-    W[Workload window<br/>prompt length, output length, arrivals, prefix/cache hints] --> C
-    B --> C[Observation schema<br/>effective config, probe result, SLO violations, launch status]
-    C --> D[Evidence compiler<br/>symptom evidence over serving stages]
-    D --> E[Mechanism hypotheses<br/>prefill, decode, admission, memory, launch]
-    E --> F[Mechanism action families<br/>topology, scheduler, concurrency, cache, frontier transfer]
-    F --> G[CandidateSet<br/>legal patches + hypotheses + expected effects]
-    G --> H[Planner backend<br/>LLM / BO / heuristic ranks candidates]
-    H --> I[Validator + materializer<br/>constraints, no-repeat full config, failure memory, stop authority]
-    I --> B
-    B --> J[Measurement verdict<br/>SLO pass, req/s/GPU, latency quantiles]
-    J --> C
-    G --> K[Stop decision<br/>only when coverage and measurement guards allow]
+flowchart LR
+    S[State<br/>workload, constraints, history] --> E[Evidence<br/>SLO symptoms to mechanism signals]
+    E --> C[CandidateSet<br/>typed interventions]
+    C --> V[Validator<br/>legal, novel, covered?]
+    V -->|run trial| M[Measurement<br/>verdict]
+    M --> S
+    V -->|no justified candidate| X[Stop / report]
 ```

-Key point: the LLM must not bypass `CandidateSet` or `Validator`. At most, the LLM is a candidate ranker or copilot,
-not the authority on legality, coverage, or stop.
+The harness core loop has only five steps:

-## Module semantics
+1. **State**: maintains the workload, SLO, engine/hardware constraints, and historical trial measurements.
+2. **Evidence**: converts probe results from raw logs into serving-stage symptom signals.
+3. **CandidateSet**: generates a finite number of typed interventions in the mechanism space.
+4. **Validator**: checks legality, full-config novelty, failure memory, and coverage.
+5. **Measurement**: executes the validated intervention and updates the state with the real SLO verdict; if there is no
+   justified candidate, stops or reports a measurement/coverage gap.
+
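+This state machine expresses the minimal design of the harness and does not depend on a specific planner. An LLM, BO, a bandit, or a
+deterministic heuristic may only rank or select over the `CandidateSet`; it cannot bypass the `Validator`
+to construct a config directly, nor can it unilaterally decide to stop.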
+
+## Core design invariants
+
+All of the lower-level modules that follow serve three invariants:
+
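+| Invariant | Meaning | Why it matters |
+| --- | --- | --- |
+| Measurement-grounded | Every state transition is driven by a real probe/SLO verdict | Prevents the planner from treating natural-language guesses as facts |
+| Mechanism-typed | A candidate is not a bare knob vector but an intervention such as topology/scheduler/admission/cache | Reduces search dimensionality and gives every trial an interpretable hypothesis |
+| Validator-controlled | Candidates and stop must pass legality, no-repeat, coverage, and failure guards | Prevents repeated experiments, illegal configs, and premature stop |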
+
+## Expanding from high level to low level
+
+The sections below are organized by implementation layer:
+
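+1. Observation schema defines what the harness can see;
+2. Evidence compiler explains how symptoms become mechanism evidence;
+3. Mechanism space explains where the candidate space comes from;
+4. CandidateSet explains how interventions are constructed;
+5. Planner interface explains the boundaries of the LLM/BO/heuristic;
+6. Validator explains what may be executed and what may stop.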
+
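+Each layer distinguishes two things: what the current prototype concretely does, and the assumptions and limitations behind those choices.
+
+## Detailed module semantics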

 ### 1. Observation Schema