Refine harness design flow overview
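@@ -15,29 +15,55 @@ measurement-grounded, mechanism-guided, validator-controlled experiments.
In other words, the planner can be an LLM, BO, a bandit, a deterministic heuristic, or a human choice.
The harness is responsible for turning observations into auditable mechanism hypotheses, generating legal candidates, and using real measurements to confirm or refute those hypotheses.
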
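The interchangeability of planners can be illustrated with a minimal sketch. Everything named here (`Candidate`, `Planner`, `GreedyHeuristic`, `FixedOrder`, `select`, and the candidate names) is hypothetical and not taken from the harness; the only point is that each backend picks from a set it is handed, and the harness checks the pick.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate: a named patch plus the hypothesis it tests."""
    name: str
    hypothesis: str
    expected_gain: float  # illustrative prior, not a measurement


class Planner(Protocol):
    """Any backend (LLM, BO, bandit, heuristic, human) only picks from a given set."""
    def choose(self, candidates: Sequence[Candidate]) -> Candidate: ...


class GreedyHeuristic:
    """Deterministic backend: highest expected gain, ties go to the larger name."""
    def choose(self, candidates: Sequence[Candidate]) -> Candidate:
        return max(candidates, key=lambda c: (c.expected_gain, c.name))


class FixedOrder:
    """Stand-in for a human operator who always takes the first listed candidate."""
    def choose(self, candidates: Sequence[Candidate]) -> Candidate:
        return candidates[0]


def select(planner: Planner, candidates: Sequence[Candidate]) -> Candidate:
    choice = planner.choose(candidates)
    # The harness, not the planner, enforces that the choice came from the set.
    if choice not in candidates:
        raise ValueError("planner returned a candidate outside the CandidateSet")
    return choice


candidates = [
    Candidate("raise_concurrency", "decode is under-batched", 0.2),
    Candidate("enable_prefix_cache", "prefill repeats shared prefixes", 0.5),
]
```

Swapping `GreedyHeuristic()` for `FixedOrder()` changes which candidate is picked but not the set of things that can be picked, which is the property the paragraph above describes.
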
## Core Flowchart
## Core State Machine

```mermaid
flowchart TD
A[StudySpec<br/>engine schema, tunable knobs, hardware, SLO] --> B[Run trial / probe]
W[Workload window<br/>prompt length, output length, arrivals, prefix/cache hints] --> C
B --> C[Observation schema<br/>effective config, probe result, SLO violations, launch status]
C --> D[Evidence compiler<br/>symptom evidence over serving stages]
D --> E[Mechanism hypotheses<br/>prefill, decode, admission, memory, launch]
E --> F[Mechanism action families<br/>topology, scheduler, concurrency, cache, frontier transfer]
F --> G[CandidateSet<br/>legal patches + hypotheses + expected effects]
G --> H[Planner backend<br/>LLM / BO / heuristic ranks candidates]
H --> I[Validator + materializer<br/>constraints, no-repeat full config, failure memory, stop authority]
I --> B
B --> J[Measurement verdict<br/>SLO pass, req/s/GPU, latency quantiles]
J --> C
G --> K[Stop decision<br/>only when coverage and measurement guards allow]
flowchart LR
S[State<br/>workload, constraints, history] --> E[Evidence<br/>SLO symptoms to mechanism signals]
E --> C[CandidateSet<br/>typed interventions]
C --> V[Validator<br/>legal, novel, covered?]
V -->|run trial| M[Measurement<br/>verdict]
M --> S
V -->|no justified candidate| X[Stop / report]
```

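The State, Evidence, CandidateSet, Validator, Measurement cycle in the `flowchart LR` diagram can be sketched as a small loop. This is a toy under stated assumptions: the function names, the `"decode_pressure"` signal, the `concurrency` knob, and the linear latency model are all invented for illustration and are not the harness implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class State:
    """Hypothetical harness state: the SLO target plus measured trial history."""
    slo_p99_ms: float
    history: list = field(default_factory=list)  # (config, measured p99 in ms)


def evidence(state: State) -> Optional[str]:
    """Map the latest SLO symptom to a mechanism signal (toy rule)."""
    if not state.history:
        return "unmeasured"
    _, p99 = state.history[-1]
    return "decode_pressure" if p99 > state.slo_p99_ms else None


def candidate_set(signal: Optional[str]) -> list:
    """Emit a finite list of typed interventions for the signal."""
    if signal == "unmeasured":
        return [{"concurrency": 64}]
    if signal == "decode_pressure":
        return [{"concurrency": 32}, {"concurrency": 16}]
    return []


def validate(state: State, candidates: list) -> Optional[dict]:
    """Return the first candidate whose full config has not been tried yet."""
    tried = [cfg for cfg, _ in state.history]
    for cand in candidates:
        if cand not in tried:
            return cand
    return None


def run(state: State, measure: Callable[[dict], float], max_trials: int = 10) -> State:
    for _ in range(max_trials):
        chosen = validate(state, candidate_set(evidence(state)))
        if chosen is None:  # no justified candidate: stop / report
            break
        # Only a real measurement is allowed to update the state.
        state.history.append((chosen, measure(chosen)))
    return state


# Toy measurement stand-in: p99 latency grows linearly with concurrency.
final = run(State(slo_p99_ms=200.0), measure=lambda cfg: cfg["concurrency"] * 5.0)
```

With this toy latency model the loop measures concurrency 64, sees an SLO violation, tries 32, meets the SLO, and stops because no further candidate is justified.
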
Key point: the LLM should not bypass `CandidateSet` and `Validator`. At most, the LLM is a candidate ranker or copilot,
not the authority on legality, coverage, or stop.
The harness core loop has only five steps:

## Module Semantics
1. **State**: maintains the workload, SLO, engine/hardware constraints, and historical trial measurements.
2. **Evidence**: turns probe results from raw logs into serving-stage symptom signals.
3. **CandidateSet**: generates a finite set of typed interventions in the mechanism space.
4. **Validator**: checks legality, full-config novelty, failure memory, and coverage.
5. **Measurement**: executes the validated intervention and updates the state with the real SLO verdict; if there is no
   justified candidate, stop or report the measurement/coverage gap.

This state machine expresses the minimal design of the harness and does not depend on a specific planner. An LLM, BO, a bandit, or a
deterministic heuristic can only rank or select over the `CandidateSet`; none of them can bypass the `Validator`
to construct a config directly, and none can unilaterally decide to stop.

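The four checks attributed to the Validator (legality, full-config novelty, failure memory, coverage) can be sketched as follows. The class layout, knob names, bounds, and family names are assumptions made for illustration only; the source does not specify how the prototype stores any of this.

```python
from dataclasses import dataclass, field


def freeze(config: dict) -> tuple:
    """Canonical hashable form of a full config, so key order cannot hide repeats."""
    return tuple(sorted(config.items()))


@dataclass
class Validator:
    """Hypothetical validator; knob bounds and family names are made up."""
    bounds: dict                 # knob -> (low, high), inclusive
    required_families: set       # families that must be attempted before stop
    tried: set = field(default_factory=set)
    failed: set = field(default_factory=set)
    covered: set = field(default_factory=set)

    def check(self, base: dict, patch: dict) -> tuple:
        full = {**base, **patch}  # novelty is judged on the full config
        for knob, value in full.items():
            if knob not in self.bounds:
                return False, f"illegal: unknown knob {knob}"
            low, high = self.bounds[knob]
            if not low <= value <= high:
                return False, f"illegal: {knob}={value}"
        key = freeze(full)
        if key in self.failed:
            return False, "failure memory: config failed before"
        if key in self.tried:
            return False, "repeat: full config already measured"
        return True, "ok"

    def record(self, base: dict, patch: dict, family: str, launched: bool) -> None:
        key = freeze({**base, **patch})
        (self.tried if launched else self.failed).add(key)
        self.covered.add(family)

    def may_stop(self) -> bool:
        """Stop is allowed only once every required family has been attempted."""
        return self.required_families <= self.covered


v = Validator(bounds={"tp": (1, 8), "max_batch": (1, 256)},
              required_families={"topology", "scheduler"})
base = {"tp": 2, "max_batch": 64}
v.record(base, {}, "topology", launched=True)  # baseline trial was measured
```

Keeping `may_stop` on the validator rather than the planner mirrors the rule that no planner can unilaterally decide to stop.
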
## Core Design Invariants

All lower-level modules that follow serve three invariants:

| Invariant | Meaning | Why it matters |
| --- | --- | --- |
| Measurement-grounded | Every state transition is updated by a real probe/SLO verdict | Prevents the planner from treating natural-language guesses as facts |
| Mechanism-typed | A candidate is not a bare knob vector but an intervention such as topology/scheduler/admission/cache | Reduces the search dimension and gives every trial an interpretable hypothesis |
| Validator-controlled | Candidates and stop must pass the legality, no-repeat, coverage, and failure guards | Prevents repeated experiments, illegal configs, and premature stop |

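To make the "Mechanism-typed" row concrete, the sketch below contrasts a bare knob grid with a short list of typed interventions selected by a mechanism signal. The knob names, value grids, signal names, and hypotheses are all invented for illustration; real counts depend on the engine schema.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical knob grid; names and values are illustrative only.
KNOBS = {
    "tensor_parallel": [1, 2, 4, 8],
    "max_batch": [16, 32, 64, 128],
    "chunked_prefill": [False, True],
    "prefix_cache": [False, True],
}


@dataclass(frozen=True)
class Intervention:
    """A typed candidate: a small patch plus the mechanism hypothesis behind it."""
    family: str
    patch: tuple      # ((knob, value), ...) kept as a tuple so it is hashable
    hypothesis: str

    def apply(self, config: dict) -> dict:
        return {**config, **dict(self.patch)}


def interventions_for(signal: str) -> list:
    """Map one mechanism signal to its short list of typed interventions."""
    table = {
        "prefill_bound": [
            Intervention("scheduler", (("chunked_prefill", True),),
                         "long prompts stall decode"),
            Intervention("cache", (("prefix_cache", True),),
                         "prompts share prefixes"),
        ],
        "memory_pressure": [
            Intervention("admission", (("max_batch", 32),),
                         "batch exceeds the memory budget"),
        ],
    }
    return table.get(signal, [])


bare_vectors = list(product(*KNOBS.values()))   # every knob combination
typed = interventions_for("prefill_bound")      # only what the evidence motivates
```

In this toy setup the bare grid has 64 combinations, while the `"prefill_bound"` signal yields two candidates, each carrying the hypothesis that a trial would confirm or refute.
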
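## Expanding from High Level to Low Level

The sections below expand by implementation layer:

1. Observation schema defines what the harness can see;
2. Evidence compiler explains how symptoms become mechanism evidence;
3. Mechanism space explains where the candidate space comes from;
4. CandidateSet explains how interventions are constructed;
5. Planner interface explains the boundary for LLM/BO/heuristic backends;
6. Validator explains what may run and what may stop.

Each layer distinguishes two things: what the current prototype actually does, and the assumptions and limitations of those choices.
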
## Detailed Module Semantics

### 1. Observation Schema
