Update MEETING.md + PAPER_OUTLINE.md with connector_tax substrate validation
2026-05-27 trace-replay A/B/C (commit ef9e010) shows the kv_both substrate
is net positive on the current codebase, not just neutral:
- TTFT p90: 11.97s plain → 9.74s kv_both (−18.6%) → 7.58s with DR-fix (−36.6%)
This reverses the elastic_migration_v2 paper's +45% kv_both penalty claim
and removes the primary cause of the 4 prior migration reverts.
Reframes EAR Pillar 2 from "DEFERRED" to "PARTIAL": the substrate is verified,
and end-to-end strategy-layer validation (trigger thresholds + target selection
in the dispatch-coupling feedback loop) remains the only open risk.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
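The TTFT p90 reductions above are measured against different baselines, so they compose multiplicatively rather than adding up: the −22% DR-fix gain cited in the §4.3 status is relative to kv_both, while −36.6% is relative to plain. A minimal Python check using the rounded seconds from the commit message (variable names are illustrative) makes the relationship explicit:

```python
# Relative change in TTFT p90, using the rounded seconds from the commit message.
PLAIN_S = 11.97    # plain connector
KV_BOTH_S = 9.74   # kv_both substrate, without DR-fix
DR_FIX_S = 7.58    # kv_both substrate with DR-fix


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means faster)."""
    return (after / before - 1.0) * 100.0


substrate = pct_change(PLAIN_S, KV_BOTH_S)  # about -18.6, vs plain
dr_fix = pct_change(KV_BOTH_S, DR_FIX_S)    # about -22.2, vs kv_both
overall = pct_change(PLAIN_S, DR_FIX_S)     # about -36.7 from rounded inputs

# Reductions compose multiplicatively: (1 - 0.186) * (1 - 0.222) is about 0.633,
# i.e. roughly -36.7% overall, not the additive -40.8%.
composed = ((1 + substrate / 100) * (1 + dr_fix / 100) - 1) * 100

for name, value in [("kv_both vs plain", substrate),
                    ("DR-fix vs kv_both", dr_fix),
                    ("DR-fix vs plain", overall),
                    ("composed", composed)]:
    print(f"{name:>18}: {value:+.1f}%")
```

With the rounded seconds the end-to-end change comes out at about −36.7%; the reported −36.6% was presumably computed from unrounded measurements, so the two agree to within rounding.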
@@ -13,14 +13,15 @@
 | §3.2 Static PD-disagg hits the KV wall | ✅ Complete (`f4b`) | — |
 | §3.3 Sticky creates hot pins | ✅ Complete (`f4c`, `f4d`) | — |
 | §4.1-2 Affinity routing | ✅ Implemented (current `unified` algorithm) | — |
-| §4.3 Migration mechanism | 🚧 **DEFERRED** | Re-test after the connector_tax fix |
+| `kv_both` substrate cost | ✅ **VERIFIED net-positive** (2026-05-27, commit `ef9e010`) | TTFT p90 −18.6% w/o DR-fix, −36.6% w/ DR-fix |
+| §4.3 Migration mechanism (e2e) | 🚧 **PARTIAL** | Substrate verified; e2e trigger + target selection experiments not yet run |
 | §5.2 End-to-end | ⚠️ Data for 5/6 baselines (`f6`) | Missing static PD-disagg; EAR column awaits migration |
 | §5.3 Ablation | 🚧 **PARTIAL DEFER** | Only affinity-only can be run now; full ablation awaits migration |
 | §5.4 Dispatch coupling validation | 🚧 **NEW DATA NEEDED** | Re-run wall-clock for the 5 baselines (after the Phase 1 patch) |
 | §5.5 Sensitivity | 🚧 **PARTIAL DEFER** | λ/skew/KV pool can be run now; `T_hot`/`T_cool` await migration |
 | §5.6 Migration microbench | 🚧 **FULL DEFER** | Depends entirely on migration validation |
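
-**Background**: The team's 4 previous migration attempts were all reverted because of transfer overhead (see `analysis/unified_routing_fix_review.md`). The recent DR-fix from the `connector_tax` work cut the 1.4 ms/step overhead of `build_connector_meta` to near zero, but no full migration experiment has been run yet. **The migration part of EAR is currently design intent; empirical results will be added after re-testing.**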
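+**Background**: The team's 4 previous migration attempts were all reverted because of transfer overhead (see `analysis/unified_routing_fix_review.md`). The 2026-05-27 trace-replay A/B/C (`microbench/connector_tax/cache_sweep/REPORT_TRACE_REPLAY.md`) shows that the `kv_both` substrate picture has reversed: not only is the +45% penalty obsolete, the substrate itself is net positive (TTFT p90 −18.6% vs plain, −36.6% after the DR-fix). **The largest root cause of the 4 earlier migration reverts is gone, but the e2e migration strategy layer (the real benefit of trigger + target selection inside the feedback loop) has still not been validated directly. EAR's migration experiments no longer carry substrate risk; only strategy-layer risk remains.**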

 ---

@@ -165,11 +166,13 @@ EAR is a router sitting on top of N homogeneous instances. Each instance is symmetric
 - **Warm path**: every subsequent turn of an established session is always routed to its current host
 - **Effect**: intra-session KV reuse is preserved by construction; APC approaches the 79.6% upper bound from §2.2

-### §4.3 Pillar 2: Hot-Triggered Session Migration 🚧 DEFERRED VALIDATION
+### §4.3 Pillar 2: Hot-Triggered Session Migration 🚧 PARTIAL VALIDATION

 The key mechanism that keeps Pillar 1 from degenerating into pure sticky routing.
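
-> **Status**: The design description is complete, but empirical data must be re-measured after the `connector_tax` DR-fix. All 4 previous migration attempts (`6b255fa`, `e991960/5772149`, `cc6e562`, `4c583f2`) were reverted because of transfer overhead: until the DR-fix, the measured benefit of migration was always swallowed by that overhead. The new validation round has not been run.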
+> **Status (updated 2026-05-27)**:
+> - **Substrate validation PASS** (commit `ef9e010`): the `kv_both` connector is net positive on trace replay (TTFT p90 −18.6%), with a further −22% relative to kv_both after the DR-fix. The transfer overhead previously regarded as the migration blocker no longer exists.
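+> - **Strategy-layer e2e validation PENDING**: the real benefit of the trigger thresholds + target selection in the agentic feedback loop has not yet been measured directly. The main cause (substrate overhead) behind the reverts of the 4 previous migration attempts (`6b255fa`, `e991960/5772149`, `cc6e562`, `4c583f2`) is gone, but wrong trigger decisions and cooldown thrashing are independent risks that need a new round of e2e experiments to confirm.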

 #### §4.3.1 Trigger signal
@@ -340,7 +343,7 @@ KV transfer happens on the critical path of the request that triggers the migration, but

 ### 🚧 Deferred (pending migration validation)

-- [ ] §4.3 migration mechanism re-test (run after the `connector_tax` DR-fix)
+- [ ] §4.3 migration mechanism e2e validation: substrate verified (commit `ef9e010`); strategy-layer experiments for trigger + target selection still missing
 - [ ] §5.3 full ablation (two configurations: migration-only + both)
 - [ ] §5.5 two-axis `T_hot` / `T_cool` sensitivity
 - [ ] §5.6 migration microbench (all of it)
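The strategy-layer risks named in the updated §4.3 status (trigger thresholds, target selection, cooldown thrashing) can be made concrete with a small sketch. Everything here is an assumption for illustration, not EAR's §4.3.1 trigger signal (which this diff does not show): each instance is assumed to expose a scalar load, `T_hot`/`T_cool` are read as a hysteresis pair (migrate off hosts above `T_hot`, only onto targets below `T_cool`), and `HotTrigger`, `pick_target`, and `cooldown_s` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HotTrigger:
    """Hypothetical hot-triggered migration policy (illustrative only).

    t_hot:      host load above which its sessions become migration candidates.
    t_cool:     a target must be below this load to receive a session.
    cooldown_s: minimum seconds between two migrations of the same session.
    """

    t_hot: float
    t_cool: float
    cooldown_s: float
    last_moved: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Hysteresis needs a gap between the two thresholds.
        if not self.t_cool < self.t_hot:
            raise ValueError("t_cool must be strictly below t_hot")

    def pick_target(
        self, session: str, host: str, load: dict[str, float], now: float
    ) -> str | None:
        """Return the instance to migrate `session` to, or None to stay put."""
        if load[host] <= self.t_hot:
            return None  # host is not hot: keep the warm-path affinity
        last = self.last_moved.get(session)
        if last is not None and now - last < self.cooldown_s:
            return None  # moved recently: suppress to avoid ping-pong thrashing
        candidates = [
            inst for inst, inst_load in load.items()
            if inst != host and inst_load < self.t_cool
        ]
        if not candidates:
            return None  # no cool target: migrating would only move the hotspot
        target = min(candidates, key=lambda inst: load[inst])  # least-loaded cool instance
        self.last_moved[session] = now
        return target


if __name__ == "__main__":
    trigger = HotTrigger(t_hot=0.85, t_cool=0.60, cooldown_s=30.0)
    load = {"i0": 0.95, "i1": 0.40, "i2": 0.55}
    print(trigger.pick_target("s1", "i0", load, now=0.0))   # i1
    print(trigger.pick_target("s1", "i0", load, now=10.0))  # None (cooldown)
```

Keeping `t_cool` strictly below `t_hot` leaves a dead band, so a freshly migrated session does not immediately look hot on its new host, and the per-session cooldown caps how often one session can bounce. In this sketch those two thresholds play the role of the `T_hot`/`T_cool` axes listed for the deferred §5.5 sensitivity study.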