§2.3 reframe: dispatch coupling is regime-dependent, not binary chatbot/agentic

The previous §2.3 narrative said "chatbot has T_human ≈ 30 s think-time, agentic has T_external ≈ 0, so agentic is always closed-loop and chatbot never is". The new T_external measurements on the production chatbot trace (qwen3-max, n=42 k inter-turn gaps from formatted parent_chat_id sessions) show the binary framing is wrong:

  agentic  p50 1.6 s, 39% gaps < 1 s, p99 738 s
  chatbot  p50 7.2 s,  4% gaps < 1 s, p99 43 s

Both have nonzero T_external. The right distinction is the *shape*: chatbot is unimodal around 5–15 s (human cadence); agentic is bimodal with a sub-second tool-call mass (39 % vs chatbot's 4 %) plus a long-pause tail (13 % > 30 s). The agentic sub-second mass is what activates dispatch coupling: for any W_turn > 1 s scheduler those turns satisfy W_turn ≫ T_external by construction.

The empirical regime split:

  unified  TTFT p90 = 7.3 s  → agentic 73% closed-loop, chatbot 32%
  lmetric  TTFT p90 = 15.7 s → agentic 80%, chatbot 88%

lmetric is bad enough that it drags the chatbot regime into closed-loop too. This is a direct empirical explanation for lmetric underperforming on both workloads.

Updates:
- PAPER_OUTLINE.md §2.3: lead with the regime threshold W_turn ≷ T_external, replace the "T_human dominates" Little's Law with the general form L = Λ · N · (W_turn(L) + T_external), embed f3a CDF, add the empirical regime table; correct the small-perturbation formula to include the +T_external dampening term.
- MEETING.md §1: same reframe, condensed (CDF figure, two-row regime table, one-line conclusion).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:51:38 +08:00
parent 876d09db83
commit b11dc30945
2 changed files with 60 additions and 26 deletions
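
The general form `L = Λ · N · (W_turn(L) + T_external)` from the commit message can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical linear service-time model `W_turn(L) = w0 + k·L` (the trace's real `W_turn(L)` is not given here), solves the fixed point by iteration, and compares a finite-difference sensitivity against the closed form `dL*/dε = Λ·N·W_turn(L*) / (1 − Λ·N·W'_turn(L*))`, which follows from implicit differentiation of `L = Λ·N·((1+ε)·W_turn(L) + T_external)` at ε = 0. All function names and parameter values are assumptions chosen to match the 20-turn worked example (Λ = 0.2 sessions/s, `W_turn` = 2 s, `T_external` = 0.5 s, L* = 10).

```python
from typing import Callable


def fixed_point_L(lam: float, n_turns: int, w_turn: Callable[[float], float],
                  t_external: float, eps: float = 0.0,
                  tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Solve L = lam * n_turns * ((1 + eps) * w_turn(L) + t_external) by iteration.

    Converges when lam * n_turns * (1 + eps) * W'(L) < 1 (below saturation).
    """
    L = 0.0
    for _ in range(max_iter):
        L_next = lam * n_turns * ((1.0 + eps) * w_turn(L) + t_external)
        if abs(L_next - L) < tol:
            return L_next
        L = L_next
    raise RuntimeError("no convergence: the system may be at or past saturation")


def sensitivity_closed_form(lam: float, n_turns: int,
                            w_turn: Callable[[float], float],
                            w_prime: Callable[[float], float],
                            t_external: float) -> float:
    """dL*/d(eps) at eps = 0, from implicit differentiation of the fixed point."""
    L_star = fixed_point_L(lam, n_turns, w_turn, t_external)
    return lam * n_turns * w_turn(L_star) / (1.0 - lam * n_turns * w_prime(L_star))


def sensitivity_numeric(lam: float, n_turns: int,
                        w_turn: Callable[[float], float],
                        t_external: float, h: float = 1e-4) -> float:
    """Central finite difference of L*(eps) around eps = 0."""
    hi = fixed_point_L(lam, n_turns, w_turn, t_external, eps=+h)
    lo = fixed_point_L(lam, n_turns, w_turn, t_external, eps=-h)
    return (hi - lo) / (2.0 * h)


# Hypothetical linear model: W_turn(L) = 1.0 + 0.1 * L seconds, so W_turn(10) = 2 s.
def w_linear(L: float) -> float:
    return 1.0 + 0.1 * L


def w_linear_prime(L: float) -> float:
    return 0.1


if __name__ == "__main__":
    lam, n_turns = 0.2, 20  # 0.2 sessions/s, 20 turns per session (assumed)
    for t_ext in (0.5, 30.0):  # sub-second tool call vs. human-scale gap
        L_star = fixed_point_L(lam, n_turns, w_linear, t_ext)
        s = sensitivity_closed_form(lam, n_turns, w_linear, w_linear_prime, t_ext)
        print(f"T_external={t_ext:5.1f}s  L*={L_star:8.2f}  relative sensitivity={s / L_star:.3f}")
```

Dividing the sensitivity by L* isolates the damping factor `W_turn / (W_turn + T_external)`: under this toy model the relative sensitivity is larger for the sub-second gap than for the 30 s gap, in line with the open-loop versus closed-loop distinction.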
--- a/MEETING.md
+++ b/MEETING.md
@@ -6,18 +6,26 @@

 ## 1. Key insight: Dispatch Coupling

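-Chatbot: human think-time sits between turns, so system speed ⊥ next-turn arrival rate.
-Agentic: only the tool-call return (≈0) sits between turns, so **a slower system → longer session residence → more concurrency → tighter KV pool → slower still**.
+Every inter-turn interval contains an external gap `T_external` (for chatbot, a human reading, thinking, and typing; for agentic, tool execution). **Little's Law `L = Λ · N · (W_turn + T_external)`** holds for both workloads; the difference is where the `T_external` distribution sits relative to `W_turn`:
+- `T_external ≫ W_turn` → open-loop regime: a scheduler regression barely moves L
+- `T_external ≲ W_turn` → closed-loop regime: `W_turn(L)` is coupled to L through KV contention, and the feedback loop amplifies a scheduler regression of ε severalfold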

-Little's Law as an implicit equation:
+**Measured `T_external` distribution on the production trace** (next.start − prev.end, with formatted session chains as ground truth):

-```
-L = Λ · N · W_turn(L)        # agentic, T_human≈0
-```
+![](figs/f3a_inter_turn_gap.png)

-Small-perturbation analysis: amplification = `1 / (1 − Λ·N·W'(L*))`, which diverges as the system approaches KV saturation.
+| | Agentic | Chatbot |
+|---|---:|---:|
+| p50 | **1.6s** | **7.2s** |
+| gap < 1s | **39%** | 4% |
+| gap < 5s | 67% | 29% |
+| p99 | 738s | 43s |

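-**Measured**: lmetric takes 49 min of wall-clock to run a 600s trace = **8x amplification**. On the same hardware, unified drains sessions ~3x faster than lmetric. **Small differences in per-turn W are amplified into an order-of-magnitude wall-clock gap**, which means locality is not a nice-to-have; it is the dominant lever.
+The two distributions have completely different shapes: chatbot is unimodal, concentrated at 5–15s (human cadence); agentic is bimodal, with **39% of gaps sub-second (autonomous tool-call mode)** plus a long tail of 13% > 30s. **Chatbot has no counterpart to the agentic sub-second mass**, and that mass is what activates dispatch coupling.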
+
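+**Measured regime**: under unified (TTFT p90 = 7.3s), **73% of agentic turns push the system into the closed loop** (W_turn > T_external), versus only 32% for chatbot. Under lmetric (15.7s), agentic is at 80% and **chatbot also reaches 88%**: lmetric drags chatbot itself into the closed loop, which is the root cause of its underperformance on both workloads.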
+
+**Result**: lmetric takes 49 min of wall-clock to run a 600s trace = **8x amplification**. **Small differences in per-turn W are amplified into an order-of-magnitude wall-clock gap**: locality is not a nice-to-have; it is the dominant lever.

 ---

--- a/PAPER_OUTLINE.md
+++ b/PAPER_OUTLINE.md
@@ -78,36 +78,62 @@ Decomposition of KV reuse on the trace:

 This is the most intuition-dependent argument in the paper, so it gets its own section.

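-**Intuition**: in a chatbot, a human has to read, think, and type after each turn, so an **external clock** controls when the next turn arrives; in an agentic workload, the LLM issues the next request as soon as it receives the tool-call result, so **the system's own speed determines when the next turn arrives**. A slow policy therefore not only slows each request but also keeps sessions in the system longer → more concurrent sessions → fiercer KV contention → slower turns: a feedback loop.
+**Intuition**. Between consecutive turns there is an **external gap** `T_external` (for chatbot, a human reading, thinking, and typing; for agentic, tool execution). The next turn arrives after `T_external`. Little's Law: `L = Λ · N · (W_turn(L) + T_external)`. Whether the system can avoid the feedback loop depends on whether `W_turn` is **much smaller than** `T_external`:
+- If `W_turn ≪ T_external`: session residence time is dominated by `T_external`, small changes in scheduler speed barely move `L`, and the system is in the **open-loop regime**;
+- If `W_turn ≳ T_external`: the `W_turn(L)` term is coupled to L through KV contention, Little's Law becomes an implicit equation in L (the **closed-loop regime**), and a scheduler regression of ε is amplified by the feedback loop into a severalfold larger L*.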

-**Concrete example**: a coding agent running a 20-turn task.
+**Agentic and chatbot differ not by a binary split but by the distribution of `T_external`**. Below is the CDF of `T_external = next.start − prev.end` measured on the production trace (agentic = Qwen3-Coder, n=783k inter-turn gaps; chatbot = qwen3-max chat, n=42k gaps):

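- Fast policy: 2s per turn, 40s per session, 10 concurrent sessions on average
- Slow policy (linear estimate): 3s per turn, 60s per session, so 15 should be concurrent
- Slow policy (actual): 15 concurrent sessions push each turn to 4s, sessions to 80s, concurrency to 20, turns to 5s … until the system hits a wall or settles at a far worse new equilibrium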
+![F3a Inter-turn external gap CDF — agentic vs chatbot](figs/f3a_inter_turn_gap.png)

-Chatbot for contrast: the human reads for 30s after each turn. A turn going from 2s to 3s takes the session from 32s to 33s, a 3% difference with almost no feedback.
+| Metric | Agentic | Chatbot |
+|---|---:|---:|
+| p25 | 0.69s | 4.85s |
+| **p50** | **1.6s** | **7.2s** |
+| p90 | 44s | 15s |
+| p99 | 738s | 43s |
+| gap < 1s | **39%** | 4% |
+| gap < 5s | 67% | 29% |

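-**Formalization**. Let `Λ` = session arrival rate (given by the trace), `N` = turns per session, and `W_turn(L)` = per-turn service time, an increasing function of the current number of concurrent sessions `L` (more concurrency means fiercer KV contention and a larger `W_turn`).
+The two distributions have completely different shapes:
+- **Chatbot is unimodal**, tightly concentrated at 5–15s (human interaction cadence);
+- **Agentic is bimodal**: 39% of gaps < 1s (autonomous tool-call mode; only 4% for chatbot) + 13% of gaps > 30s (session paused/abandoned; only 2% for chatbot).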

-Little's Law for chatbot:
-```
-L = Λ · N · (W_turn(L) + T_human)
-```
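-This is dominated by the large constant `T_human`, so perturbations in `W_turn(L)` barely move `L`.
+**The danger in agentic workloads comes from the sub-second tool-call mode**: for these 39% of turns, `W_turn ≫ T_external` holds almost by default and dispatch coupling is bound to activate; chatbot has no such mass, so W_turn must be pushed very high before it enters the closed loop.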
+
+**Measured regime comparison**:
+
+| Scheduler | TTFT p90 | Agentic frac(W_turn > T_ext) | Chatbot frac(W_turn > T_ext) |
+|---|---:|---:|---:|
+| unified | 7.3s | **73%** | 32% |
+| lmetric | 15.7s | **80%** | 88% |
+
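+unified pushes 73% of agentic turns into the closed loop but only 32% of chatbot turns. lmetric reaches 80% on agentic and **88% on chatbot as well**: its W_turn is large enough to push chatbot itself into the closed loop, which is one direct root cause of lmetric underperforming on both workloads.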
+
+**Concrete example**: a coding agent running a 20-turn task, assuming `T_external` is in the sub-second mode (0.5s tool call).
+
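+- Fast policy: `W_turn` = 2s, 2.5s per turn in total, 50s per session, 10 concurrent sessions on average
+- Slow policy (linear estimate): `W_turn` = 3s, 3.5s per turn, 70s per session, so 14 should be concurrent
+- Slow policy (actual): 14 concurrent sessions tighten the KV pool → `W_turn` rises to 4s → sessions take 90s → 18 concurrent → `W_turn` 5s … the feedback loop amplifies until the system hits a wall or settles at a far worse fixed point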
+
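+**Formalization**. Let `Λ` = session arrival rate, `N` = turns per session, and `W_turn(L)` = per-turn service time as an increasing function of concurrency L (more concurrency means fiercer KV contention and a larger `W_turn`). Little's Law: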

-Little's Law for agentic (`T_human ≈ 0`):
 ```
-L = Λ · N · W_turn(L)
+L = Λ · N · (W_turn(L) + T_external)
 ```
-This is an implicit equation in `L`. Suppose a policy change scales `W_turn` uniformly by `(1+ε)`; small-perturbation analysis gives:
+
+Suppose a policy change scales `W_turn` uniformly by `(1+ε)`; small-perturbation analysis gives the sensitivity of the fixed point L\*:
+
 ```
-dL/dε|_{ε=0} = L* / (1 − Λ · N · W'_turn(L*))
+dL*/dε = L* · [W_turn(L*) / (W_turn(L*) + T_external)] / [1 − Λ · N · W'_turn(L*)]
 ```
-When the **denominator approaches 0** (the system nears KV saturation), the amplification factor diverges. This is why lmetric shows 8x wall-clock amplification on the 600s trace.
+
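+Two points to note:
+1. The numerator ∝ `W_turn / (W_turn + T_external)`: when `T_external ≫ W_turn` the sensitivity → 0 (open loop); as `T_external → 0` the sensitivity approaches its upper bound (closed loop). **So the agentic sub-second tool-call mass pushes the sensitivity toward its upper bound, while the chatbot 5–15s mass pushes it down**.
+2. The denominator `... − Λ · N · W'_turn(L*)`: it approaches 0 near KV saturation, so **any scheduling regression near saturation is amplified without bound**; this is the root cause of lmetric's 8x wall-clock on the 600s trace.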

 **Figure 3: Dispatch coupling schematic** — `figs/f3_coupling_schematic.png` **🚧 TBD (CUSTOM DRAW)**
-> A new schematic is needed: chatbot timeline on top (`system → T_human → system → T_human → ...`), agentic timeline below (`system → ε → system → ε → ...`), with a feedback-loop arrow `W_turn → Λ → L → W_turn` overlaid on the right. Suitable for TikZ / draw.io / matplotlib annotate.
+> A new schematic is needed: the left half compares timelines (chatbot: `system → T_external (5–15s) → system`; agentic: `system → T_external (sub-second to long-tail) → system`), and the right half shows the feedback loop `W_turn → L → W_turn`, annotated with the regime criterion `W_turn ≷ T_external`.

 ### §2.4 Takeaway