docs(experiments): E2 mid-run finding — D2 stays cold in KVC v2 too

The same pathological imbalance E1 showed reproduces in E2: D2 has zero bindings with 33% of POSTs dispatched.

Root cause is structural, not a KVC v2 bug: all 50 Inferact sessions begin with identical "permissions instructions" boilerplate, so the converter assigns them identical first-block hash_ids. The kv-aware policy's overlap term (lex-score position 0) makes any already-resident D dominate a fresh D unconditionally, and v2's migration only activates on admission rejects, which never fire because the D0/D1 KV pools have headroom.

The H1 conclusion is qualified: KVC v2 helps per-request work (direct-to-D fast path) but does not rebalance D worker load on workloads with shared cross-session prefixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 02:08:00 +08:00
parent 631b2c8847
commit e3e5c45ed4
1 changed file with 27 additions and 8 deletions
--- a/docs/E1_E2_RESULTS_ZH.md
+++ b/docs/E1_E2_RESULTS_ZH.md
@@ -60,17 +60,36 @@ This is *the baseline H1 needs* — it shows the KVC layer (E2) has something co

 ---

-## 3. E2 results — pending
+## 3. E2 — in progress + an unexpected finding about D2

-Background task `b0im1d48q`, launched 2026-05-12 01:48 UTC. Same subset, full KVC v2 stack (reset-on-success migration, direct-append threshold 8192), RDMA on, all other knobs identical to E1.
+Background task `b0im1d48q`, launched 2026-05-12 01:48 UTC. Mid-run snapshot at 16 minutes (33 % POSTs dispatched):

-Expected differences:
- Direct-to-D fast path engaged for turn≥1 requests → fewer P/D round-trips
- Migration triggered when sessions hit D0/D1 saturation → D2 should see traffic
- Lower TTFT p50 / latency mean
- TTFT p99 still constrained by reseed slow-path (P re-prefill + mooncake transfer)
+| | D0 | D1 | D2 |
+|---|---:|---:|---:|
+| bindings so far | 248 | 267 | **0** |
+| GPU util (snapshot) | 0 % | 0 % | 0 % |
+| KV pool util (across run) | high | high | empty |

-Will be filled in upon completion.
+**D2 receives zero traffic in E2 too, just like E1**. This is *not* the result we expected — H1 predicted that KVC's session-migration mechanism (reset-on-success blacklist with `migration_reject_threshold=3`) would route around the imbalance E1 showed. It doesn't.
+
+### Root cause
+
+`KvAwarePolicy.select` (policies.py:171-202) scores candidates by 4-tuple lex order `(overlap + α·sticky, sticky, -inflight, -assigned)`. The `overlap` term dominates: any D that has resident KV blocks matching the incoming request's `hash_ids` wins position 0.
+
+In the Inferact `codex_swebenchpro` workload, **all 50 sessions begin with identical "permissions instructions" boilerplate** (the converter sees this as identical first-block content across trial 0..49). Our hash_id construction (sha256 over the token sequence per 24-token block, see `scripts/convert_inferact_to_trace.py`) therefore yields *identical block hashes across sessions* for the first ~50 blocks.
+
+Concretely, when session N's turn 0 lands:
+- D0 / D1 already host previous sessions → their `state.resident` sets include those shared boilerplate hashes → `overlap > 0`
+- D2 has never admitted a session → `state.resident[D2]` is empty → `overlap = 0`
+- D0/D1 tie at position 0; D2 always loses
+
+The migration mechanism never triggers because D0/D1 have ample KV (peak token_usage ~0.86 in v2 historical reports) and never *reject* admission. No rejects → no `(session, D)` blacklist accumulation → no migration → D2 stays cold forever.
+
+### Implication for H1
+
+H1 is *not falsified*, but it is *qualified*: KVC v2 still improves over naive pd-disaggregation on per-request work (direct-to-D fast path skips P→D mooncake transfer for turn≥1 on the same D), but it does **not** automatically balance load across D workers when the workload has high cross-session prefix overlap. To realise the full theoretical benefit of 1P3D on this workload, the policy needs an explicit cold-D bonus, or a pre-warming step that seeds D2 with shared boilerplate at startup.
+
+Full E2 metrics will be filled in upon completion (ETA ~22 min from snapshot).

 ---
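
The root-cause argument in this change can be reproduced with a small simulation. The following is a minimal sketch, not the code in `policies.py` or `scripts/convert_inferact_to_trace.py`: the names `DState`, `block_hashes`, `score`, `select`, and `simulate` are hypothetical, `alpha = 1.0` is an assumed value, hashing each 24-token block independently is an assumption about the converter, and the boilerplate is shortened to two blocks. D0 and D1 are pre-seeded with the shared boilerplate hashes to mirror the "already host previous sessions" condition described in the doc.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_TOKENS = 24  # block size stated in the results doc


def block_hashes(tokens, block=BLOCK_TOKENS):
    """sha256 of each full block of token ids; a trailing partial block is dropped.

    Assumption: blocks are hashed by content only, so identical blocks in
    different sessions produce identical hash ids.
    """
    hashes = []
    for start in range(0, len(tokens) - block + 1, block):
        chunk = ",".join(map(str, tokens[start:start + block])).encode()
        hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes


@dataclass
class DState:
    """Hypothetical per-worker bookkeeping for a decode (D) worker."""
    resident: set = field(default_factory=set)
    inflight: int = 0
    assigned: int = 0


def score(state, hash_ids, sticky, alpha=1.0):
    """4-tuple compared lexicographically; the overlap term sits at position 0."""
    overlap = len(state.resident.intersection(hash_ids))
    return (overlap + alpha * sticky, sticky, -state.inflight, -state.assigned)


def select(workers, hash_ids, sticky_to=None):
    """Pick the worker with the highest lex score (first one wins exact ties)."""
    return max(
        workers,
        key=lambda name: score(workers[name], hash_ids, int(name == sticky_to)),
    )


def simulate(num_sessions=50):
    boilerplate = list(range(2 * BLOCK_TOKENS))  # shared prefix, two blocks
    shared = set(block_hashes(boilerplate))
    # D0 and D1 already hold the shared prefix; D2 has never admitted anything.
    workers = {"D0": DState(set(shared)), "D1": DState(set(shared)), "D2": DState()}
    for session in range(num_sessions):
        # One block of session-specific tokens after the shared boilerplate.
        tail = [1000 + session * 100 + k for k in range(BLOCK_TOKENS)]
        hash_ids = block_hashes(boilerplate + tail)
        chosen = select(workers, hash_ids)
        workers[chosen].resident.update(hash_ids)
        workers[chosen].assigned += 1
    return {name: state.assigned for name, state in workers.items()}


if __name__ == "__main__":
    # D2 ends with zero bindings: its overlap is always 0 at position 0.
    print(simulate())
```

Because no admission is ever rejected in this sketch, nothing analogous to the `(session, D)` blacklist can accumulate, so the later tie-break positions only ever decide between D0 and D1.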