docs(experiments): E2 mid-run finding — D2 stays cold in KVC v2 too

The pathological imbalance seen in E1 reproduces in E2: D2 has zero
bindings with 33% of POSTs dispatched. The root cause is structural,
not a KVC v2 bug: all 50 Inferact sessions begin with identical
"permissions instructions" boilerplate, so the converter assigns them
identical first-block hash_ids. The kv-aware policy's overlap term
(lex-score position 0) makes any already-resident D beat a fresh D
unconditionally, and v2's migration only activates on admission
rejects, which never fire because the D0/D1 KV pools have headroom.
The H1 conclusion is qualified: KVC v2 reduces per-request work (via
the direct-to-D fast path) but does not rebalance D worker load on
workloads with shared cross-session prefixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tim
2026-05-12 02:08:00 +08:00
parent 631b2c8847
commit e3e5c45ed4

@@ -60,17 +60,36 @@ This is *the baseline H1 needs* — it shows the KVC layer (E2) has something co
---
## 3. E2 results — pending
## 3. E2 — in progress + an unexpected finding about D2
Background task `b0im1d48q`, launched 2026-05-12 01:48 UTC. Same subset, full KVC v2 stack (reset-on-success migration, direct-append threshold 8192), RDMA on, all other knobs identical to E1.
Background task `b0im1d48q`, launched 2026-05-12 01:48 UTC. Mid-run snapshot at 16 minutes (33 % of POSTs dispatched):
Expected differences:
- Direct-to-D fast path engaged for turn≥1 requests → fewer P/D round-trips
- Migration triggered when sessions hit D0/D1 saturation → D2 should see traffic
- Lower TTFT p50 / latency mean
- TTFT p99 still constrained by reseed slow-path (P re-prefill + mooncake transfer)
| Metric | D0 | D1 | D2 |
|---|---:|---:|---:|
| bindings so far | 248 | 267 | **0** |
| GPU util (snapshot) | 0 % | 0 % | 0 % |
| KV pool util (across run) | high | high | empty |
Will be filled in upon completion.
**D2 receives zero traffic in E2, just as in E1.** This is *not* the result we expected: H1 predicted that KVC's session-migration mechanism (reset-on-success blacklist with `migration_reject_threshold=3`) would route around the imbalance E1 showed. It does not.
### Root cause
`KvAwarePolicy.select` (policies.py:171-202) scores candidates with a 4-tuple compared in lexicographic order: `(overlap + α·sticky, sticky, -inflight, -assigned)`. The `overlap` term dominates: any D with resident KV blocks matching the incoming request's `hash_ids` wins at position 0, before the later terms are consulted.
In the Inferact `codex_swebenchpro` workload, **all 50 sessions begin with identical "permissions instructions" boilerplate** (the converter sees identical first-block content across trials 0..49). Our `hash_id` construction (sha256 over the token sequence of each 24-token block, see `scripts/convert_inferact_to_trace.py`) therefore yields *identical block hashes across sessions* for the first ~50 blocks.
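The collision is easy to reproduce in a minimal sketch. It assumes each 24-token block is hashed independently with sha256 over a comma-joined token string; the actual scheme in `scripts/convert_inferact_to_trace.py` may serialise tokens differently or chain prefixes. The helper `block_hash_ids` and the toy token values are hypothetical.

```python
import hashlib

BLOCK_TOKENS = 24  # block size stated in the text above


def block_hash_ids(token_ids, block_tokens=BLOCK_TOKENS):
    """Return one truncated sha256 digest per full block (assumed scheme)."""
    hashes = []
    for start in range(0, len(token_ids) - block_tokens + 1, block_tokens):
        block = token_ids[start:start + block_tokens]
        payload = ",".join(str(t) for t in block).encode("utf-8")
        hashes.append(hashlib.sha256(payload).hexdigest()[:16])
    return hashes


# Toy data: 50 blocks of shared boilerplate, then session-specific tokens.
boilerplate = list(range(50 * BLOCK_TOKENS))
session_a = boilerplate + [9001] * (2 * BLOCK_TOKENS)
session_b = boilerplate + [9002] * (2 * BLOCK_TOKENS)

hashes_a = block_hash_ids(session_a)
hashes_b = block_hash_ids(session_b)
shared = sum(1 for a, b in zip(hashes_a, hashes_b) if a == b)
print(f"{shared} of {len(hashes_a)} block hashes are shared")
```

A prefix-chained hash would give the same outcome for these blocks, because the entire prefix up to the end of the boilerplate is identical across sessions.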
Concretely, when session N's turn 0 lands:
- D0 / D1 already host previous sessions → their `state.resident` sets include those shared boilerplate hashes → `overlap > 0`
- D2 has never been admitted → `state.resident[D2]` is empty → `overlap = 0`
- D0 and D1 contest position 0 between themselves (equal overlap falls through to the later tuple positions); D2 always loses
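The three cases above can be sketched with plain tuple comparison. This is an illustration of the scoring shape described in this section, not the code in `policies.py`: the `WorkerState` class, the `ALPHA` value, and the inflight counts are hypothetical, and the bindings from the table are used as a stand-in for `assigned`.

```python
from dataclasses import dataclass, field

ALPHA = 0.5  # hypothetical weight for the sticky term


@dataclass
class WorkerState:
    name: str
    resident: set = field(default_factory=set)  # hash_ids currently held
    inflight: int = 0
    assigned: int = 0


def score(worker, hash_ids, sticky_worker=None):
    """4-tuple compared lexicographically; higher is better."""
    overlap = len(worker.resident.intersection(hash_ids))
    sticky = 1 if worker.name == sticky_worker else 0
    return (overlap + ALPHA * sticky, sticky, -worker.inflight, -worker.assigned)


def select(workers, hash_ids, sticky_worker=None):
    return max(workers, key=lambda w: score(w, hash_ids, sticky_worker))


shared_prefix = [f"h{i}" for i in range(50)]  # boilerplate block hashes (toy)
d0 = WorkerState("D0", set(shared_prefix), inflight=40, assigned=248)
d1 = WorkerState("D1", set(shared_prefix), inflight=45, assigned=267)
d2 = WorkerState("D2")  # never admitted, so nothing is resident

# Turn 0 of a new session: shared boilerplate plus one session-specific block.
request = shared_prefix + ["session-specific-block"]
chosen = select([d0, d1, d2], request)
print(chosen.name, score(d0, request), score(d2, request))
```

Because D2's first tuple element is 0 while D0 and D1 both score 50, D2's lower inflight and assigned counts are never compared.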
The migration mechanism never triggers because D0/D1 have ample KV headroom (peak token_usage ~0.86 in v2 historical reports) and therefore never *reject* admission. No rejects → no `(session, D)` blacklist accumulation → no migration → D2 stays cold for the rest of the run.
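A minimal sketch of a reset-on-success reject counter shows why this chain never starts. Only the threshold of 3 and the `(session, D)` keying come from this document; the `RejectTracker` class and its methods are hypothetical and the real KVC v2 bookkeeping may differ.

```python
from collections import defaultdict

MIGRATION_REJECT_THRESHOLD = 3  # value of migration_reject_threshold above


class RejectTracker:
    """Counts consecutive admission rejects per (session, D) pair."""

    def __init__(self, threshold=MIGRATION_REJECT_THRESHOLD):
        self.threshold = threshold
        self.rejects = defaultdict(int)

    def record(self, session, worker, admitted):
        key = (session, worker)
        if admitted:
            self.rejects.pop(key, None)  # reset on success
        else:
            self.rejects[key] += 1

    def is_blacklisted(self, session, worker):
        return self.rejects.get((session, worker), 0) >= self.threshold


# With headroom on D0, every admission succeeds and nothing accumulates.
healthy = RejectTracker()
for _ in range(100):
    healthy.record("s1", "D0", admitted=True)

# Only a saturated pool (three consecutive rejects) would trigger migration.
saturated = RejectTracker()
for _ in range(3):
    saturated.record("s1", "D0", admitted=False)

print(healthy.is_blacklisted("s1", "D0"), saturated.is_blacklisted("s1", "D0"))
```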
### Implication for H1
H1 is *not falsified*, but it is *qualified*. KVC v2 still improves on naive P/D disaggregation in per-request work: the direct-to-D fast path skips the P→D mooncake transfer for turn≥1 requests that stay on the same D. It does **not**, however, automatically balance load across D workers when the workload has high cross-session prefix overlap. To realise the full theoretical benefit of 1P3D on this workload, the policy needs either an explicit cold-D bonus or a pre-warming step that seeds D2 with the shared boilerplate at startup.
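One possible shape for a cold-D bonus is sketched below. It is an illustration only, not a design this project has adopted: the function name, the `COLD_BONUS` value, and the rule for what counts as "cold" are all assumptions, and the sticky term is omitted for brevity.

```python
COLD_BONUS = 64.0  # hypothetical; chosen to exceed the ~50-block shared prefix


def score_with_cold_bonus(resident, hash_ids, inflight, assigned,
                          cold_bonus=COLD_BONUS):
    """Lex-order score with a one-off bonus for a never-used worker."""
    overlap = len(set(resident) & set(hash_ids))
    is_cold = not resident and assigned == 0
    return (overlap + (cold_bonus if is_cold else 0.0), -inflight, -assigned)


request = [f"h{i}" for i in range(50)]  # shared boilerplate hashes (toy values)
warm = score_with_cold_bonus(set(request), request, inflight=40, assigned=248)
cold = score_with_cold_bonus(set(), request, inflight=0, assigned=0)
print("cold worker wins:", cold > warm)
```

In this sketch the bonus disappears as soon as the worker holds any blocks. After that the worker competes on overlap like the others, and the inflight and assigned terms break the tie in favour of the less loaded D.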
Full E2 metrics will be filled in upon completion (ETA ~22 min from snapshot).
---