docs(experiments): E2 mid-run finding — D2 stays cold in KVC v2 too
The pathological imbalance E1 showed reproduces in E2: D2 has zero bindings with 33% of POSTs dispatched.

Root cause is structural, not a KVC v2 bug: all 50 Inferact sessions begin with identical "permissions instructions" boilerplate, so the converter assigns them identical leading hash_ids. The kv-aware policy's overlap term (lex-score position 0) makes any D with already-resident blocks beat a fresh D unconditionally. v2's migration only activates on admission rejects, and those never fire because the D0/D1 KV pools have headroom.

The H1 conclusion is qualified: KVC v2 helps per-request work (direct-to-D fast path) but does not rebalance D worker load on workloads with shared cross-session prefixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -60,17 +60,36 @@ This is *the baseline H1 needs* — it shows the KVC layer (E2) has something co
---
## 3. E2 — in progress + an unexpected finding about D2
Background task `b0im1d48q`, launched 2026-05-12 01:48 UTC. Same subset, full KVC v2 stack (reset-on-success migration, direct-append threshold 8192), RDMA on, all other knobs identical to E1.

Expected differences going in:

- Direct-to-D fast path engaged for turn ≥ 1 requests → fewer P/D round-trips
- Migration triggered when sessions hit D0/D1 saturation → D2 should see traffic
- Lower TTFT p50 and lower mean latency
- TTFT p99 still constrained by the reseed slow path (P re-prefill + mooncake transfer)

Mid-run snapshot at 16 minutes (33 % of POSTs dispatched):
| Metric | D0 | D1 | D2 |
|---|---:|---:|---:|
| Bindings so far | 248 | 267 | **0** |
| GPU util (snapshot) | 0 % | 0 % | 0 % |
| KV pool util (run so far) | high | high | empty |
**D2 receives zero traffic in E2 too, just like E1**. This is *not* the result we expected: H1 predicted that KVC's session-migration mechanism (a reset-on-success blacklist with `migration_reject_threshold=3`) would route around the imbalance E1 showed. It doesn't.
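Here are the snapshot's binding counts expressed as numbers. Only the three counts come from the table. Expressing imbalance as max/mean is one convenient choice, not a metric the harness reports:

```python
# Binding counts from the mid-run snapshot (33 % of POSTs dispatched).
bindings = {"D0": 248, "D1": 267, "D2": 0}

total = sum(bindings.values())
ideal = total / len(bindings)               # per-D count under a perfectly even split
shares = {d: n / total for d, n in bindings.items()}
imbalance = max(bindings.values()) / ideal  # max/mean: 1.0 = balanced, 3.0 = one D takes all

for d, n in bindings.items():
    print(f"{d}: {n:3d} bindings ({shares[d]:.1%} of total, {n - ideal:+.1f} vs even split)")
print(f"total={total}, max/mean imbalance={imbalance:.2f}")  # total=515, max/mean imbalance=1.56
```

With one of three workers idle, max/mean cannot fall below 1.5 even if D0 and D1 split their traffic perfectly. The snapshot's ~1.56 is therefore close to the best a two-worker split could achieve.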
### Root cause
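`KvAwarePolicy.select` (policies.py:171-202) scores candidates by the 4-tuple `(overlap + α·sticky, sticky, -inflight, -assigned)`, compared lexicographically. Because later terms only break ties in earlier ones, the `overlap` term dominates. For a new session (`sticky = 0` on every D), any D whose resident KV blocks match the incoming request's `hash_ids` therefore outranks a D with no match, however high its `inflight` or `assigned` counts.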
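The dominance argument can be checked with a small model of the key. This is a minimal sketch, not the code in policies.py. `DState`, `ALPHA`, and counting `overlap` as the number of matching block hashes are assumptions made for illustration, and the inflight figures are made up:

```python
from dataclasses import dataclass, field

ALPHA = 1.0  # hypothetical sticky weight (α in the key)


@dataclass
class DState:
    """Hypothetical per-D view holding only what the ranking key reads."""
    name: str
    resident: set[int] = field(default_factory=set)  # block hash_ids resident on this D
    sticky: int = 0    # 1 if this D already serves the session (assumption)
    inflight: int = 0  # requests currently running on this D
    assigned: int = 0  # bindings made to this D so far


def score(d: DState, hash_ids: list[int]) -> tuple:
    # Assumption: overlap = number of request blocks already resident on d.
    overlap = sum(1 for h in hash_ids if h in d.resident)
    # Tuples compare lexicographically, so later terms only break ties.
    return (overlap + ALPHA * d.sticky, d.sticky, -d.inflight, -d.assigned)


def select(candidates: list[DState], hash_ids: list[int]) -> DState:
    return max(candidates, key=lambda d: score(d, hash_ids))


# Illustrative state: D0/D1 hold the shared boilerplate, D2 has never been admitted.
boilerplate = list(range(50))  # stand-in for the ~50 shared boilerplate hashes
d0 = DState("D0", resident=set(boilerplate), inflight=12, assigned=248)
d1 = DState("D1", resident=set(boilerplate), inflight=15, assigned=267)
d2 = DState("D2")
new_session = boilerplate + [1000, 1001]  # shared prefix + session-specific blocks
print(select([d0, d1, d2], new_session).name)  # D0 (D2 scores 0 at position 0)
```

Raising `inflight` on D0 and D1 arbitrarily high does not help D2. The load terms sit behind `overlap` in the key and only ever arbitrate between D0 and D1.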
In the Inferact `codex_swebenchpro` workload, **all 50 sessions begin with identical "permissions instructions" boilerplate** (the converter sees identical leading content across trials 0..49). Our hash_id construction (sha256 over the token sequence of each 24-token block; see `scripts/convert_inferact_to_trace.py`) therefore yields *identical block hashes across sessions* for the first ~50 blocks (~1,200 tokens).
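The following sketch shows why shared boilerplate turns into shared hash_ids. It assumes that each 24-token block is hashed on its own (sha256 over a comma-joined token list, truncated for readability) and that a trailing partial block is skipped. The real logic lives in `scripts/convert_inferact_to_trace.py` and may differ on both points. A chained (prefix-dependent) hash would give the same answer here, because the shared content is a leading prefix:

```python
import hashlib

BLOCK = 24  # tokens per block, as in the converter description


def block_hashes(tokens: list[int], block: int = BLOCK) -> list[str]:
    """Hash each full `block`-token chunk independently (assumed scheme).

    A trailing partial block is skipped (assumption).
    """
    hashes = []
    for start in range(0, len(tokens) - block + 1, block):
        payload = ",".join(map(str, tokens[start:start + block])).encode()
        hashes.append(hashlib.sha256(payload).hexdigest()[:16])
    return hashes


# Two sessions: 50 blocks of identical boilerplate, then session-specific content.
boilerplate = list(range(50 * BLOCK))
session_a = boilerplate + [7] * BLOCK + [8] * BLOCK
session_b = boilerplate + [9] * BLOCK + [10] * BLOCK

ha, hb = block_hashes(session_a), block_hashes(session_b)
# Length of the common leading run of identical block hashes.
shared = next((i for i, (x, y) in enumerate(zip(ha, hb)) if x != y), min(len(ha), len(hb)))
print(shared)  # 50
```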
Concretely, when session N's turn 0 lands:
- D0 / D1 already host previous sessions → their `state.resident` sets include those shared boilerplate hashes → `overlap > 0`
- D2 has never been admitted → `state.resident[D2]` is empty → `overlap = 0`
- D0/D1 tie at position 0, so the later tuple terms decide between them; D2 always loses
The migration mechanism never triggers because D0/D1 have ample KV headroom (peak token_usage ~0.86 in v2 historical reports) and never *reject* admission. No rejects → no `(session, D)` blacklist accumulation → no migration → D2 stays cold for the whole run.
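The no-rejects, no-migration chain can be illustrated with a toy reset-on-success counter. `RejectBlacklist` and its methods are hypothetical, not the KVC v2 implementation. The sketch assumes a per-`(session, D)` count of consecutive admission rejects that resets on any successful admission, with migration allowed once the count reaches `migration_reject_threshold=3`:

```python
from collections import defaultdict

MIGRATION_REJECT_THRESHOLD = 3  # the migration_reject_threshold value quoted above


class RejectBlacklist:
    """Hypothetical reset-on-success reject counter keyed by (session, D)."""

    def __init__(self, threshold: int = MIGRATION_REJECT_THRESHOLD) -> None:
        self.threshold = threshold
        self.rejects: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, session: str, d: str, admitted: bool) -> None:
        key = (session, d)
        if admitted:
            self.rejects.pop(key, None)  # a successful admission resets the streak
        else:
            self.rejects[key] += 1

    def blacklisted(self, session: str, d: str) -> bool:
        # .get avoids inserting a zero entry into the defaultdict on lookup.
        return self.rejects.get((session, d), 0) >= self.threshold


bl = RejectBlacklist()
for _ in range(20):
    bl.record("s1", "D0", admitted=True)  # D0 always has KV headroom
print(bl.blacklisted("s1", "D0"))  # False: no rejects, so nothing ever migrates

for _ in range(3):
    bl.record("s2", "D1", admitted=False)  # three consecutive rejects
print(bl.blacklisted("s2", "D1"))  # True: now the session may migrate
```

Because the counter only moves on rejects, this model reaches a cold D only through KV pressure on the warm ones. Headroom on D0/D1 keeps the counter at zero.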
### Implication for H1
H1 is *not falsified*, but it is *qualified*. KVC v2 still improves over naive pd-disaggregation on per-request work: the direct-to-D fast path skips the P→D mooncake transfer for turn ≥ 1 requests on the same D. However, it does **not** automatically balance load across D workers when the workload has high cross-session prefix overlap. To realise the full theoretical benefit of 1P3D on this workload, the policy needs either an explicit cold-D bonus or a pre-warming step that seeds D2 with the shared boilerplate at startup.
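Here is one possible shape for the cold-D bonus. Everything in this sketch is hypothetical: `COLD_BONUS`, `COLD_FRACTION`, the "under half the mean assignment count" definition of cold, and counting `overlap` as matching block hashes are illustration choices, not existing policy parameters:

```python
from dataclasses import dataclass, field

ALPHA = 1.0          # sticky weight, as in the existing key (value assumed)
COLD_BONUS = 64.0    # hypothetical; must exceed the shared-prefix overlap (~50 blocks)
COLD_FRACTION = 0.5  # hypothetical: "cold" = under half the mean assignment count


@dataclass
class DState:
    name: str
    resident: set[int] = field(default_factory=set)  # resident block hash_ids
    sticky: int = 0
    inflight: int = 0
    assigned: int = 0


def select_with_cold_bonus(candidates: list[DState], hash_ids: list[int]) -> DState:
    mean_assigned = sum(d.assigned for d in candidates) / len(candidates)
    # Only brand-new sessions get steered to a cold D; returning ones stay sticky.
    new_session = not any(d.sticky for d in candidates)

    def key(d: DState) -> tuple:
        overlap = sum(1 for h in hash_ids if h in d.resident)
        cold = new_session and d.assigned < COLD_FRACTION * mean_assigned
        first = overlap + ALPHA * d.sticky + (COLD_BONUS if cold else 0.0)
        return (first, d.sticky, -d.inflight, -d.assigned)

    return max(candidates, key=key)


boilerplate = list(range(50))  # stand-in for the shared boilerplate hashes
d0 = DState("D0", resident=set(boilerplate), assigned=248)
d1 = DState("D1", resident=set(boilerplate), assigned=267)
d2 = DState("D2")
print(select_with_cold_bonus([d0, d1, d2], boilerplate + [1000]).name)  # D2
d0.sticky = 1  # a returning session already bound to D0
print(select_with_cold_bonus([d0, d1, d2], boilerplate + [1000]).name)  # D0
```

Restricting the bonus to new sessions keeps turn ≥ 1 requests on the D that already holds their KV, so the direct-to-D fast path is not traded away for balance. The bonus also switches itself off once the cold D's assignment count passes the threshold.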
Full E2 metrics will be filled in upon completion (ETA ~22 min from snapshot).
---