Gahow Wang 4a93096c1e Add PD_DISAGG_INVESTIGATION.md — living TODO for proving H1–H4
We don't have paper-grade evidence yet that PD-disagg fails in agentic.
MB1+MB2 corrected accounting puts phase-isolation cost-benefit on
PD-disagg's side; the only direct support is colleague's one data point
on a patched dash0 build (TTFT p50 62×, success 52%) and the f4b
geometric capacity argument. To close §3.2 properly we need fresh-venv
empirical replication PLUS system-level instrumentation that tells the
reviewer *which* component is the bottleneck — not just headline
latency.

This document tracks the four candidate failure hypotheses (H1 D-pool
capacity, H2 static-partition mismatch, H3 cache reuse + P-pool hotspot,
H4 end-to-end throughput loss), their current evidence status, and the
phased experiment plan to address each.

Key findings already recorded:
- Phase 0 TODO 0.1 (find standard PD-disagg deployment) is done — vLLM
  ships an official example at
  examples/online_serving/disaggregated_serving/mooncake_connector/
  with a kv_producer+kv_consumer launcher and a Mooncake-aware proxy
  that supports arbitrary P:D ratios via env vars. Per user direction,
  we will NOT polish PD-disagg policy ourselves; we use the official
  recipe as the "PD-disagg" baseline in §3.2 / §5.2.
- Phase 1 (MB5+3 combined: PD ratio sweep with D-pool occupancy logging)
  is the critical path. Designed to either confirm H1 with system
  breakdown evidence (D-pool ≥ 90% for ≥ 30% of trace + queue depth
  spike) or falsify it (some ratio matches 8C colo, in which case §3.2
  needs rewriting).
- D-pool occupancy timeline is the single most important new
  instrumentation — turns "PD-disagg is 10× worse" into "PD-disagg is
  10× worse BECAUSE the D pool sits at >90% for X% of the trace".

Configurations to run on dash1 8-GPU first:
  8C (colo baseline), 6P+2D, 4P+4D, 2P+6D  ×  3 reps  ×  w600 trace.

Open question still in the doc: vLLM 0.18.1 had an AttributeError on
self.bootstrap_server in kv_consumer mode when we hit it during MB2
sanity; likely the issue was bad kv_transfer_params from our side
(missing transfer_id, wrong field names), which we have since fixed.
Official proxy uses the same handshake we now have, so it should
just work. If not, single-line patch to initialize self.bootstrap_server
= None for consumer mode.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:24:31 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%