Gahow Wang 6243b78bba PD_DISAGG_RESULTS §6: session-affinity routing does not rescue PD
Swept session-affinity P routing (MB5_P_ROUTING=session) across all
four ratios on the metrics-fixed stack. Findings:

- Strictly worse than round-robin at every ratio. 4P+4D: round-robin
  100% vs session-affinity 36% completion.
- Success DECREASES monotonically as decode capacity grows
  (6P+2D 59% -> 4P+4D 36% -> 3P+5D 24% -> 2P+6D 19%) — refutes the
  "session prefill is faster so it needs more D" hypothesis.
- GPUs sit at ~0% utilization (2P+6D entirely idle) — the cluster
  stalls on KV-transfer/admission coordination, not compute. This is
  the deepest anti-PD argument: paid-for hardware does nothing while
  requests pile up; colocation keeps every GPU busy.
- Mechanism: session-affinity pins heavy multi-turn sessions onto
  single producers (producer hot-pinning, same pathology as sticky
  routing in the colocated §3.3 study); fewer producers -> worse
  concentration -> the monotonic decline. Failed transfers also pin
  producer KV (kv_load_failure_policy=fail), compounding to deadlock.

Verdict: neither ratio tuning nor routing policy rescues static
PD-disagg for this agentic workload — the failure is structural.

mb5_launch.sh: add 5P+3D / 3P+5D ratios for the sweep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 00:25:10 +08:00
Description
No description provided
48 MiB
Languages
Python 82.9%
Shell 17.1%