Adds dated, non-destructive correction notes to the contaminated PD-vs-colo
artifacts after the producer-eviction bug (`evict_blocks(sent_block_ids)` on
`finished_sending`, deployed over the "fresh" pip vLLM by
`scripts/deploy_vllm_patches.sh`) was found and gated behind
`VLLM_EVICT_SENT_BLOCKS` (default off).
PD_DISAGG_RESULTS.md top CORRECTION banner + §6 RETRACTED marker.
§6 (session-affinity hot-pin) was an `e13391e`
artifact under controlled concurrency; §3 RR, §4
TPOT win, §5 D-pool ceiling, §5.1 consumer crash
stand.
RESULTS_SUMMARY.md §4 confirm+refine note: clean ablation confirms
the D-pool capacity thesis and adds regime-
dependence.
pd_separation_analysis.md scoped caution: thesis confirmed; flags
only reuse-dependent figures for cross-check
(this study used a different stack).
figs/mb5/CORRECTION.md flags mb5_producer_hotspot.png as retracted;
§3 RR and §5 D-pool figures stand.
403 lines
20 KiB
Markdown
403 lines
20 KiB
Markdown
# PD-disaggregation under an agentic workload — does it work?
|
||
|
||
**Consolidated results doc.** Self-contained writeup of every PD-disagg
|
||
argument and experiment, with figures inline. For the live experiment TODO
|
||
list see [PD_DISAGG_INVESTIGATION.md](PD_DISAGG_INVESTIGATION.md).
|
||
|
||
Date: 2026-05-28 · Hardware: dash1, 8×GPU · Model: Qwen3-Coder-30B-A3B-Instruct
|
||
· vLLM 0.18.1 (V1, chunked-prefill on) · Mooncake 0.3.11 · Trace:
|
||
`w600_r0.0015_st30.jsonl` (1214 requests, agentic multi-turn).
|
||
|
||
---
|
||
|
||
## ⚠️ CORRECTION (2026-05-30) — read before §6
|
||
|
||
A contamination was found in the "fresh" vLLM used for the runs below.
|
||
`scripts/deploy_vllm_patches.sh` had copied our fork commit **`e13391e`** over the
|
||
pip-installed release; that commit calls `evict_blocks(sent_block_ids)` on
|
||
`finished_sending`, i.e. it **evicts a producer's prefix-cache blocks on every KV
|
||
transfer**. So a disaggregated producer could never keep a session's prefix warm,
|
||
*regardless of routing*. We have since gated that behind `VLLM_EVICT_SENT_BLOCKS`
|
||
(default off) and re-run everything on the corrected stack.
|
||
|
||
**Retracted (was a pure artifact of `e13391e`):**
|
||
- **All of §6** ("smarter routing does not save PD" / "session-affinity is
|
||
*strictly worse*" / "GPUs at ~0%" / "producer hot-pinning" / "producer prefix-hit
|
||
~0.2%"). On the corrected stack, **session-affinity recovers producer reuse to
|
||
full parity with colo (APC 71–82%)** — the collapse was the eviction bug starving
|
||
the very cache session-affinity exists to fill, not a routing pathology.
|
||
- The framing that PD reuse is "0% / fundamentally broken." PD reuses prefix
|
||
*exactly as well as colo* once routing is session-sticky.
|
||
|
||
**Still stands (independent of `e13391e`):**
|
||
- **§3 round-robin** numbers — RR sends consecutive turns to *different* producers,
|
||
so its ~0% prefix-hit is a **routing** artifact (not the eviction bug) and is
|
||
reproduced on the clean stack; RR PD still loses to 8C.
|
||
- **§4** PD wins TPOT (decode isolation) — robust.
|
||
- **§5.1** the consumer counter-underflow crash — a real, separate vLLM 0.18.1 bug.
|
||
- **§5** the D-pool capacity-ceiling mechanism (decode side pegs while prefill
|
||
strands) — real.
|
||
|
||
**Corrected verdict (the real reason PD loses on agentic).** It is *not* "routing
|
||
can't help." On the clean stack PD is **regime-dependent**: it *wins* at low
|
||
load / decode-heavy / low-reuse, and loses the **agentic corner** (high reuse +
|
||
short output + large context + high concurrency) through a structural crossover —
|
||
its static P:D split cannot simultaneously provide the prefix-cache capacity
|
||
(needs many producers) *and* the decode capacity (needs many decoders) that
|
||
agentic demands at once, while colo's elastic pool provides both. See the
|
||
three-axis ablation: **reuse** erodes the edge (1.57×→1.10×), **shape** rotates the
|
||
best ratio and is catastrophic at the prefill extreme, and **concurrency** tips PD
|
||
at N=64 (APC craters 71%→1.4%, TPS −30%) while colo scales cleanly.
|
||
|
||
→ Figures: [`figs/mb5_pd_ablation/`](../../figs/mb5_pd_ablation/) ·
|
||
data: [`analysis/mb5_pd_ablation/`](../../analysis/mb5_pd_ablation/) ·
|
||
the clean re-run of *this exact* w600 experiment (ratio-swept) is the Fig 4 anchor.
|
||
|
||
---
|
||
|
||
## TL;DR (verdict)
|
||
|
||
**No static prefill/decode split beats 8-way colocation (8C) on this agentic
|
||
workload.** Every disaggregated ratio we tried is dominated by 8C on the
|
||
metric the user actually feels (TTFT, end-to-end latency, request
|
||
completion), and the failure *moves* with the ratio:
|
||
|
||
- **D-heavy bottleneck** (6P+2D, 4P+4D): the decode pool saturates (peak
|
||
**99.6% / 97.5%**) while the prefill pool sits at **~30%** — half the
|
||
cluster's KV is stranded on the wrong side.
|
||
- **P-heavy bottleneck** (2P+6D): the 2 prefill instances can't keep up,
|
||
the prefill pool jams at **99.7%**, **872 requests** pile up in the queue
|
||
and **91% of requests never complete**.
|
||
- **8C** keeps a single elastic pool that absorbs whichever phase is hot at
|
||
the moment → steady utilization **34%**, **100% completion**, fastest
|
||
wall-clock, best p50/p90 latency.
|
||
|
||
PD-disagg *does* deliver the phase-isolation win we predicted in MB1 — its
|
||
**TPOT is 10–35× cleaner** — but that win is swamped by TTFT inflation and
|
||
request loss.
|
||
|
||
**Smarter routing does not save it (§6).** We added the "correct" PD policy —
|
||
session-affinity on the prefill side to recover prefix-cache reuse, load-balance
|
||
on decode — and swept it across all four ratios. It is *strictly worse* than
|
||
round-robin at every ratio (4P+4D: 100% → 36% completion), success *decreases*
|
||
as you add decode capacity (59→36→24→19%), and the GPUs sit at **~0%
|
||
utilization** — the cluster stalls on KV-transfer coordination, not compute.
|
||
Session-affinity reproduces the producer **hot-pinning** pathology from §3.3.
|
||
|
||
This is the empirical backing for the paper's claim: **agentic workloads
|
||
have time-varying P:D demand that no static partition can track; colocation
|
||
wins because its pool is elastic — and no routing knob rescues the static
|
||
split.** (H1 *and* H2 from the investigation doc, unified by one mechanism.)
|
||
|
||
---
|
||
|
||
## 1. Why this experiment exists
|
||
|
||
Earlier cost accounting (MB1 phase-interference, MB2 KV-transfer cost) showed
|
||
that on the **phase-isolation axis alone**, PD-disagg actually *wins*: it
|
||
removes prefill→decode interference, and the transfer cost is small relative
|
||
to the interference it avoids. So "PD-disagg is bad for agentic" could not be
|
||
argued from phase isolation — we needed a system-level experiment that
|
||
measures the whole picture (queueing, pool capacity, cache reuse), not just
|
||
the isolated phase cost.
|
||
|
||
See [analysis/mb1](../../analysis/mb1) and [analysis/mb2](../../analysis/mb2)
|
||
for that accounting. This doc is the system-level answer.
|
||
|
||
---
|
||
|
||
## 2. Setup
|
||
|
||
| | |
|
||
|---|---|
|
||
| Configs | `8C` (8× kv_both colo), `6P+2D`, `4P+4D`, `2P+6D` (prefill+decode split) |
|
||
| PD routing | stock **round-robin** on both P and D (vLLM official `mooncake_connector_proxy`) |
|
||
| Trace | `w600_r0.0015_st30.jsonl`, 1214 requests, agentic multi-turn |
|
||
| Reps | 1 (rep1) for this analysis; the 3-rep sweep confirmed run-to-run consistency before we converged on rep1 for iteration speed |
|
||
| KV instrumentation | V1 scheduler patched to dump per-request KV block allocation every 100 ms per EngineCore (see `instrument_kv_snapshot.py`) |
|
||
|
||
8C is the fair baseline: 8 colocated instances, replayer round-robins across
|
||
them directly (no proxy). PD configs route through the proxy.
|
||
|
||
---
|
||
|
||
## 3. Headline result — no PD ratio beats 8C
|
||
|
||
All numbers are rep1.
|
||
|
||
| Metric | **8C** | 6P+2D | 4P+4D | 2P+6D |
|
||
|---|---|---|---|---|
|
||
| **completion** | **100%** | 100% | 100% | **9%** 💀 |
|
||
| wall-clock (drain trace) | **2994 s** | 3419 s | 4171 s | 5762 s |
|
||
| prefix-cache hit | **19.4%** | 0% | 0% | 0% |
|
||
| TTFT mean | **18.0 s** | 44.8 s | 70.0 s | 106.8 s |
|
||
| TTFT p50 | **7.0 s** | 41.0 s | 56.4 s | 23.6 s |
|
||
| TTFT p90 | **53.1 s** | 86.7 s | 153.1 s | 498 s |
|
||
| E2E p50 | **10.8 s** | 44.5 s | 59.5 s | 26.3 s |
|
||
| E2E p90 | **83.3 s** | 91.8 s | 157.1 s | 499 s |
|
||
|
||

|
||
|
||
> ⚠️ **Read the percentiles with the completion rate.** Latency percentiles
|
||
> are computed over *successful* requests only. 2P+6D's "p99 = 577 s" covers
|
||
> just the 9% that finished — the other 91% never returned, so its real
|
||
> experience is far worse than any latency bar suggests.
|
||
|
||
8C wins p50 by **4×** and p90 decisively. The only metric where a PD config
|
||
edges 8C is E2E **p99** (6P+2D 148 s vs 8C 194 s) — and that is the flip side
|
||
of the next result.
|
||
|
||
---
|
||
|
||
## 4. The duality — PD wins TPOT, loses TTFT
|
||
|
||
PD-disagg delivers exactly the phase-isolation benefit MB1 predicted: with no
|
||
prefill stealing decode steps, **inter-token latency is dramatically cleaner.**
|
||
|
||
| TPOT | **8C** | 6P+2D | 4P+4D | 2P+6D |
|
||
|---|---|---|---|---|
|
||
| mean | 87 ms | 11 ms | 9 ms | 6 ms |
|
||
| p90 | 230 ms | 18 ms | 14 ms | 8 ms |
|
||
| p99 | **1129 ms** | **26 ms** | **20 ms** | **12 ms** |
|
||
|
||
PD's TPOT p99 is **10–35× lower** — once a request reaches a dedicated decode
|
||
instance it streams without interruption. 8C's 1.1 s TPOT p99 *is* the
|
||
chunked-prefill interference tax (decode steps occasionally stalled behind an
|
||
8k-token prefill chunk), consistent with MB1.
|
||
|
||
**But the win is local.** TTFT inflates 2.5–6× because every request now pays
|
||
P→D handoff + admission into a smaller, saturated decode pool. For this
|
||
workload's modest output lengths, TTFT dominates total time, so the TPOT win
|
||
never pays for itself. This is the cost/benefit imbalance made concrete:
|
||
phase isolation is real, but it is the wrong thing to optimize when the pool
|
||
is the binding constraint.
|
||
|
||
---
|
||
|
||
## 5. Root cause — per-role KV pool occupancy (the kill shot)
|
||
|
||
The cluster-average KV utilization is *misleading* and nearly hid the result:
|
||
|
||

|
||
|
||
6P+2D and 4P+4D look only ~42–46% utilized on cluster average — yet they have
|
||
128–152 requests queued. The average hides that **one pool is pegged while
|
||
the other idles.** Splitting the KV pool by role exposes it:
|
||
|
||

|
||
|
||
| Config | P-pool steady | D-pool steady | D-pool **peak** | binding side |
|
||
|---|---|---|---|---|
|
||
| 8C | — single shared pool — | 34% | 72% | none (elastic) |
|
||
| 6P+2D | 31% | **74%** | **99.6%** | **decode** |
|
||
| 4P+4D | 29% | **60%** | **97.5%** | **decode** |
|
||
| 2P+6D | **92%** | 95% | 96% | **prefill** (P jams first) |
|
||
|
||

|
||
|
||
**The mechanism, unified:**
|
||
|
||
- A static P:D split fixes the KV capacity on each side at deploy time.
|
||
- The agentic workload's instantaneous P:D demand *drifts* (bursts of new
|
||
sessions = prefill-heavy; long tool-call-driven turns = decode-heavy).
|
||
- Whichever side is undersized *for the current phase* saturates and
|
||
back-pressures the whole pipeline, while the other side's KV sits stranded.
|
||
- 6P+2D / 4P+4D → decode side too small → D-pool hits ~100%, prefilled
|
||
requests queue for a decode slot → TTFT explodes (this is **H1**).
|
||
- 2P+6D → prefill side too small → P-pool hits ~100%, requests can't even
|
||
start → 872 queued, 91% dropped.
|
||
- **8C colocation has no partition**: prefill and decode share one pool, so
|
||
the pool elastically reallocates to whichever phase is hot. Steady
|
||
utilization stays at 34% with 100% completion.
|
||
|
||
This is **H1 (D-pool capacity ceiling)** and **H2 (static-partition
|
||
mismatch)** turning out to be the *same* phenomenon seen from two ratios.
|
||
|
||
### 5.1 The same pressure crashes consumers (a vLLM 0.18.1 fragility)
|
||
|
||
D-pool saturation doesn't just slow things down — under this workload it
|
||
**crashes the decode instances**. The exact chain, from the 6P+2D consumer
|
||
logs:
|
||
|
||
1. D-pool fills to **97.2%** (the capacity ceiling above).
|
||
2. A large request needs its KV pulled to the consumer, but the transfer
|
||
fails: `Mooncake transfer engine returned -1` (observed on a **112,793-token**
|
||
request — agentic sessions have very long multi-turn contexts, and the
|
||
pool had no room).
|
||
3. `kv_load_failure_policy=fail` fails that request — by itself recoverable.
|
||
4. **But** the failure path computes `PromptTokenStats.local_cache_hit =
|
||
num_cached + recomputed − num_external_computed`, which goes **negative**
|
||
when the external transfer exceeded the scheduler's cached count.
|
||
5. `loggers.record()` calls `Counter.inc(negative)` → prometheus_client raises
|
||
*"Counters can only be incremented by non-negative amounts"* → the
|
||
**EngineCore dies**.
|
||
6. Once the consumer's engine is dead, **every** subsequent request fails.
|
||
|
||
The signature is a cliff, not a slope: in the session-routing 6P+2D run, all
|
||
80 successes landed in the first ~110 s, then **zero** of the next ~2,800 s.
|
||
This same intermittent consumer death is almost certainly why the
|
||
round-robin 6P+2D reps varied so wildly (100% / 56% / 80%) — the consumer
|
||
crashed at different points in each rep.
|
||
|
||
**Two takeaways:** (a) PD-disagg under agentic context lengths hits KV-transfer
|
||
failures that colocation never does (8C never transfers — it prefills and
|
||
decodes in the same pool); (b) vLLM 0.18.1's failure handling amplifies one
|
||
failed request into a total collapse. We patched the counter underflow
|
||
(`instrument_kv_snapshot.py`, clamp to ≥ 0) so a transfer failure stays a
|
||
single failed request, which is required to compare routing arms fairly in §6.
|
||
|
||
---
|
||
|
||
## 6. The routing handicap — and whether smarter routing rescues PD
|
||
|
||
> 🛑 **RETRACTED (2026-05-30) — this entire section is an artifact of `e13391e`.**
|
||
> The session-affinity runs below were starved by the producer-eviction bug, so
|
||
> they could never collect prefix-cache reuse. On the corrected stack
|
||
> session-affinity reaches **APC parity with colo (71–82%)** and does *not* stall
|
||
> at 0% GPU util. The real mechanism is the capacity/concurrency crossover, not a
|
||
> routing pathology — see the CORRECTION banner at the top and
|
||
> [`figs/mb5_pd_ablation/`](../../figs/mb5_pd_ablation/). Text kept below for the
|
||
> record only.
|
||
|
||
Every PD config above shows **prefix-cache hit = 0%**, versus 8C's 19%. That
|
||
is not fundamental to disaggregation — it is the stock proxy round-robining
|
||
the **prefill** side: consecutive turns of one agentic session land on
|
||
*different* producers, so each turn re-prefills the whole conversation from
|
||
scratch. That both inflates TTFT and piles extra load on the prefill pool
|
||
(directly worsening the 2P+6D collapse).
|
||
|
||
The correct PD scheduling policy (as the design argues): **P should be chosen
|
||
by session affinity** (reuse the producer's prefix cache) while **D is chosen
|
||
by load balance** (decode KV is freshly transferred per turn, so D gains
|
||
nothing from affinity). We added this as an env-gated mode in the proxy
|
||
(`MB5_P_ROUTING=session`, consistent hash on `X-Session-Id`; D stays
|
||
round-robin) and swept it across **all four P:D ratios**. All runs below are on
|
||
the **metrics-fixed stack** (§5.1 clamp), so consumers no longer crash and
|
||
failures are genuine KV-transfer/capacity failures — an apples-to-apples
|
||
comparison of the two routing policies.
|
||
|
||
### 6.1 Session-affinity does NOT rescue PD — it makes it worse
|
||
|
||
| Config | rr success | **session success** | rr TTFT mean | direction |
|
||
|---|---|---|---|---|
|
||
| 6P+2D | 73% | **59%** | 89 s | session worse |
|
||
| 4P+4D | **100%** | **36%** | 71 s | session much worse |
|
||
| 3P+5D | — | **24%** | — | ↓ |
|
||
| 2P+6D | 9%* | **19%** | — | ↓ |
|
||
|
||
\* rr 2P+6D from the original sweep (prefill-bound, 9%).
|
||
|
||
Two results, both decisive:
|
||
|
||
1. **At every ratio, session-affinity is worse than round-robin.** The most
|
||
damning point is 4P+4D, where round-robin completes **100%** but
|
||
session-affinity completes only **36%**.
|
||
2. **Session-affinity success *decreases monotonically* as you add decode
|
||
capacity** (59% → 36% → 24% → 19% going 6P+2D → 4P+4D → 3P+5D → 2P+6D).
|
||
Adding D does not help — it hurts. This refutes the natural hypothesis
|
||
("session prefill is faster, so it needs more D").
|
||
|
||
### 6.2 The smoking gun: GPUs sit at ~0% utilization
|
||
|
||
During the session-affinity runs the cluster is **not compute-bound — it is
|
||
stalled**. Sampled GPU utilization mid-run:
|
||
|
||
```
|
||
session 3P+5D : 0 0 100 0 0 0 0 0 (1 of 8 GPUs doing anything)
|
||
session 2P+6D : 0 0 0 0 0 0 0 0 (entirely idle)
|
||
```
|
||
|
||
Requests are piling up (transfer failures climbing into the hundreds) while
|
||
**the hardware you paid for does nothing.** This is the deepest argument
|
||
against PD-disagg for this workload: the binding constraint is KV-pool
|
||
capacity and P→D transfer coordination, not FLOPs. Colocation (8C) keeps every
|
||
GPU busy because prefill and decode interleave in one elastic pool with no
|
||
cross-instance handoff.
|
||
|
||
### 6.3 Why session-affinity backfires (mechanism)
|
||
|
||
Session-affinity pins **all turns of a session to one producer**. Agentic
|
||
sessions are heavy-tailed (a few very long multi-turn sessions — recall the
|
||
112k-token request in §5.1). Sticky routing concentrates those heavy sessions
|
||
onto individual producers, whose KV pools fill and stall — the **same
|
||
hot-pinning pathology as sticky routing in the colocated study (§3.3)**, now on
|
||
the producer side. Round-robin avoids it by spreading each session's turns
|
||
across producers. With *fewer* producers (2P+6D), the concentration is worse,
|
||
which is exactly why success keeps dropping as the ratio shifts D-ward. A
|
||
failed transfer also pins the producer's KV (it is not freed on
|
||
`kv_load_failure_policy=fail`), compounding the stall until the pipeline
|
||
deadlocks at ~0% utilization.
|
||
|
||
The per-producer KV-pool timelines make the hot-pinning direct. At the **same
|
||
4P+4D ratio**, round-robin holds all four producers within **1 percentage
|
||
point** of each other (spread 0pp, CV 0.01); session-affinity blows the spread
|
||
open to **49 percentage points** (one producer pegged at ~93% while another
|
||
sits at 45%, CV 0.25 — a 25× jump in load imbalance):
|
||
|
||

|
||
|
||
Producer-side prefix-cache hit in the degraded state is ~0.2% (vs round-robin's
|
||
~5%) — session-affinity never even gets to *collect* the cache-reuse benefit it
|
||
was supposed to provide, because the producers it concentrates load onto are
|
||
thrashing.
|
||
|
||
### 6.4 Verdict on routing
|
||
|
||
Neither **ratio tuning** (§3, no static split beats 8C) nor **routing policy**
|
||
(§6, session-affinity is strictly worse and ratio-tuning it only makes it
|
||
worse) rescues static PD-disaggregation for this agentic workload. The failure
|
||
is **structural**: a static prefill/decode partition cannot track time-varying
|
||
P:D demand, the cross-instance KV handoff adds a capacity-coupled failure mode
|
||
absent in colocation, and the routing knob that helps colocation (affinity)
|
||
actively hurts disaggregation (producer hotspots). Colocation wins on
|
||
completion, latency, *and* hardware utilization.
|
||
|
||
---
|
||
|
||
## 7. Caveats / honesty
|
||
|
||
- **Single rep** for this analysis. The earlier 3-rep round-robin sweep
|
||
varied for 6P+2D (rep1 100% / rep2 56% / rep3 80%) — but §5.1 showed that
|
||
variance was the *consumer-crash bug*, not genuine load behavior. On the
|
||
metrics-fixed stack, round-robin 6P+2D completes a stable **73%** (the
|
||
unpatched "100% rep1" in §3's table was a lucky no-crash run). 8C and rr
|
||
4P+4D are tight run-to-run. The qualitative ranking is robust.
|
||
- **Latency percentiles count successes only** (see §3 warning). For failing
|
||
configs the latency bars *understate* the damage — and for the session-
|
||
affinity runs, which stall at ~0% GPU util, the latency of the few survivors
|
||
is especially unrepresentative.
|
||
- **Routing fairness addressed.** §6 tests the "correct" PD routing
|
||
(session-affinity P + load-balanced D) across all ratios; it does not rescue
|
||
PD, so the round-robin baseline in §3 is not an unfair handicap on the
|
||
conclusion.
|
||
- **Session-affinity ratio sweep used near-final partials** (runs were stopped
|
||
once the monotonic-decline trend and 0% GPU util were unambiguous, to save
|
||
GPU time). Exact final percentages would shift by a few points; the trend
|
||
and the stall are not in doubt.
|
||
- Trace is a single agentic workload; conclusions are about *this* class of
|
||
workload (sub-second tool-call cadence, multi-turn sessions), not all LLM
|
||
serving.
|
||
|
||
---
|
||
|
||
## 8. Reproduce
|
||
|
||
```bash
|
||
# from repo root, after microbench/fresh_setup/deploy.sh dash1
|
||
# 1. round-robin baseline sweep (1 rep)
|
||
ssh dash1 'CONFIGS="8C 6P+2D 4P+4D 2P+6D" REPS=1 RUN_TAG=<tag> \
|
||
bash /home/admin/cpfs/wjh/agentic-kv-fresh/scripts/mb5_run.sh'
|
||
|
||
# 2. reduce on dash1 (numpy-only; handles the multi-GB snapshot dirs)
|
||
ssh dash1 '.venv/bin/python scripts/aggregate_mb5.py --sweep-root mb5_runs \
|
||
--tag <tag> --configs "8C 6P+2D 4P+4D 2P+6D" --reps 1 \
|
||
--reduce-to mb5_runs/reduced_<tag>.json'
|
||
|
||
# 3. pull the compact JSON, render figures locally
|
||
scp dash1:.../mb5_runs/reduced_<tag>.json analysis/mb5/
|
||
.venv/bin/python microbench/fresh_setup/aggregate_mb5.py \
|
||
--from-reduced analysis/mb5/reduced_<tag>.json --out-dir figs/mb5
|
||
|
||
# session-affinity arm: prefix the run with MB5_P_ROUTING=session
|
||
```
|