v2 exp(d): 5-policy routing under tracets vs thinktime — ranking flip

Extends exp(c) (dispatch ablation, 1 round-robin policy) to the full 5-policy routing comparison, both modes on the SAME ttp trace (807 reqs, fresh vLLM/arm, dash0 8xH20). Confirms exp(c)'s prediction and finds something stronger: the dispatch mode FLIPS which policy wins.

- thinktime helps every policy but helps LPWL most (TTFT p90 -40%, E2E mean -31% vs -3..-16% for the rest): tracets bursts punish prefill-spreading.
- Ranking flip: tracets -> LPWL only ties unified_ab on TTFT p90 and is 3rd on E2E mean; thinktime -> LPWL is 1st on both (TTFT p90 -31%, best TPOT/balance, zero knobs) vs the tuned unified+A+B.
- => benchmark agentic routing with thinktime; tracets' burst artifact erases LPWL's advantage.

Caveat n=1: tracets ranking is run-sensitive (does not reproduce dash1 lpwl_5policy_600s.md), the thinktime advantage is the robust signal (appears in both environments).

README + grouped-bar fig (figs/exp_d_policy_dispatch.png) + bench_report summaries in results/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 20:59:18 +08:00
parent 68f21bef23
commit 9b6091fe6e
5 changed files with 1788 additions and 0 deletions
--- a/v2/exp_d_policy_dispatch/README.md
+++ b/v2/exp_d_policy_dispatch/README.md
@@ -0,0 +1,114 @@
+# exp (d) — 5-policy routing under `tracets` vs `thinktime`
+
+exp (c) showed the **dispatch mode** changes measured performance for a single
+round-robin policy, and predicted: *"a cache-aware policy (LPWL) would lower the
+latencies and likely **widen** the thinktime advantage."* exp (d) tests that with
+the full routing comparison — and finds something stronger: **the dispatch mode
+flips which policy wins.**
+
+**Question.** Does the parameter-free LPWL still beat the tuned `unified+A+B`
+baseline once we benchmark with the *faithful* `thinktime` load instead of the
+`tracets` burst artifact?
+
+## Setup
+
+5 routing policies, each its own **fresh vLLM (cold APC)** on dash0 8×H20,
+Qwen3-Coder-30B-A3B, via `scripts/b3_isolated_policy.sh`. **Both dispatch modes
+run on the *same* trace** `traces/w600_r0.0015_st30_first600s_ttp.jsonl` (807
+reqs, 274 sessions) — the only variable is `REPLAY_DISPATCH_MODE`
+(`tracets` ignores the `time_to_parent_chat` field, `thinktime` consumes it).
+Analyzer: `scripts/bench_report.py` (summaries in `results/`).
+
+- `leastwork` — **LPWL**, parameter-free (`pending_prefill + max(0, input−cache_hit)`)
+- `unified_ab` — unified hybrid, tuned A+B′ (`of=1.3, lmw=0.01`)
+- `unified_def` — unified hybrid, defaults (`of=2.0, lmw=0.0`)
+- `lmetric` — P_tokens × BS, no affinity
+- `sticky` — hard session affinity
+
+## Result (latencies in ms; `figs/exp_d_policy_dispatch.png`)
+
+| policy | mode | TTFT p90 | E2E mean | E2E p90 | E2E p99 | TPOT p90 | APC | req-bal (max/min) |
+|---|---|---:|---:|---:|---:|---:|---:|---:|
+| **LPWL** | tracets | 11099 | 9827 | 25366 | 93929 | 33 | 0.650 | **1.49×** |
+| **LPWL** | **thinktime** | **6713** | **6788** | **17635** | 69946 | **18** | 0.676 | 1.94× |
+| unified+A+B | tracets | 10783 | 8531 | 22063 | 75419 | 21 | 0.667 | 1.54× |
+| unified+A+B | thinktime | 9736 | 7131 | 18690 | **63788** | 19 | 0.676 | 2.16× |
+| unified default | tracets | 12997 | 8366 | 22819 | 82257 | 20 | 0.693 | 1.56× |
+| unified default | thinktime | 11268 | 7975 | 24096 | 72334 | 22 | 0.693 | 2.91× |
+| LMetric | tracets | 16492 | 10775 | 27791 | 99231 | 39 | 0.495 | 2.19× |
+| LMetric | thinktime | 15607 | 9902 | 27819 | 73672 | 30 | 0.483 | 2.10× |
+| sticky | tracets | 15236 | 10139 | 27974 | 82362 | 31 | 0.693 | 2.06× |
+| sticky | thinktime | 14838 | 8663 | 24966 | 70933 | 24 | 0.694 | 2.48× |
+
+### Finding 1 — `thinktime` helps every policy, but helps **LPWL the most**
+
+Per-policy `tracets`→`thinktime` change (negative = thinktime better; the only regression in these columns is unified default's TPOT p90):
+
+| policy | ΔTTFT p90 | ΔE2E mean | ΔTPOT p90 |
+|---|---:|---:|---:|
+| **LPWL** | **−40%** | **−31%** | **−45%** |
+| unified+A+B | −10% | −16% | −10% |
+| unified default | −13% | −5% | +10% |
+| LMetric | −5% | −8% | −23% |
+| sticky | −3% | −15% | −23% |
+
+`tracets` collapses the inter-turn think-time to ~0 (exp c), manufacturing bursts
+→ peak concurrency → KV pressure → preemption. Those bursts punish exactly the
+policy that spreads prefill thinly across hosts (LPWL keeps the tightest request
+balance, 1.49×): under a burst the spread costs locality (APC 0.650 vs 0.676 under
+`thinktime`) with no slack to amortize it. Remove the artifact and LPWL's prefill-aware placement pays.
+
+### Finding 2 — the dispatch mode **flips the cross-policy ranking**
+
+- **TTFT p90:** `tracets` → `unified_ab (10.8s) ≈ LPWL (11.1s)`: LPWL only *ties*
+  (about 3% behind). `thinktime` → **LPWL (6.7s)** < unified_ab (9.7s): LPWL is
+  first, **−31%** vs the tuned baseline.
+- **E2E mean:** `tracets` → unified_def (8.4s) < unified_ab (8.5s) < **LPWL (9.8s)**
+  — LPWL is *3rd, behind both unified variants*. `thinktime` → **LPWL (6.8s)** <
+  unified_ab (7.1s) < unified_def (8.0s): LPWL is **first**.
+
+So under artificial `tracets` bursts the parameter-free policy looks tied-or-worse;
+under the faithful `thinktime` load it wins TTFT p90 clearly and E2E mean narrowly,
+with zero knobs and the tightest request balance.
+
+## Conclusion
+
+**Benchmark agentic routing with `thinktime`. Under it, the parameter-free LPWL is
+the best of the five policies** on every latency metric except E2E p99: TTFT p90 −31%,
+E2E mean −5% / p90 −6%, best TPOT, tightest balance vs the *tuned* `unified+A+B`. The
+`tracets` burst artifact is precisely what erases that advantage (it even drops LPWL
+to 3rd on E2E mean). This both confirms exp (c)'s prediction and is independent evidence
+for the GPU-hit-first routing story: faithful load rewards keeping the active working set GPU-resident.
+
+## Caveats
+
+- **n = 1 per arm.** The `tracets` ranking here does **not** reproduce the earlier
+  dash1 `analysis/lpwl_5policy_600s.md` (which saw LPWL win TTFT p90 −31% *in
+  tracets*); on dash0 `tracets` it is a tie. That is, **`tracets` rankings are
+  run/harness-sensitive**; the robust signal is the `thinktime` advantage, which
+  appears in *both* environments. Repeat ×3 to bound noise.
+- LPWL's one persistent weak spot is **E2E p99** (thinktime 69.9s vs unified_ab
+  63.8s) — the structural HEAVY+ >50k decode tail, identical across policies, not
+  routing-fixable (see `lpwl_5policy_600s.md` κ-ablation).
+- `thinktime` advantage is a capacity-slack effect; under saturation the modes
+  converge (exp c, N=6).
+
+## Repro
+```bash
+# 1. annotate the full trace with time_to_parent_chat (dash0; once)
+python scripts/add_ttp_streaming.py 051315-051317.jsonl 051315-051317-ttp.jsonl \
+    051315-051317-raw.jsonl
+# 2. resample (same seed reproduces traces/w600_r0.0015_st30.jsonl + the ttp field).
+#    The *_first600s_ttp.jsonl trace used in step 3 is the timestamp<600 filter of this output.
+python scripts/sample_trace.py --input 051315-051317-ttp.jsonl \
+    --output traces/w600_r0.0015_st30_ttp.jsonl \
+    --window-seconds 600 --sample-ratio 0.0015 --max-single-turn-ratio 0.30 --seed 42
+# 3. run both modes x 5 policies (~3.5 h, fresh vLLM/arm)
+TRACE_FILE=traces/w600_r0.0015_st30_first600s_ttp.jsonl \
+    bash microbench/connector_tax/cache_sweep/run_5policy_both_modes.sh
+# 4. report + plot (repeat bench_report with the tracets run's --root and results/tracets.json; plot.py reads both)
+python scripts/bench_report.py --root outputs/policy5_600s_thinktime_<date> \
+    --json v2/exp_d_policy_dispatch/results/thinktime.json \
+    leastwork unified_ab unified_def lmetric sticky
+python v2/exp_d_policy_dispatch/plot.py
+```
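The LPWL score quoted in the README's Setup list, `pending_prefill + max(0, input−cache_hit)`, can be sketched as below. This is a minimal illustration and not the router's actual code: the `Worker` class, its field names, and the reading of `cache_hit` as the number of this request's prompt tokens already cached on a given worker are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Worker:
    """Hypothetical per-worker state; field names are illustrative only."""
    name: str
    pending_prefill: int       # prefill tokens already queued on this worker
    cached_prefix_tokens: int  # tokens of THIS request's prompt cached here (assumed meaning of cache_hit)


def lpwl_score(worker: Worker, input_tokens: int) -> int:
    """pending_prefill + max(0, input - cache_hit), as quoted in the README."""
    new_prefill = max(0, input_tokens - worker.cached_prefix_tokens)
    return worker.pending_prefill + new_prefill


def pick_worker(workers: Sequence[Worker], input_tokens: int) -> Worker:
    """Route to the worker with the least total prefill work; no tunables involved."""
    return min(workers, key=lambda w: lpwl_score(w, input_tokens))


# Toy state, invented for the example.
workers = [
    Worker("w0", pending_prefill=30_000, cached_prefix_tokens=18_000),
    Worker("w1", pending_prefill=2_000, cached_prefix_tokens=0),
    Worker("w2", pending_prefill=12_000, cached_prefix_tokens=20_000),
]
chosen = pick_worker(workers, input_tokens=20_000)
print(chosen.name, [lpwl_score(w, 20_000) for w in workers])
```

In this toy case the cache-warm `w2` is chosen even though it has more queued work than `w1`, because the request adds no new prefill there.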
--- a/v2/exp_d_policy_dispatch/plot.py
+++ b/v2/exp_d_policy_dispatch/plot.py
@@ -0,0 +1,68 @@
+"""exp (d): 5-policy routing under tracets vs thinktime dispatch.
+
+Shows the ranking FLIP: under the faithful `thinktime` load the parameter-free
+LPWL (leastwork) is the clear winner, but under `tracets` (think-collapse bursts)
+its advantage disappears (it ties unified_ab on TTFT p90 and *loses* on E2E mean).
+
+Reads the two bench_report summaries; writes v2/figs/exp_d_policy_dispatch.png.
+Usage: python v2/exp_d_policy_dispatch/plot.py
+"""
+import json
+import os
+
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+
+HERE = os.path.dirname(os.path.abspath(__file__))
+with open(os.path.join(HERE, "results/tracets.json")) as _tc, \
+        open(os.path.join(HERE, "results/thinktime.json")) as _tt:
+    TC, TT = json.load(_tc), json.load(_tt)  # bench_report summaries, one per mode
+
+# canonical order: LPWL first; pretty labels
+ARMS = ["leastwork", "unified_ab", "unified_def", "lmetric", "sticky"]
+LABEL = {"leastwork": "LPWL\n(leastwork)", "unified_ab": "unified\n+A+B",
+         "unified_def": "unified\ndefault", "lmetric": "LMetric", "sticky": "sticky"}
+C_TC, C_TT = "#d62728", "#2ca02c"  # tracets red / thinktime green (match exp_c)
+
+
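+def panel(ax, key, sub, title, ylab):
+    """Draw one grouped-bar panel for metric [key][sub], tracets vs thinktime per arm."""
+    tc = [TC[a][key][sub] / 1000.0 for a in ARMS]   # tracets, ms -> s
+    tt = [TT[a][key][sub] / 1000.0 for a in ARMS]   # thinktime, ms -> s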
+    x = range(len(ARMS))
+    w = 0.38
+    b1 = ax.bar([i - w / 2 for i in x], tc, w, label="tracets (burst)", color=C_TC)
+    b2 = ax.bar([i + w / 2 for i in x], tt, w, label="thinktime (faithful)", color=C_TT)
+    for bars in (b1, b2):
+        for r in bars:
+            ax.text(r.get_x() + r.get_width() / 2, r.get_height(),
+                    f"{r.get_height():.1f}", ha="center", va="bottom", fontsize=8)
+    ax.set_xticks(list(x))
+    ax.set_xticklabels([LABEL[a] for a in ARMS], fontsize=9)
+    ax.set_ylabel(ylab)
+    ax.set_title(title, fontsize=11)
+    ax.grid(axis="y", alpha=.3)
+    ax.set_ylim(0, max(tc + tt) * 1.18)
+    # mark LPWL-thinktime as the winner (lowest green) in each panel
+    ax.annotate("LPWL wins\nunder thinktime", xy=(0 + w / 2, tt[0]),
+                xytext=(0.9, max(tc + tt) * 0.86), fontsize=8.5, color=C_TT,
+                ha="left", arrowprops=dict(arrowstyle="->", color=C_TT, lw=1.3))
+    return b1, b2
+
+
+fig, (axL, axR) = plt.subplots(1, 2, figsize=(11.2, 4.6))
+panel(axL, "ttft_ms", "p90", "TTFT p90 (lower = better)", "TTFT p90 (s)")
+panel(axR, "e2e_ms", "mean", "E2E mean (lower = better)", "E2E mean (s)")
+axL.legend(loc="upper left", fontsize=9)
+fig.suptitle("5-policy routing: dispatch mode flips the ranking — "
+             "LPWL is best under faithful thinktime, only ties/loses under tracets bursts",
+             fontsize=11.5)
+fig.tight_layout(rect=(0, 0, 1, 0.95))
+out = os.path.join(HERE, "..", "figs", "exp_d_policy_dispatch.png")
+fig.savefig(out, dpi=140)
+print("wrote", os.path.normpath(out))
+
+# also print the deltas the README cites
+print("\npolicy        TTFTp90 tc->tt    E2Emean tc->tt")
+for a in ARMS:
+    t1, t2 = TC[a]["ttft_ms"]["p90"], TT[a]["ttft_ms"]["p90"]
+    e1, e2 = TC[a]["e2e_ms"]["mean"], TT[a]["e2e_ms"]["mean"]
+    print(f"{a:<13} {t1/1000:5.1f}->{t2/1000:4.1f}s ({(t2-t1)/t1:+.0%})   "
+          f"{e1/1000:5.1f}->{e2/1000:4.1f}s ({(e2-e1)/e1:+.0%})")
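The delta printout at the end of `plot.py` and the ranking claims in the README can be cross-checked without the JSON files. The sketch below hard-codes the rounded TTFT p90 and E2E mean values from the README result table (ms) and derives the per-policy change and the per-mode ranking. The `TABLE` layout and the helper names `rel_change` and `ranking` are illustrative and not part of the repository.

```python
# (tracets, thinktime) pairs in ms, copied from the README result table.
TABLE = {
    "leastwork":   {"ttft_p90": (11099, 6713),  "e2e_mean": (9827, 6788)},
    "unified_ab":  {"ttft_p90": (10783, 9736),  "e2e_mean": (8531, 7131)},
    "unified_def": {"ttft_p90": (12997, 11268), "e2e_mean": (8366, 7975)},
    "lmetric":     {"ttft_p90": (16492, 15607), "e2e_mean": (10775, 9902)},
    "sticky":      {"ttft_p90": (15236, 14838), "e2e_mean": (10139, 8663)},
}
MODES = ("tracets", "thinktime")


def rel_change(before, after):
    """Relative change; negative means the thinktime value is lower (better)."""
    return (after - before) / before


def ranking(metric, mode):
    """Policies sorted best-first (lowest latency) for one metric and mode."""
    i = MODES.index(mode)
    return sorted(TABLE, key=lambda p: TABLE[p][metric][i])


# Per-policy tracets -> thinktime change, as in the README's Finding 1 table.
for policy, row in TABLE.items():
    d_ttft = rel_change(*row["ttft_p90"])
    d_e2e = rel_change(*row["e2e_mean"])
    print(f"{policy:<12} dTTFT p90 {d_ttft:+.0%}   dE2E mean {d_e2e:+.0%}")

# Cross-policy ranking per mode, as in Finding 2.
for metric in ("ttft_p90", "e2e_mean"):
    for mode in MODES:
        print(f"{metric:<9} {mode:<10}", " < ".join(ranking(metric, mode)))
```

With the table values, `leastwork` comes out third on E2E mean under `tracets` and first under `thinktime`, which is the flip the README describes.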
--- a/v2/exp_d_policy_dispatch/results/thinktime.json
+++ b/v2/exp_d_policy_dispatch/results/thinktime.json
@@ -0,0 +1,803 @@
+{
+  "leastwork": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 986.1941225528717,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 3043.454534307026,
+      "p50": 681.8344180064742,
+      "p90": 6712.89858900127,
+      "p99": 41146.725983999204
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 17.12884673518703,
+      "p50": 7.770131949655479,
+      "p90": 17.997618232737178,
+      "p99": 133.81680370757084
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 6787.973176127951,
+      "p50": 2026.8339599715546,
+      "p90": 17635.302426991984,
+      "p99": 69945.72682998842
+    },
+    "throughput": {
+      "decode_tps": 234.00362537409853,
+      "prefill_tps": 8660.302069020001,
+      "total_tps": 8894.305694394101,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8540739
+    },
+    "apc": 0.6756355919409787,
+    "per_worker": {
+      "0": {
+        "n": 96,
+        "decode_tps": 48.631399136561754,
+        "prefill_tps": 812.7547930676582,
+        "ttft_p90_ms": 5368.347445008112,
+        "gpu_util_mean": 48.6875,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 111,
+        "decode_tps": 28.45180209284375,
+        "prefill_tps": 954.9580335787387,
+        "ttft_p90_ms": 3442.4916800053325,
+        "gpu_util_mean": 40.479166666666664,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 99,
+        "decode_tps": 35.558922120953866,
+        "prefill_tps": 901.7494422882478,
+        "ttft_p90_ms": 5583.948273997521,
+        "gpu_util_mean": 48.395833333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 88,
+        "decode_tps": 20.717016592141224,
+        "prefill_tps": 1149.215934349922,
+        "ttft_p90_ms": 6448.1909119931515,
+        "gpu_util_mean": 38.020833333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 124,
+        "decode_tps": 38.884839326290034,
+        "prefill_tps": 891.8842445776638,
+        "ttft_p90_ms": 4944.760143000167,
+        "gpu_util_mean": 40.020833333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 110,
+        "decode_tps": 20.013301183451194,
+        "prefill_tps": 1581.959336729224,
+        "ttft_p90_ms": 27228.53080899222,
+        "gpu_util_mean": 78.19791666666667,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 64,
+        "decode_tps": 25.779914337947165,
+        "prefill_tps": 1114.0737658787832,
+        "ttft_p90_ms": 18414.893322013086,
+        "gpu_util_mean": 49.833333333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 115,
+        "decode_tps": 15.966430583909537,
+        "prefill_tps": 1253.7065185497638,
+        "ttft_p90_ms": 9039.336649002507,
+        "gpu_util_mean": 39.5625,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {},
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 1.9375,
+      "ttft_p90_ratio": 7.909541500751002,
+      "gpu_util_ratio": 2.0567123287671234,
+      "gpu_util_min": 38.020833333333336,
+      "gpu_util_max": 78.19791666666667
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 192.46459313074845,
+          "p50": 177.03324498143047,
+          "p90": 313.57523999758996,
+          "p99": 553.8838730135467
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 772.5742901807313,
+          "p50": 677.829442982329,
+          "p90": 1460.6262099987362,
+          "p99": 2101.3274399738293
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 2004.694984432952,
+          "p50": 1127.2326559992507,
+          "p90": 5081.04542500223,
+          "p99": 9901.586207997752
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 9134.502951365745,
+          "p50": 2167.4920289951842,
+          "p90": 28926.44312098855,
+          "p99": 49472.52169801504
+        }
+      }
+    }
+  },
+  "unified_ab": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 986.5525379180908,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 3592.357064001708,
+      "p50": 676.4678099716548,
+      "p90": 9736.127940996084,
+      "p99": 42370.66501099616
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 13.200466578008895,
+      "p50": 7.819523662692517,
+      "p90": 19.090397550442486,
+      "p99": 133.40408908212945
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 7131.188424004758,
+      "p50": 2037.0979200233705,
+      "p90": 18689.829077018658,
+      "p99": 63787.50272799516
+    },
+    "throughput": {
+      "decode_tps": 233.91861166055818,
+      "prefill_tps": 8640.029468666471,
+      "total_tps": 8873.948080327029,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8523843
+    },
+    "apc": 0.6762772765819173,
+    "per_worker": {
+      "0": {
+        "n": 58,
+        "decode_tps": 29.088161954921237,
+        "prefill_tps": 930.9397773565431,
+        "ttft_p90_ms": 13273.868343996583,
+        "gpu_util_mean": 44.989583333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 98,
+        "decode_tps": 24.162930086120934,
+        "prefill_tps": 1018.370498666148,
+        "ttft_p90_ms": 4365.537890000269,
+        "gpu_util_mean": 38.90625,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 110,
+        "decode_tps": 35.40713612040818,
+        "prefill_tps": 965.8167845888297,
+        "ttft_p90_ms": 4610.747697995976,
+        "gpu_util_mean": 52.114583333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 102,
+        "decode_tps": 20.719626390233998,
+        "prefill_tps": 1126.5056419045684,
+        "ttft_p90_ms": 10947.632670984603,
+        "gpu_util_mean": 41.703125,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 99,
+        "decode_tps": 44.64435324746667,
+        "prefill_tps": 911.5449663712324,
+        "ttft_p90_ms": 4116.690531984204,
+        "gpu_util_mean": 42.671875,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 110,
+        "decode_tps": 29.724722072971574,
+        "prefill_tps": 918.851216898154,
+        "ttft_p90_ms": 4543.632891000016,
+        "gpu_util_mean": 40.864583333333336,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 125,
+        "decode_tps": 28.516474205589404,
+        "prefill_tps": 1522.1155917037186,
+        "ttft_p90_ms": 25507.55575299263,
+        "gpu_util_mean": 76.203125,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 105,
+        "decode_tps": 21.655207582846195,
+        "prefill_tps": 1245.884991177276,
+        "ttft_p90_ms": 20629.490054008784,
+        "gpu_util_mean": 47.276041666666664,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {
+      "lmetric_fallback": 389,
+      "affinity": 418
+    },
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 2.1551724137931036,
+      "ttft_p90_ratio": 6.196131468910353,
+      "gpu_util_ratio": 1.9586345381526105,
+      "gpu_util_min": 38.90625,
+      "gpu_util_max": 76.203125
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 448.3382160131283,
+          "p50": 179.28761898656376,
+          "p90": 323.1771159917116,
+          "p99": 5748.067840992007
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 1455.8712874500252,
+          "p50": 685.6210659898352,
+          "p90": 1802.9974120145198,
+          "p99": 32571.255193004617
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 2672.607777120579,
+          "p50": 1117.918328003725,
+          "p90": 5214.129884989234,
+          "p99": 22190.210508997552
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 9472.201524545819,
+          "p50": 2150.3282230114564,
+          "p90": 28876.64386598044,
+          "p99": 48314.48572798399
+        }
+      }
+    }
+  },
+  "unified_def": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 979.5575842857361,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4037.2454534798544,
+      "p50": 695.2703970018774,
+      "p90": 11267.881545994896,
+      "p99": 46221.317757997895
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 16.476541787288614,
+      "p50": 8.307468241425875,
+      "p90": 21.768670571627954,
+      "p99": 200.26358073773736
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 7974.606969135101,
+      "p50": 2098.1516239990015,
+      "p90": 24096.24872301356,
+      "p99": 72334.40188399982
+    },
+    "throughput": {
+      "decode_tps": 235.5890084484137,
+      "prefill_tps": 8253.263646460364,
+      "total_tps": 8488.852654908778,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8084547
+    },
+    "apc": 0.6929610772463206,
+    "per_worker": {
+      "0": {
+        "n": 96,
+        "decode_tps": 39.88024862110074,
+        "prefill_tps": 791.1724766697671,
+        "ttft_p90_ms": 5825.010653992649,
+        "gpu_util_mean": 47.68586387434555,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 55,
+        "decode_tps": 17.82028977166094,
+        "prefill_tps": 910.7254277965683,
+        "ttft_p90_ms": 16298.377383005572,
+        "gpu_util_mean": 39.2565445026178,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 98,
+        "decode_tps": 27.174512685142215,
+        "prefill_tps": 1043.4608606959093,
+        "ttft_p90_ms": 9739.183520985534,
+        "gpu_util_mean": 40.83769633507853,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 103,
+        "decode_tps": 24.518211471470025,
+        "prefill_tps": 1003.2661844138513,
+        "ttft_p90_ms": 6705.797864007764,
+        "gpu_util_mean": 33.50785340314136,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 102,
+        "decode_tps": 49.593817432818994,
+        "prefill_tps": 689.9175820202374,
+        "ttft_p90_ms": 2474.3239340023138,
+        "gpu_util_mean": 45.246073298429316,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 112,
+        "decode_tps": 20.50823792523468,
+        "prefill_tps": 1346.1127974027988,
+        "ttft_p90_ms": 23553.059853002196,
+        "gpu_util_mean": 50.109947643979055,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 81,
+        "decode_tps": 19.51697409799575,
+        "prefill_tps": 990.0816609032532,
+        "ttft_p90_ms": 5961.234248999972,
+        "gpu_util_mean": 38.717277486910994,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 160,
+        "decode_tps": 36.57671644299036,
+        "prefill_tps": 1478.5266565579789,
+        "ttft_p90_ms": 17912.180206010817,
+        "gpu_util_mean": 85.15183246073299,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {
+      "lmetric_fallback": 349,
+      "affinity": 458
+    },
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 2.909090909090909,
+      "ttft_p90_ratio": 9.518988006919619,
+      "gpu_util_ratio": 2.5412500000000002,
+      "gpu_util_min": 33.50785340314136,
+      "gpu_util_max": 85.15183246073299
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 594.1550390875225,
+          "p50": 196.222682017833,
+          "p90": 338.4021449892316,
+          "p99": 7637.84466200741
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 1386.6929373560054,
+          "p50": 662.5233909871895,
+          "p90": 1772.5210430216976,
+          "p99": 19121.71271801344
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 3761.512416864031,
+          "p50": 1186.4990000030957,
+          "p90": 7436.603061010828,
+          "p99": 37502.096537995385
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 9973.751859232492,
+          "p50": 2084.2301140073687,
+          "p90": 34646.72368601896,
+          "p99": 51783.358982007485
+        }
+      }
+    }
+  },
+  "lmetric": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 1036.9893975257874,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4942.361280256006,
+      "p50": 1195.667241991032,
+      "p90": 15606.655231997138,
+      "p99": 46217.127193987835
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 19.707597229545165,
+      "p50": 9.35281297406689,
+      "p90": 30.177805961172382,
+      "p99": 232.18400578116416
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 9901.839828112516,
+      "p50": 3177.2723750036675,
+      "p90": 27819.4430010044,
+      "p99": 73672.06387300394
+    },
+    "throughput": {
+      "decode_tps": 222.5413302687709,
+      "prefill_tps": 13134.949144609054,
+      "total_tps": 13357.490474877826,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 13620803
+    },
+    "apc": 0.48270240989877555,
+    "per_worker": {
+      "0": {
+        "n": 121,
+        "decode_tps": 40.13348651326501,
+        "prefill_tps": 1973.9218210754154,
+        "ttft_p90_ms": 23894.41591600189,
+        "gpu_util_mean": 90.75247524752476,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 128,
+        "decode_tps": 44.98117349250917,
+        "prefill_tps": 1626.6328315647543,
+        "ttft_p90_ms": 5918.853377981577,
+        "gpu_util_mean": 64.96039603960396,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 109,
+        "decode_tps": 26.800659742819484,
+        "prefill_tps": 1578.2861463241723,
+        "ttft_p90_ms": 13917.768498009536,
+        "gpu_util_mean": 58.306930693069305,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 99,
+        "decode_tps": 19.435107097610242,
+        "prefill_tps": 1683.1715002723502,
+        "ttft_p90_ms": 16737.5574040052,
+        "gpu_util_mean": 59.16831683168317,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 116,
+        "decode_tps": 19.955845305048445,
+        "prefill_tps": 1884.7752972820501,
+        "ttft_p90_ms": 11347.276910004439,
+        "gpu_util_mean": 50.36138613861386,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 61,
+        "decode_tps": 12.497716978516857,
+        "prefill_tps": 1726.7611455549827,
+        "ttft_p90_ms": 31680.082703998778,
+        "gpu_util_mean": 55.93069306930693,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 88,
+        "decode_tps": 38.23472071614312,
+        "prefill_tps": 1208.0914259963265,
+        "ttft_p90_ms": 9533.787049993407,
+        "gpu_util_mean": 51.62871287128713,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 85,
+        "decode_tps": 20.50262042285856,
+        "prefill_tps": 1453.3089765390037,
+        "ttft_p90_ms": 14970.007644995349,
+        "gpu_util_mean": 51.757425742574256,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {},
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 2.098360655737705,
+      "ttft_p90_ratio": 5.352402007768976,
+      "gpu_util_ratio": 1.8020249680526887,
+      "gpu_util_min": 50.36138613861386,
+      "gpu_util_max": 90.75247524752476
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 511.51012201982036,
+          "p50": 255.4193850082811,
+          "p90": 471.22472297633067,
+          "p99": 3532.1444049768616
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 1010.5527848093863,
+          "p50": 818.2104199950118,
+          "p90": 1878.1264800054487,
+          "p99": 4416.228823014535
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 3164.034748000338,
+          "p50": 2636.801838991232,
+          "p90": 7400.190736021614,
+          "p99": 9636.447697004769
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 15215.938255342222,
+          "p50": 12060.85875100689,
+          "p90": 36602.47571900254,
+          "p99": 52271.21993701439
+        }
+      }
+    }
+  },
+  "sticky": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 994.9787130355835,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4455.946148958436,
+      "p50": 713.0627470032778,
+      "p90": 14838.208375993418,
+      "p99": 43174.81458699331
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 19.138733289320065,
+      "p50": 8.24416923684399,
+      "p90": 23.769559945071954,
+      "p99": 184.6952650922511
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 8663.490226920512,
+      "p50": 2352.715140004875,
+      "p90": 24966.471978026675,
+      "p99": 70932.61348700617
+    },
+    "throughput": {
+      "decode_tps": 231.93762537485247,
+      "prefill_tps": 8105.277926394779,
+      "total_tps": 8337.215551769632,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8064579
+    },
+    "apc": 0.6937194318219754,
+    "per_worker": {
+      "0": {
+        "n": 156,
+        "decode_tps": 44.672312500428745,
+        "prefill_tps": 1949.5271351907093,
+        "ttft_p90_ms": 20576.009418989997,
+        "gpu_util_mean": 93.18041237113403,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 114,
+        "decode_tps": 44.75372127725863,
+        "prefill_tps": 929.0429914624127,
+        "ttft_p90_ms": 5498.717762995511,
+        "gpu_util_mean": 53.08247422680412,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 88,
+        "decode_tps": 29.785561853462614,
+        "prefill_tps": 904.2113044360427,
+        "ttft_p90_ms": 12234.77461998118,
+        "gpu_util_mean": 49.628865979381445,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 98,
+        "decode_tps": 29.982550992458386,
+        "prefill_tps": 1018.2680159145942,
+        "ttft_p90_ms": 16286.48554199026,
+        "gpu_util_mean": 44.123711340206185,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 110,
+        "decode_tps": 37.69427376549181,
+        "prefill_tps": 949.1017120546454,
+        "ttft_p90_ms": 6709.773182024946,
+        "gpu_util_mean": 45.7680412371134,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 99,
+        "decode_tps": 19.964246209244884,
+        "prefill_tps": 980.7747514746083,
+        "ttft_p90_ms": 14065.780322009232,
+        "gpu_util_mean": 36.324742268041234,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 79,
+        "decode_tps": 11.21531531660107,
+        "prefill_tps": 682.5723918534845,
+        "ttft_p90_ms": 4579.089447972365,
+        "gpu_util_mean": 22.288659793814432,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 63,
+        "decode_tps": 13.869643459906332,
+        "prefill_tps": 691.7796240082818,
+        "ttft_p90_ms": 18229.593775991816,
+        "gpu_util_mean": 30.762886597938145,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {},
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 2.4761904761904763,
+      "ttft_p90_ratio": 4.493471825081102,
+      "gpu_util_ratio": 4.180619796484737,
+      "gpu_util_min": 22.288659793814432,
+      "gpu_util_max": 93.18041237113403
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 827.0193562525294,
+          "p50": 197.0047799986787,
+          "p90": 507.2060489910655,
+          "p99": 19187.98109301133
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 2624.659966896439,
+          "p50": 736.4085000008345,
+          "p90": 3899.43698499701,
+          "p99": 33760.123436979484
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 3807.600332329692,
+          "p50": 1086.1541359918192,
+          "p90": 9912.624888995197,
+          "p99": 40516.03257699753
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 9766.785228673292,
+          "p50": 2521.5582190139685,
+          "p90": 34039.37866198248,
+          "p99": 47948.314540000865
+        }
+      }
+    }
+  }
+}
--- a/v2/exp_d_policy_dispatch/results/tracets.json
+++ b/v2/exp_d_policy_dispatch/results/tracets.json
@@ -0,0 +1,803 @@
+{
+  "leastwork": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 1045.237051486969,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4029.99802739754,
+      "p50": 856.9921070011333,
+      "p90": 11099.306205986068,
+      "p99": 43400.520397000946
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 21.48754069144944,
+      "p50": 8.545071840088175,
+      "p90": 33.14273249998223,
+      "p99": 221.47291811146448
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 9827.03743343642,
+      "p50": 2474.214674992254,
+      "p90": 25365.627319028135,
+      "p99": 93929.44298699149
+    },
+    "throughput": {
+      "decode_tps": 220.78532297692573,
+      "prefill_tps": 8826.999566150593,
+      "total_tps": 9047.784889127519,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 9226307
+    },
+    "apc": 0.6495987515101673,
+    "per_worker": {
+      "0": {
+        "n": 95,
+        "decode_tps": 26.681028920976477,
+        "prefill_tps": 996.0917463831219,
+        "ttft_p90_ms": 5435.623251018114,
+        "gpu_util_mean": 40.30392156862745,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 94,
+        "decode_tps": 53.13244485640239,
+        "prefill_tps": 951.7659162433571,
+        "ttft_p90_ms": 10786.369787005242,
+        "gpu_util_mean": 61.27450980392157,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 89,
+        "decode_tps": 19.963892372849116,
+        "prefill_tps": 1100.5666115294052,
+        "ttft_p90_ms": 4944.386984978337,
+        "gpu_util_mean": 36.745098039215684,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 117,
+        "decode_tps": 23.37842881214075,
+        "prefill_tps": 1155.5378737117585,
+        "ttft_p90_ms": 6188.670104980702,
+        "gpu_util_mean": 40.76470588235294,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 108,
+        "decode_tps": 34.993018997907186,
+        "prefill_tps": 947.086594941991,
+        "ttft_p90_ms": 4642.632269999012,
+        "gpu_util_mean": 40.19607843137255,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 87,
+        "decode_tps": 10.190989675352892,
+        "prefill_tps": 1130.268964651918,
+        "ttft_p90_ms": 9265.34449501196,
+        "gpu_util_mean": 29.41176470588235,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 130,
+        "decode_tps": 20.925396749842136,
+        "prefill_tps": 1646.6073390256745,
+        "ttft_p90_ms": 38816.55501498608,
+        "gpu_util_mean": 81.27450980392157,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 87,
+        "decode_tps": 31.520122591454786,
+        "prefill_tps": 899.0745196633663,
+        "ttft_p90_ms": 7304.075189982541,
+        "gpu_util_mean": 42.26960784313726,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {},
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 1.4942528735632183,
+      "ttft_p90_ratio": 8.3608937252733,
+      "gpu_util_ratio": 2.7633333333333336,
+      "gpu_util_min": 29.41176470588235,
+      "gpu_util_max": 81.27450980392157
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 2123.6411421857115,
+          "p50": 202.10654201218858,
+          "p90": 478.5369329911191,
+          "p99": 39781.0776779952
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 1228.5320352953054,
+          "p50": 757.9952410014812,
+          "p90": 1679.417210019892,
+          "p99": 15248.28791298205
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 2886.291041083525,
+          "p50": 1301.9832599966321,
+          "p90": 5246.098520990927,
+          "p99": 39812.045788014075
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 10579.37216416889,
+          "p50": 2986.336703004781,
+          "p90": 34044.874527986394,
+          "p99": 51031.28803099389
+        }
+      }
+    }
+  },
+  "unified_ab": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 1081.3928244113922,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4003.8645795703064,
+      "p50": 745.6592369999271,
+      "p90": 10783.10890001012,
+      "p99": 46727.02033401583
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 18.129335403553664,
+      "p50": 8.004697278213117,
+      "p90": 20.508462421730655,
+      "p99": 188.8185092436804
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 8531.231209817679,
+      "p50": 2227.301309001632,
+      "p90": 22062.78157699853,
+      "p99": 75419.32771002757
+    },
+    "throughput": {
+      "decode_tps": 213.4034874196719,
+      "prefill_tps": 8110.4912128244405,
+      "total_tps": 8323.894700244113,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8770627
+    },
+    "apc": 0.66690479182639,
+    "per_worker": {
+      "0": {
+        "n": 119,
+        "decode_tps": 36.19868665330458,
+        "prefill_tps": 1627.2449384503814,
+        "ttft_p90_ms": 28329.616097005783,
+        "gpu_util_mean": 85.30805687203791,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 94,
+        "decode_tps": 30.323855734708484,
+        "prefill_tps": 977.303507238684,
+        "ttft_p90_ms": 5139.202910999302,
+        "gpu_util_mean": 41.165876777251185,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 79,
+        "decode_tps": 13.535321919642842,
+        "prefill_tps": 1030.6661694437062,
+        "ttft_p90_ms": 16363.982771988958,
+        "gpu_util_mean": 35.014218009478675,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 122,
+        "decode_tps": 25.46530675842895,
+        "prefill_tps": 900.9917376974744,
+        "ttft_p90_ms": 5133.929038012866,
+        "gpu_util_mean": 35.85781990521327,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 101,
+        "decode_tps": 39.54437188287811,
+        "prefill_tps": 731.5889121334044,
+        "ttft_p90_ms": 8783.158237987664,
+        "gpu_util_mean": 45.843601895734594,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 109,
+        "decode_tps": 32.10951581715914,
+        "prefill_tps": 842.2332564447569,
+        "ttft_p90_ms": 4199.806818010984,
+        "gpu_util_mean": 37.32701421800948,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 89,
+        "decode_tps": 8.69804181021798,
+        "prefill_tps": 1146.4917946675243,
+        "ttft_p90_ms": 11112.522551004076,
+        "gpu_util_mean": 30.95734597156398,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 94,
+        "decode_tps": 27.528386843331813,
+        "prefill_tps": 853.9708967485094,
+        "ttft_p90_ms": 10584.918729990022,
+        "gpu_util_mean": 47.426540284360186,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {
+      "lmetric_fallback": 394,
+      "affinity": 413
+    },
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 1.5443037974683544,
+      "ttft_p90_ratio": 6.745456951856325,
+      "gpu_util_ratio": 2.7556644213104713,
+      "gpu_util_min": 30.95734597156398,
+      "gpu_util_max": 85.30805687203791
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 473.4928183806124,
+          "p50": 193.41183800133877,
+          "p90": 326.81434298865497,
+          "p99": 5278.063865000149
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 1676.7254021373014,
+          "p50": 733.51754702162,
+          "p90": 1942.2162729897536,
+          "p99": 28329.616097005783
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 2520.2562936371373,
+          "p50": 1149.4361000077333,
+          "p90": 5139.202910999302,
+          "p99": 26739.575799991144
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 11092.085469873233,
+          "p50": 2945.923403982306,
+          "p90": 38718.26310700271,
+          "p99": 51830.85186799872
+        }
+      }
+    }
+  },
+  "unified_def": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 912.5732414722443,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4275.594404255811,
+      "p50": 757.1689730102662,
+      "p90": 12997.265826008515,
+      "p99": 47988.61391301034
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 18.24678916181055,
+      "p50": 8.282395397119393,
+      "p90": 19.536432251223843,
+      "p99": 127.01842809143604
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 8365.99611452138,
+      "p50": 2119.7480160044506,
+      "p90": 22818.839199026115,
+      "p99": 82257.18197401147
+    },
+    "throughput": {
+      "decode_tps": 252.8816203592563,
+      "prefill_tps": 8854.578057717206,
+      "total_tps": 9107.459678076462,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8080451
+    },
+    "apc": 0.6931166371592755,
+    "per_worker": {
+      "0": {
+        "n": 86,
+        "decode_tps": 48.0925782233062,
+        "prefill_tps": 878.7174152798008,
+        "ttft_p90_ms": 9217.135582002811,
+        "gpu_util_mean": 49.02808988764045,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 114,
+        "decode_tps": 36.67650822853761,
+        "prefill_tps": 962.1616765613936,
+        "ttft_p90_ms": 5333.489734999603,
+        "gpu_util_mean": 43.674157303370784,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 75,
+        "decode_tps": 21.302399759867672,
+        "prefill_tps": 1132.1228292133994,
+        "ttft_p90_ms": 16419.932407996384,
+        "gpu_util_mean": 44.235955056179776,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 94,
+        "decode_tps": 39.351362025545676,
+        "prefill_tps": 918.7383126064411,
+        "ttft_p90_ms": 14089.503638009774,
+        "gpu_util_mean": 57.30898876404494,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 103,
+        "decode_tps": 23.649608622297535,
+        "prefill_tps": 1200.1765449894706,
+        "ttft_p90_ms": 12823.167912021745,
+        "gpu_util_mean": 43.82022471910113,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 104,
+        "decode_tps": 25.253863419028313,
+        "prefill_tps": 1067.8562067279715,
+        "ttft_p90_ms": 18659.113589994377,
+        "gpu_util_mean": 52.449438202247194,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 114,
+        "decode_tps": 27.64600018218629,
+        "prefill_tps": 1278.7061322523903,
+        "ttft_p90_ms": 7502.1790039900225,
+        "gpu_util_mean": 42.62359550561798,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 117,
+        "decode_tps": 30.90929989848701,
+        "prefill_tps": 1416.0989400863393,
+        "ttft_p90_ms": 15775.390956026968,
+        "gpu_util_mean": 79.19101123595506,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {
+      "lmetric_fallback": 353,
+      "affinity": 454
+    },
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 1.56,
+      "ttft_p90_ratio": 3.4984812040696216,
+      "gpu_util_ratio": 1.8579148543561357,
+      "gpu_util_min": 42.62359550561798,
+      "gpu_util_max": 79.19101123595506
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 187.7879224041902,
+          "p50": 171.855135995429,
+          "p90": 318.49737899028696,
+          "p99": 406.7676870035939
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 2014.1915133893583,
+          "p50": 737.6636109838728,
+          "p90": 2057.0135610178113,
+          "p99": 29051.10613000579
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 3298.1720407103326,
+          "p50": 1321.936836000532,
+          "p90": 7205.449934001081,
+          "p99": 36468.73455500463
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 10874.266077009788,
+          "p50": 2567.718550999416,
+          "p90": 35562.78318798286,
+          "p99": 61205.50292698317
+        }
+      }
+    }
+  },
+  "lmetric": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 1077.4112372398376,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 5152.329462028203,
+      "p50": 1274.1917690145783,
+      "p90": 16492.156354011968,
+      "p99": 46248.138958995696
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 24.0370380963572,
+      "p50": 10.534365684851883,
+      "p90": 38.95731354601127,
+      "p99": 231.34709527787183
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 10774.921962922257,
+      "p50": 3460.944951977581,
+      "p90": 27791.26176200225,
+      "p99": 99231.47636200883
+    },
+    "throughput": {
+      "decode_tps": 214.19212276939402,
+      "prefill_tps": 12337.071064946676,
+      "total_tps": 12551.263187716071,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 13292099
+    },
+    "apc": 0.49518609291339905,
+    "per_worker": {
+      "0": {
+        "n": 140,
+        "decode_tps": 40.62422823074453,
+        "prefill_tps": 2047.015033600442,
+        "ttft_p90_ms": 13910.764692991506,
+        "gpu_util_mean": 92.98571428571428,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 89,
+        "decode_tps": 23.129515582036454,
+        "prefill_tps": 1463.9739641483652,
+        "ttft_p90_ms": 17954.478895000648,
+        "gpu_util_mean": 60.319047619047616,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 92,
+        "decode_tps": 18.14256184117442,
+        "prefill_tps": 1663.4864553589516,
+        "ttft_p90_ms": 15653.674011002295,
+        "gpu_util_mean": 55.94761904761905,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 92,
+        "decode_tps": 43.91637878329201,
+        "prefill_tps": 1313.127198881315,
+        "ttft_p90_ms": 13397.495551005704,
+        "gpu_util_mean": 54.385714285714286,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 134,
+        "decode_tps": 21.903428500019192,
+        "prefill_tps": 1412.6889969137978,
+        "ttft_p90_ms": 21995.840935007436,
+        "gpu_util_mean": 51.51428571428571,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 88,
+        "decode_tps": 16.934109622562403,
+        "prefill_tps": 1474.5493132872787,
+        "ttft_p90_ms": 11013.645827013534,
+        "gpu_util_mean": 42.55238095238095,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 108,
+        "decode_tps": 30.606697665860118,
+        "prefill_tps": 1651.413070981298,
+        "ttft_p90_ms": 18613.971301994752,
+        "gpu_util_mean": 52.67619047619048,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 64,
+        "decode_tps": 18.935202543704882,
+        "prefill_tps": 1310.817031775228,
+        "ttft_p90_ms": 15090.364522009622,
+        "gpu_util_mean": 41.904761904761905,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {},
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 2.1875,
+      "ttft_p90_ratio": 1.997144386199301,
+      "gpu_util_ratio": 2.2189772727272725,
+      "gpu_util_min": 41.904761904761905,
+      "gpu_util_max": 92.98571428571428
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 1789.4509492392028,
+          "p50": 262.9711049958132,
+          "p90": 1572.630943992408,
+          "p99": 24139.729285001522
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 1697.68317032391,
+          "p50": 920.4712719947565,
+          "p90": 2029.556789988419,
+          "p99": 22497.115491016302
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 3424.9239484553045,
+          "p50": 2699.253859987948,
+          "p90": 6799.459913017927,
+          "p99": 24401.87052200781
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 14137.37210560736,
+          "p50": 11013.645827013534,
+          "p90": 35319.35577199329,
+          "p99": 51099.66781400726
+        }
+      }
+    }
+  },
+  "sticky": {
+    "n_total": 807,
+    "n_ok": 807,
+    "window_s": 925.9680500030518,
+    "ttft_ms": {
+      "n": 807,
+      "mean": 4914.133386887214,
+      "p50": 906.6202020039782,
+      "p90": 15236.451414995827,
+      "p99": 46771.370519010816
+    },
+    "tpot_ms": {
+      "n": 806,
+      "mean": 22.62391467634138,
+      "p50": 9.728358783056292,
+      "p90": 30.957536839455965,
+      "p99": 231.30005976865783
+    },
+    "e2e_ms": {
+      "n": 807,
+      "mean": 10139.474540275223,
+      "p50": 2597.957960999338,
+      "p90": 27973.595037998166,
+      "p99": 82362.0547579776
+    },
+    "throughput": {
+      "decode_tps": 249.22350182518656,
+      "prefill_tps": 8743.631057219865,
+      "total_tps": 8992.854559045052,
+      "total_output_tokens": 230773,
+      "total_new_prefill_tokens": 8096323
+    },
+    "apc": 0.6925138424965755,
+    "per_worker": {
+      "0": {
+        "n": 136,
+        "decode_tps": 55.41875877880546,
+        "prefill_tps": 1434.2957081463262,
+        "ttft_p90_ms": 11807.82909199479,
+        "gpu_util_mean": 65.71270718232044,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "1": {
+        "n": 129,
+        "decode_tps": 38.47432964872015,
+        "prefill_tps": 1709.658340797809,
+        "ttft_p90_ms": 23309.77585897199,
+        "gpu_util_mean": 86.77900552486187,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "2": {
+        "n": 108,
+        "decode_tps": 47.74786775834683,
+        "prefill_tps": 1144.2727424520835,
+        "ttft_p90_ms": 30574.395572999492,
+        "gpu_util_mean": 63.50828729281768,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "3": {
+        "n": 78,
+        "decode_tps": 25.291369394357414,
+        "prefill_tps": 1245.5040970325047,
+        "ttft_p90_ms": 26070.17321899184,
+        "gpu_util_mean": 51.87845303867403,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "4": {
+        "n": 132,
+        "decode_tps": 40.39124244068328,
+        "prefill_tps": 941.3586138281199,
+        "ttft_p90_ms": 5989.128820016049,
+        "gpu_util_mean": 43.0828729281768,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "5": {
+        "n": 84,
+        "decode_tps": 22.365782491017637,
+        "prefill_tps": 1030.0301398054248,
+        "ttft_p90_ms": 14970.723142003408,
+        "gpu_util_mean": 41.43646408839779,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "6": {
+        "n": 66,
+        "decode_tps": 9.593203566765315,
+        "prefill_tps": 669.7250515262986,
+        "ttft_p90_ms": 13777.997018012684,
+        "gpu_util_mean": 20.994475138121548,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      },
+      "7": {
+        "n": 74,
+        "decode_tps": 9.940947746490457,
+        "prefill_tps": 568.7863636312983,
+        "ttft_p90_ms": 7027.451686997665,
+        "gpu_util_mean": 18.50828729281768,
+        "gpu_util_max": 100.0,
+        "gpu_mem_max_mb": 89575.0
+      }
+    },
+    "decisions": {},
+    "gpu_captured": true,
+    "spread": {
+      "n_ratio": 2.0606060606060606,
+      "ttft_p90_ratio": 5.104982125416625,
+      "gpu_util_ratio": 4.68865671641791,
+      "gpu_util_min": 18.50828729281768,
+      "gpu_util_max": 86.77900552486187
+    },
+    "per_class": {
+      "WARM<5k": {
+        "n": 92,
+        "ttft_ms": {
+          "n": 92,
+          "mean": 892.6154872479738,
+          "p50": 207.1402580186259,
+          "p90": 375.00955499126576,
+          "p99": 11232.500832004007
+        }
+      },
+      "MED5-20k": {
+        "n": 278,
+        "ttft_ms": {
+          "n": 278,
+          "mean": 2908.9762750470118,
+          "p50": 770.736623002449,
+          "p90": 4912.921145994915,
+          "p99": 42022.69450199674
+        }
+      },
+      "HEAVY20-50k": {
+        "n": 248,
+        "ttft_ms": {
+          "n": 248,
+          "mean": 4250.573046338691,
+          "p50": 1623.4680919733364,
+          "p90": 11137.098645005608,
+          "p99": 39817.45037299697
+        }
+      },
+      "HEAVY+>50k": {
+        "n": 189,
+        "ttft_ms": {
+          "n": 189,
+          "mean": 10691.78570601113,
+          "p50": 2671.919913002057,
+          "p90": 36922.92091701529,
+          "p99": 53025.03776800586
+        }
+      }
+    }
+  }
+}
--- a/v2/figs/exp_d_policy_dispatch.png
+++ b/v2/figs/exp_d_policy_dispatch.png