v2 exp(d): 5-policy routing under tracets vs thinktime — ranking flip
Extends exp(c) (dispatch ablation, 1 round-robin policy) to the full 5-policy routing comparison, with both modes on the SAME ttp trace (807 reqs, fresh vLLM/arm, dash0 8xH20). Confirms exp(c)'s prediction and finds something stronger: the dispatch mode FLIPS which policy wins.

- thinktime helps every policy but helps LPWL most (TTFT p90 -40%, E2E mean -31% vs -3..-16% for the rest): tracets bursts punish prefill-spreading.
- Ranking flip: tracets -> LPWL only ties unified_ab on TTFT p90 and is 3rd on E2E mean; thinktime -> LPWL is 1st on both (TTFT p90 -31%, best TPOT/balance, zero knobs) vs the tuned unified+A+B.
- => benchmark agentic routing with thinktime; tracets' burst artifact erases LPWL's advantage.

Caveat n=1: the tracets ranking is run-sensitive (does not reproduce dash1 lpwl_5policy_600s.md); the thinktime advantage is the robust signal (appears in both environments).

README + grouped-bar fig (figs/exp_d_policy_dispatch.png) + bench_report summaries in results/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
114
v2/exp_d_policy_dispatch/README.md
Normal file
@@ -0,0 +1,114 @@
# exp (d): 5-policy routing under `tracets` vs `thinktime`

exp (c) showed that the **dispatch mode** changes measured performance for a single
round-robin policy, and predicted: *"a cache-aware policy (LPWL) would lower the
latencies and likely **widen** the thinktime advantage."* exp (d) tests that
prediction with the full routing comparison and finds something stronger: **the
dispatch mode flips which policy wins.**

**Question.** Does the parameter-free LPWL still beat the tuned `unified+A+B`
baseline once we benchmark with the *faithful* `thinktime` load instead of the
`tracets` burst artifact?

## Setup

Five routing policies, each on its own **fresh vLLM (cold APC)** on dash0 8×H20,
serving Qwen3-Coder-30B-A3B, launched via `scripts/b3_isolated_policy.sh`. **Both
dispatch modes run on the *same* trace**, `traces/w600_r0.0015_st30_first600s_ttp.jsonl`
(807 reqs, 274 sessions); the only variable is `REPLAY_DISPATCH_MODE`
(`tracets` ignores the `time_to_parent_chat` field, `thinktime` consumes it).
Analyzer: `scripts/bench_report.py` (summaries in `results/`).

- `leastwork`: **LPWL**, parameter-free (`pending_prefill + max(0, input−cache_hit)`)
- `unified_ab`: unified hybrid, tuned A+B′ (`of=1.3, lmw=0.01`)
- `unified_def`: unified hybrid, defaults (`of=2.0, lmw=0.0`)
- `lmetric`: P_tokens × BS, no affinity
- `sticky`: hard session affinity

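The `leastwork` score in the list above reads as "prefill tokens already queued on a worker, plus the new prefill tokens this request would add there". The following is a minimal sketch of that selection rule, not the router's actual code. The `Worker` class, its field names, `lpwl_score`, and `pick_worker` are illustrative assumptions, and the numbers in the usage part are made up.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical per-worker view; the field names are assumptions."""
    name: str
    pending_prefill: int  # prefill tokens already queued on this worker
    cache_hit: int        # input tokens of this request expected to hit its prefix cache


def lpwl_score(worker: Worker, input_tokens: int) -> int:
    """pending_prefill + max(0, input - cache_hit), as in the formula above."""
    return worker.pending_prefill + max(0, input_tokens - worker.cache_hit)


def pick_worker(workers: list[Worker], input_tokens: int) -> Worker:
    """Route to the worker with the least prefill work once this request is added."""
    return min(workers, key=lambda w: lpwl_score(w, input_tokens))


if __name__ == "__main__":
    # Made-up example: a busy worker with a warm cache vs an idle cold one.
    workers = [
        Worker("w0", pending_prefill=12_000, cache_hit=30_000),
        Worker("w1", pending_prefill=0, cache_hit=0),
    ]
    chosen = pick_worker(workers, input_tokens=40_000)
    print(chosen.name, lpwl_score(chosen, 40_000))
```

In this toy case the warm worker wins despite its queue, because the cache hit removes more prefill work than the queue adds. The rule has no tunable weights, which is what "parameter-free" refers to.
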
## Result (ms; `figs/exp_d_policy_dispatch.png`)

Bold marks the best value in each latency and balance column.

| policy | mode | TTFT p90 | E2E mean | E2E p90 | E2E p99 | TPOT p90 | APC | req-bal |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| **LPWL** | tracets | 11099 | 9827 | 25366 | 93929 | 33 | 0.650 | **1.49×** |
| **LPWL** | **thinktime** | **6713** | **6788** | **17635** | 69946 | **18** | 0.676 | 1.94× |
| unified+A+B | tracets | 10783 | 8531 | 22063 | 75419 | 21 | 0.667 | 1.54× |
| unified+A+B | thinktime | 9736 | 7131 | 18690 | **63788** | 19 | 0.676 | 2.16× |
| unified default | tracets | 12997 | 8366 | 22819 | 82257 | 20 | 0.693 | 1.56× |
| unified default | thinktime | 11268 | 7975 | 24096 | 72334 | 22 | 0.693 | 2.91× |
| LMetric | tracets | 16492 | 10775 | 27791 | 99231 | 39 | 0.495 | 2.19× |
| LMetric | thinktime | 15607 | 9902 | 27819 | 73672 | 30 | 0.483 | 2.10× |
| sticky | tracets | 15236 | 10139 | 27974 | 82362 | 31 | 0.693 | 2.06× |
| sticky | thinktime | 14838 | 8663 | 24966 | 70933 | 24 | 0.694 | 2.48× |

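The `req-bal` column matches the `spread.n_ratio` field of the bench_report summaries, and those values are consistent with a max/min ratio of per-worker request counts. The sketch below recomputes it from the `per_worker` counts in `results/thinktime.json`. The helper name `request_balance` is illustrative, and that `bench_report.py` computes the field exactly this way is an assumption, although the numbers agree.

```python
def request_balance(per_worker_counts: list[int]) -> float:
    """Ratio of the busiest to the least busy worker, by request count."""
    return max(per_worker_counts) / min(per_worker_counts)


# Per-worker request counts (per_worker[i]["n"]) copied from results/thinktime.json.
leastwork_n = [96, 111, 99, 88, 124, 110, 64, 115]
unified_ab_n = [58, 98, 110, 102, 99, 110, 125, 105]

print(f"leastwork  req-bal = {request_balance(leastwork_n):.2f}x")   # 124 / 64
print(f"unified_ab req-bal = {request_balance(unified_ab_n):.2f}x")  # 125 / 58
```

A value of 1.0× would mean every worker served the same number of requests; the metric only counts requests, so it says nothing about how token load is distributed.
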
### Finding 1: `thinktime` helps every policy, but helps **LPWL the most**

Per-policy `tracets`→`thinktime` change (negative = thinktime better):

| policy | ΔTTFT p90 | ΔE2E mean | ΔTPOT p90 |
|---|---:|---:|---:|
| **LPWL** | **−40%** | **−31%** | **−45%** |
| unified+A+B | −10% | −16% | −10% |
| unified default | −13% | −5% | +10% |
| LMetric | −5% | −8% | −23% |
| sticky | −3% | −15% | −23% |

TTFT p90 and E2E mean improve for every policy; the one regression in this table
is unified default's TPOT p90 (+10%).

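The percentages in the delta table follow directly from the Result table. As a worked check, this sketch recomputes them as `(thinktime − tracets) / tracets`, rounded to whole percent. The dictionaries hold values copied from the Result table; the helper names `rel_change` and `delta_row` are illustrative.

```python
# (TTFT p90, E2E mean, TPOT p90) in ms, copied from the Result table.
TRACETS = {
    "LPWL": (11099, 9827, 33),
    "unified+A+B": (10783, 8531, 21),
    "unified default": (12997, 8366, 20),
    "LMetric": (16492, 10775, 39),
    "sticky": (15236, 10139, 31),
}
THINKTIME = {
    "LPWL": (6713, 6788, 18),
    "unified+A+B": (9736, 7131, 19),
    "unified default": (11268, 7975, 22),
    "LMetric": (15607, 9902, 30),
    "sticky": (14838, 8663, 24),
}


def rel_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`; negative means `after` is lower."""
    return (after - before) / before


def delta_row(policy: str) -> list[int]:
    """Rounded percentage change per metric for one policy (tracets -> thinktime)."""
    return [round(100 * rel_change(b, a))
            for b, a in zip(TRACETS[policy], THINKTIME[policy])]


for policy in TRACETS:
    ttft, e2e, tpot = delta_row(policy)
    print(f"{policy:<16} TTFT p90 {ttft:+d}%  E2E mean {e2e:+d}%  TPOT p90 {tpot:+d}%")
```

Because TPOT p90 is only reported to whole milliseconds in the Result table, its percentages are coarser than the TTFT and E2E ones.
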
`tracets` collapses the inter-turn think-time to ~0 (exp c), manufacturing bursts
→ peak concurrency → KV pressure → preemption. Those bursts punish exactly the
policy that spreads prefill thinly across hosts (LPWL keeps the tightest request
balance under `tracets`, 1.49×), because under a burst the spread sacrifices
locality without the slack to amortize it. Remove the artifact and LPWL's
prefill-aware placement pays off.

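To make the "collapsed think-time → bursts → peak concurrency" step concrete, here is a toy model. It is not the replay harness and does not reproduce either dispatch mode exactly: the session start times, the fixed service time, and the helper names `peak_concurrency` and `schedule` are all assumptions chosen for illustration. It only shows that the same sessions produce a higher peak of in-flight requests when the gap between turns shrinks to zero.

```python
def peak_concurrency(intervals: list[tuple[float, float]]) -> int:
    """Maximum number of simultaneously in-flight requests, given (start, end) pairs."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back requests are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    live = peak = 0
    for _, step in events:
        live += step
        peak = max(peak, live)
    return peak


def schedule(first_starts: list[float], turns: int,
             service_s: float, think_s: float) -> list[tuple[float, float]]:
    """Toy sessions: each runs `turns` sequential requests of `service_s` seconds,
    waiting `think_s` seconds after one turn finishes before sending the next."""
    intervals = []
    for t0 in first_starts:
        start = t0
        for _ in range(turns):
            intervals.append((start, start + service_s))
            start += service_s + think_s
    return intervals


starts = [0.0, 5.0, 10.0, 15.0]  # made-up session arrival times (s)
collapsed = schedule(starts, turns=3, service_s=10.0, think_s=0.0)
spaced = schedule(starts, turns=3, service_s=10.0, think_s=30.0)
print("peak in-flight, think time 0 s :", peak_concurrency(collapsed))
print("peak in-flight, think time 30 s:", peak_concurrency(spaced))
```

With these toy inputs the collapsed schedule keeps all four sessions in flight at once, while the spaced schedule never exceeds two. In a real run the service time itself grows under load, which this sketch deliberately ignores.
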
### Finding 2: the dispatch mode **flips the cross-policy ranking**

- **TTFT p90:** under `tracets`, `unified_ab (10.8s) ≈ LPWL (11.1s)`: LPWL only
  *ties*, and is even slightly behind. Under `thinktime`, **LPWL (6.7s)** <
  unified_ab (9.7s): LPWL is first, **−31%** vs the tuned baseline.
- **E2E mean:** under `tracets`, unified_def (8.4s) < unified_ab (8.5s) <
  **LPWL (9.8s)**: LPWL is *3rd, behind both unified variants*. Under `thinktime`,
  **LPWL (6.8s)** < unified_ab (7.1s) < unified_def (8.0s): LPWL is **first**.

So under artificial `tracets` bursts the parameter-free policy looks tied-or-worse;
under the faithful `thinktime` load it is the clear winner on TTFT p90 and E2E
mean, with zero knobs and the tightest balance.

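The ranking flip can be checked mechanically by sorting the policies per mode and metric. This sketch uses values copied from the Result table; the `RESULT` layout and the `ranking` helper are illustrative, not part of `bench_report.py`.

```python
# (TTFT p90, E2E mean) in ms, copied from the Result table.
RESULT = {
    "tracets": {
        "LPWL": (11099, 9827),
        "unified+A+B": (10783, 8531),
        "unified default": (12997, 8366),
        "LMetric": (16492, 10775),
        "sticky": (15236, 10139),
    },
    "thinktime": {
        "LPWL": (6713, 6788),
        "unified+A+B": (9736, 7131),
        "unified default": (11268, 7975),
        "LMetric": (15607, 9902),
        "sticky": (14838, 8663),
    },
}
METRICS = {"TTFT p90": 0, "E2E mean": 1}


def ranking(mode: str, metric: str) -> list[str]:
    """Policies ordered best (lowest latency) to worst for one mode and metric."""
    col = METRICS[metric]
    table = RESULT[mode]
    return sorted(table, key=lambda policy: table[policy][col])


for metric in METRICS:
    for mode in RESULT:
        order = ranking(mode, metric)
        place = order.index("LPWL") + 1
        print(f"{metric:<9} {mode:<9} LPWL rank {place}: {' < '.join(order)}")
```

Sorting on single-run values treats a 3% gap (the `tracets` TTFT p90 tie) the same as a 31% gap, so the ranks should be read together with the magnitudes in the bullets above.
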
## Conclusion

**Benchmark agentic routing with `thinktime`. Under it, the parameter-free LPWL is
the best of the five policies**: against the *tuned* `unified+A+B` it shows TTFT
p90 −31%, E2E mean −5% / p90 −6%, the best TPOT, and the tightest balance. The
`tracets` burst artifact is precisely what erases that advantage (it even drops
LPWL to 3rd on E2E mean). This both confirms exp (c)'s prediction and is
independent evidence for the GPU-hit-first routing story: faithful load rewards
keeping the active working set GPU-resident.

## Caveats

- **n = 1 per arm.** The `tracets` ranking here does **not** reproduce the earlier
  dash1 `analysis/lpwl_5policy_600s.md` (which saw LPWL win TTFT p90 by −31% *in
  tracets*); on dash0 `tracets` it is a tie. That is, **`tracets` rankings are
  run/harness-sensitive**; the robust signal is the `thinktime` advantage, which
  appears in *both* environments. Repeat ×3 to bound noise.
- LPWL's one persistent weak spot is **E2E p99** (thinktime 69.9s vs unified_ab
  63.8s): the structural HEAVY+ >50k decode tail, identical across policies, not
  routing-fixable (see `lpwl_5policy_600s.md` κ-ablation).
- The `thinktime` advantage is a capacity-slack effect; under saturation the modes
  converge (exp c, N=6).

## Repro

```bash
# 1. annotate the full trace with time_to_parent_chat (dash0; once)
python scripts/add_ttp_streaming.py 051315-051317.jsonl 051315-051317-ttp.jsonl \
    051315-051317-raw.jsonl
# 2. resample (same seed reproduces traces/w600_r0.0015_st30.jsonl + the ttp field;
#    first600s = timestamp<600 filter)
python scripts/sample_trace.py --input 051315-051317-ttp.jsonl \
    --output traces/w600_r0.0015_st30_ttp.jsonl \
    --window-seconds 600 --sample-ratio 0.0015 --max-single-turn-ratio 0.30 --seed 42
# 3. run both modes x 5 policies (~3.5 h, fresh vLLM/arm)
TRACE_FILE=traces/w600_r0.0015_st30_first600s_ttp.jsonl \
    bash microbench/connector_tax/cache_sweep/run_5policy_both_modes.sh
# 4. report + plot
python scripts/bench_report.py --root outputs/policy5_600s_thinktime_<date> \
    --json v2/exp_d_policy_dispatch/results/thinktime.json \
    leastwork unified_ab unified_def lmetric sticky
python v2/exp_d_policy_dispatch/plot.py
```
68
v2/exp_d_policy_dispatch/plot.py
Normal file
@@ -0,0 +1,68 @@
"""exp (d): 5-policy routing under tracets vs thinktime dispatch.

Shows the ranking FLIP: under the faithful `thinktime` load the parameter-free
LPWL (leastwork) is the clear winner, but under `tracets` (think-collapse bursts)
its advantage disappears (it ties unified_ab on TTFT p90 and *loses* on E2E mean).

Reads the two bench_report summaries; writes v2/figs/exp_d_policy_dispatch.png.
Usage: python v2/exp_d_policy_dispatch/plot.py
"""
import json
import os

import matplotlib
matplotlib.use("Agg")  # headless backend; select it before importing pyplot
import matplotlib.pyplot as plt

HERE = os.path.dirname(__file__)


def load(name):
    """Load one bench_report summary from results/ next to this script."""
    with open(os.path.join(HERE, "results", name)) as f:
        return json.load(f)


TC = load("tracets.json")
TT = load("thinktime.json")

# canonical order: LPWL first; pretty labels
ARMS = ["leastwork", "unified_ab", "unified_def", "lmetric", "sticky"]
LABEL = {"leastwork": "LPWL\n(leastwork)", "unified_ab": "unified\n+A+B",
         "unified_def": "unified\ndefault", "lmetric": "LMetric", "sticky": "sticky"}
C_TC, C_TT = "#d62728", "#2ca02c"  # tracets red / thinktime green (match exp_c)


def panel(ax, key, sub, title, ylab):
    """Draw one grouped-bar panel (tracets vs thinktime) for metric TC[arm][key][sub]."""
    tc = [TC[a][key][sub] / 1000.0 for a in ARMS]  # ms -> s
    tt = [TT[a][key][sub] / 1000.0 for a in ARMS]
    x = range(len(ARMS))
    w = 0.38
    b1 = ax.bar([i - w / 2 for i in x], tc, w, label="tracets (burst)", color=C_TC)
    b2 = ax.bar([i + w / 2 for i in x], tt, w, label="thinktime (faithful)", color=C_TT)
    # value label on top of every bar
    for bars in (b1, b2):
        for r in bars:
            ax.text(r.get_x() + r.get_width() / 2, r.get_height(),
                    f"{r.get_height():.1f}", ha="center", va="bottom", fontsize=8)
    ax.set_xticks(list(x))
    ax.set_xticklabels([LABEL[a] for a in ARMS], fontsize=9)
    ax.set_ylabel(ylab)
    ax.set_title(title, fontsize=11)
    ax.grid(axis="y", alpha=.3)
    ax.set_ylim(0, max(tc + tt) * 1.18)
    # mark LPWL-thinktime as the winner (lowest green) in each panel
    ax.annotate("LPWL wins\nunder thinktime", xy=(0 + w / 2, tt[0]),
                xytext=(0.9, max(tc + tt) * 0.86), fontsize=8.5, color=C_TT,
                ha="left", arrowprops=dict(arrowstyle="->", color=C_TT, lw=1.3))
    return b1, b2


fig, (axL, axR) = plt.subplots(1, 2, figsize=(11.2, 4.6))
panel(axL, "ttft_ms", "p90", "TTFT p90 (lower = better)", "TTFT p90 (s)")
panel(axR, "e2e_ms", "mean", "E2E mean (lower = better)", "E2E mean (s)")
axL.legend(loc="upper left", fontsize=9)
fig.suptitle("5-policy routing: dispatch mode flips the ranking: "
             "LPWL is best under faithful thinktime, only ties/loses under tracets bursts",
             fontsize=11.5)
fig.tight_layout(rect=(0, 0, 1, 0.95))
out = os.path.join(HERE, "..", "figs", "exp_d_policy_dispatch.png")
fig.savefig(out, dpi=140)
print("wrote", os.path.normpath(out))

# also print the deltas the README cites
print("\npolicy         TTFTp90 tc->tt        E2Emean tc->tt")
for a in ARMS:
    t1, t2 = TC[a]["ttft_ms"]["p90"], TT[a]["ttft_ms"]["p90"]
    e1, e2 = TC[a]["e2e_ms"]["mean"], TT[a]["e2e_ms"]["mean"]
    print(f"{a:<13} {t1/1000:5.1f}->{t2/1000:4.1f}s ({(t2-t1)/t1:+.0%})   "
          f"{e1/1000:5.1f}->{e2/1000:4.1f}s ({(e2-e1)/e1:+.0%})")
803
v2/exp_d_policy_dispatch/results/thinktime.json
Normal file
@@ -0,0 +1,803 @@
|
||||
{
|
||||
"leastwork": {
|
||||
"n_total": 807,
|
||||
"n_ok": 807,
|
||||
"window_s": 986.1941225528717,
|
||||
"ttft_ms": {
|
||||
"n": 807,
|
||||
"mean": 3043.454534307026,
|
||||
"p50": 681.8344180064742,
|
||||
"p90": 6712.89858900127,
|
||||
"p99": 41146.725983999204
|
||||
},
|
||||
"tpot_ms": {
|
||||
"n": 806,
|
||||
"mean": 17.12884673518703,
|
||||
"p50": 7.770131949655479,
|
||||
"p90": 17.997618232737178,
|
||||
"p99": 133.81680370757084
|
||||
},
|
||||
"e2e_ms": {
|
||||
"n": 807,
|
||||
"mean": 6787.973176127951,
|
||||
"p50": 2026.8339599715546,
|
||||
"p90": 17635.302426991984,
|
||||
"p99": 69945.72682998842
|
||||
},
|
||||
"throughput": {
|
||||
"decode_tps": 234.00362537409853,
|
||||
"prefill_tps": 8660.302069020001,
|
||||
"total_tps": 8894.305694394101,
|
||||
"total_output_tokens": 230773,
|
||||
"total_new_prefill_tokens": 8540739
|
||||
},
|
||||
"apc": 0.6756355919409787,
|
||||
"per_worker": {
|
||||
"0": {
|
||||
"n": 96,
|
||||
"decode_tps": 48.631399136561754,
|
||||
"prefill_tps": 812.7547930676582,
|
||||
"ttft_p90_ms": 5368.347445008112,
|
||||
"gpu_util_mean": 48.6875,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"1": {
|
||||
"n": 111,
|
||||
"decode_tps": 28.45180209284375,
|
||||
"prefill_tps": 954.9580335787387,
|
||||
"ttft_p90_ms": 3442.4916800053325,
|
||||
"gpu_util_mean": 40.479166666666664,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"2": {
|
||||
"n": 99,
|
||||
"decode_tps": 35.558922120953866,
|
||||
"prefill_tps": 901.7494422882478,
|
||||
"ttft_p90_ms": 5583.948273997521,
|
||||
"gpu_util_mean": 48.395833333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"3": {
|
||||
"n": 88,
|
||||
"decode_tps": 20.717016592141224,
|
||||
"prefill_tps": 1149.215934349922,
|
||||
"ttft_p90_ms": 6448.1909119931515,
|
||||
"gpu_util_mean": 38.020833333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"4": {
|
||||
"n": 124,
|
||||
"decode_tps": 38.884839326290034,
|
||||
"prefill_tps": 891.8842445776638,
|
||||
"ttft_p90_ms": 4944.760143000167,
|
||||
"gpu_util_mean": 40.020833333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"5": {
|
||||
"n": 110,
|
||||
"decode_tps": 20.013301183451194,
|
||||
"prefill_tps": 1581.959336729224,
|
||||
"ttft_p90_ms": 27228.53080899222,
|
||||
"gpu_util_mean": 78.19791666666667,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"6": {
|
||||
"n": 64,
|
||||
"decode_tps": 25.779914337947165,
|
||||
"prefill_tps": 1114.0737658787832,
|
||||
"ttft_p90_ms": 18414.893322013086,
|
||||
"gpu_util_mean": 49.833333333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"7": {
|
||||
"n": 115,
|
||||
"decode_tps": 15.966430583909537,
|
||||
"prefill_tps": 1253.7065185497638,
|
||||
"ttft_p90_ms": 9039.336649002507,
|
||||
"gpu_util_mean": 39.5625,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
}
|
||||
},
|
||||
"decisions": {},
|
||||
"gpu_captured": true,
|
||||
"spread": {
|
||||
"n_ratio": 1.9375,
|
||||
"ttft_p90_ratio": 7.909541500751002,
|
||||
"gpu_util_ratio": 2.0567123287671234,
|
||||
"gpu_util_min": 38.020833333333336,
|
||||
"gpu_util_max": 78.19791666666667
|
||||
},
|
||||
"per_class": {
|
||||
"WARM<5k": {
|
||||
"n": 92,
|
||||
"ttft_ms": {
|
||||
"n": 92,
|
||||
"mean": 192.46459313074845,
|
||||
"p50": 177.03324498143047,
|
||||
"p90": 313.57523999758996,
|
||||
"p99": 553.8838730135467
|
||||
}
|
||||
},
|
||||
"MED5-20k": {
|
||||
"n": 278,
|
||||
"ttft_ms": {
|
||||
"n": 278,
|
||||
"mean": 772.5742901807313,
|
||||
"p50": 677.829442982329,
|
||||
"p90": 1460.6262099987362,
|
||||
"p99": 2101.3274399738293
|
||||
}
|
||||
},
|
||||
"HEAVY20-50k": {
|
||||
"n": 248,
|
||||
"ttft_ms": {
|
||||
"n": 248,
|
||||
"mean": 2004.694984432952,
|
||||
"p50": 1127.2326559992507,
|
||||
"p90": 5081.04542500223,
|
||||
"p99": 9901.586207997752
|
||||
}
|
||||
},
|
||||
"HEAVY+>50k": {
|
||||
"n": 189,
|
||||
"ttft_ms": {
|
||||
"n": 189,
|
||||
"mean": 9134.502951365745,
|
||||
"p50": 2167.4920289951842,
|
||||
"p90": 28926.44312098855,
|
||||
"p99": 49472.52169801504
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"unified_ab": {
|
||||
"n_total": 807,
|
||||
"n_ok": 807,
|
||||
"window_s": 986.5525379180908,
|
||||
"ttft_ms": {
|
||||
"n": 807,
|
||||
"mean": 3592.357064001708,
|
||||
"p50": 676.4678099716548,
|
||||
"p90": 9736.127940996084,
|
||||
"p99": 42370.66501099616
|
||||
},
|
||||
"tpot_ms": {
|
||||
"n": 806,
|
||||
"mean": 13.200466578008895,
|
||||
"p50": 7.819523662692517,
|
||||
"p90": 19.090397550442486,
|
||||
"p99": 133.40408908212945
|
||||
},
|
||||
"e2e_ms": {
|
||||
"n": 807,
|
||||
"mean": 7131.188424004758,
|
||||
"p50": 2037.0979200233705,
|
||||
"p90": 18689.829077018658,
|
||||
"p99": 63787.50272799516
|
||||
},
|
||||
"throughput": {
|
||||
"decode_tps": 233.91861166055818,
|
||||
"prefill_tps": 8640.029468666471,
|
||||
"total_tps": 8873.948080327029,
|
||||
"total_output_tokens": 230773,
|
||||
"total_new_prefill_tokens": 8523843
|
||||
},
|
||||
"apc": 0.6762772765819173,
|
||||
"per_worker": {
|
||||
"0": {
|
||||
"n": 58,
|
||||
"decode_tps": 29.088161954921237,
|
||||
"prefill_tps": 930.9397773565431,
|
||||
"ttft_p90_ms": 13273.868343996583,
|
||||
"gpu_util_mean": 44.989583333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"1": {
|
||||
"n": 98,
|
||||
"decode_tps": 24.162930086120934,
|
||||
"prefill_tps": 1018.370498666148,
|
||||
"ttft_p90_ms": 4365.537890000269,
|
||||
"gpu_util_mean": 38.90625,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"2": {
|
||||
"n": 110,
|
||||
"decode_tps": 35.40713612040818,
|
||||
"prefill_tps": 965.8167845888297,
|
||||
"ttft_p90_ms": 4610.747697995976,
|
||||
"gpu_util_mean": 52.114583333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"3": {
|
||||
"n": 102,
|
||||
"decode_tps": 20.719626390233998,
|
||||
"prefill_tps": 1126.5056419045684,
|
||||
"ttft_p90_ms": 10947.632670984603,
|
||||
"gpu_util_mean": 41.703125,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"4": {
|
||||
"n": 99,
|
||||
"decode_tps": 44.64435324746667,
|
||||
"prefill_tps": 911.5449663712324,
|
||||
"ttft_p90_ms": 4116.690531984204,
|
||||
"gpu_util_mean": 42.671875,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"5": {
|
||||
"n": 110,
|
||||
"decode_tps": 29.724722072971574,
|
||||
"prefill_tps": 918.851216898154,
|
||||
"ttft_p90_ms": 4543.632891000016,
|
||||
"gpu_util_mean": 40.864583333333336,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"6": {
|
||||
"n": 125,
|
||||
"decode_tps": 28.516474205589404,
|
||||
"prefill_tps": 1522.1155917037186,
|
||||
"ttft_p90_ms": 25507.55575299263,
|
||||
"gpu_util_mean": 76.203125,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"7": {
|
||||
"n": 105,
|
||||
"decode_tps": 21.655207582846195,
|
||||
"prefill_tps": 1245.884991177276,
|
||||
"ttft_p90_ms": 20629.490054008784,
|
||||
"gpu_util_mean": 47.276041666666664,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
}
|
||||
},
|
||||
"decisions": {
|
||||
"lmetric_fallback": 389,
|
||||
"affinity": 418
|
||||
},
|
||||
"gpu_captured": true,
|
||||
"spread": {
|
||||
"n_ratio": 2.1551724137931036,
|
||||
"ttft_p90_ratio": 6.196131468910353,
|
||||
"gpu_util_ratio": 1.9586345381526105,
|
||||
"gpu_util_min": 38.90625,
|
||||
"gpu_util_max": 76.203125
|
||||
},
|
||||
"per_class": {
|
||||
"WARM<5k": {
|
||||
"n": 92,
|
||||
"ttft_ms": {
|
||||
"n": 92,
|
||||
"mean": 448.3382160131283,
|
||||
"p50": 179.28761898656376,
|
||||
"p90": 323.1771159917116,
|
||||
"p99": 5748.067840992007
|
||||
}
|
||||
},
|
||||
"MED5-20k": {
|
||||
"n": 278,
|
||||
"ttft_ms": {
|
||||
"n": 278,
|
||||
"mean": 1455.8712874500252,
|
||||
"p50": 685.6210659898352,
|
||||
"p90": 1802.9974120145198,
|
||||
"p99": 32571.255193004617
|
||||
}
|
||||
},
|
||||
"HEAVY20-50k": {
|
||||
"n": 248,
|
||||
"ttft_ms": {
|
||||
"n": 248,
|
||||
"mean": 2672.607777120579,
|
||||
"p50": 1117.918328003725,
|
||||
"p90": 5214.129884989234,
|
||||
"p99": 22190.210508997552
|
||||
}
|
||||
},
|
||||
"HEAVY+>50k": {
|
||||
"n": 189,
|
||||
"ttft_ms": {
|
||||
"n": 189,
|
||||
"mean": 9472.201524545819,
|
||||
"p50": 2150.3282230114564,
|
||||
"p90": 28876.64386598044,
|
||||
"p99": 48314.48572798399
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"unified_def": {
|
||||
"n_total": 807,
|
||||
"n_ok": 807,
|
||||
"window_s": 979.5575842857361,
|
||||
"ttft_ms": {
|
||||
"n": 807,
|
||||
"mean": 4037.2454534798544,
|
||||
"p50": 695.2703970018774,
|
||||
"p90": 11267.881545994896,
|
||||
"p99": 46221.317757997895
|
||||
},
|
||||
"tpot_ms": {
|
||||
"n": 806,
|
||||
"mean": 16.476541787288614,
|
||||
"p50": 8.307468241425875,
|
||||
"p90": 21.768670571627954,
|
||||
"p99": 200.26358073773736
|
||||
},
|
||||
"e2e_ms": {
|
||||
"n": 807,
|
||||
"mean": 7974.606969135101,
|
||||
"p50": 2098.1516239990015,
|
||||
"p90": 24096.24872301356,
|
||||
"p99": 72334.40188399982
|
||||
},
|
||||
"throughput": {
|
||||
"decode_tps": 235.5890084484137,
|
||||
"prefill_tps": 8253.263646460364,
|
||||
"total_tps": 8488.852654908778,
|
||||
"total_output_tokens": 230773,
|
||||
"total_new_prefill_tokens": 8084547
|
||||
},
|
||||
"apc": 0.6929610772463206,
|
||||
"per_worker": {
|
||||
"0": {
|
||||
"n": 96,
|
||||
"decode_tps": 39.88024862110074,
|
||||
"prefill_tps": 791.1724766697671,
|
||||
"ttft_p90_ms": 5825.010653992649,
|
||||
"gpu_util_mean": 47.68586387434555,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"1": {
|
||||
"n": 55,
|
||||
"decode_tps": 17.82028977166094,
|
||||
"prefill_tps": 910.7254277965683,
|
||||
"ttft_p90_ms": 16298.377383005572,
|
||||
"gpu_util_mean": 39.2565445026178,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"2": {
|
||||
"n": 98,
|
||||
"decode_tps": 27.174512685142215,
|
||||
"prefill_tps": 1043.4608606959093,
|
||||
"ttft_p90_ms": 9739.183520985534,
|
||||
"gpu_util_mean": 40.83769633507853,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"3": {
|
||||
"n": 103,
|
||||
"decode_tps": 24.518211471470025,
|
||||
"prefill_tps": 1003.2661844138513,
|
||||
"ttft_p90_ms": 6705.797864007764,
|
||||
"gpu_util_mean": 33.50785340314136,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"4": {
|
||||
"n": 102,
|
||||
"decode_tps": 49.593817432818994,
|
||||
"prefill_tps": 689.9175820202374,
|
||||
"ttft_p90_ms": 2474.3239340023138,
|
||||
"gpu_util_mean": 45.246073298429316,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"5": {
|
||||
"n": 112,
|
||||
"decode_tps": 20.50823792523468,
|
||||
"prefill_tps": 1346.1127974027988,
|
||||
"ttft_p90_ms": 23553.059853002196,
|
||||
"gpu_util_mean": 50.109947643979055,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"6": {
|
||||
"n": 81,
|
||||
"decode_tps": 19.51697409799575,
|
||||
"prefill_tps": 990.0816609032532,
|
||||
"ttft_p90_ms": 5961.234248999972,
|
||||
"gpu_util_mean": 38.717277486910994,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"7": {
|
||||
"n": 160,
|
||||
"decode_tps": 36.57671644299036,
|
||||
"prefill_tps": 1478.5266565579789,
|
||||
"ttft_p90_ms": 17912.180206010817,
|
||||
"gpu_util_mean": 85.15183246073299,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
}
|
||||
},
|
||||
"decisions": {
|
||||
"lmetric_fallback": 349,
|
||||
"affinity": 458
|
||||
},
|
||||
"gpu_captured": true,
|
||||
"spread": {
|
||||
"n_ratio": 2.909090909090909,
|
||||
"ttft_p90_ratio": 9.518988006919619,
|
||||
"gpu_util_ratio": 2.5412500000000002,
|
||||
"gpu_util_min": 33.50785340314136,
|
||||
"gpu_util_max": 85.15183246073299
|
||||
},
|
||||
"per_class": {
|
||||
"WARM<5k": {
|
||||
"n": 92,
|
||||
"ttft_ms": {
|
||||
"n": 92,
|
||||
"mean": 594.1550390875225,
|
||||
"p50": 196.222682017833,
|
||||
"p90": 338.4021449892316,
|
||||
"p99": 7637.84466200741
|
||||
}
|
||||
},
|
||||
"MED5-20k": {
|
||||
"n": 278,
|
||||
"ttft_ms": {
|
||||
"n": 278,
|
||||
"mean": 1386.6929373560054,
|
||||
"p50": 662.5233909871895,
|
||||
"p90": 1772.5210430216976,
|
||||
"p99": 19121.71271801344
|
||||
}
|
||||
},
|
||||
"HEAVY20-50k": {
|
||||
"n": 248,
|
||||
"ttft_ms": {
|
||||
"n": 248,
|
||||
"mean": 3761.512416864031,
|
||||
"p50": 1186.4990000030957,
|
||||
"p90": 7436.603061010828,
|
||||
"p99": 37502.096537995385
|
||||
}
|
||||
},
|
||||
"HEAVY+>50k": {
|
||||
"n": 189,
|
||||
"ttft_ms": {
|
||||
"n": 189,
|
||||
"mean": 9973.751859232492,
|
||||
"p50": 2084.2301140073687,
|
||||
"p90": 34646.72368601896,
|
||||
"p99": 51783.358982007485
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"lmetric": {
|
||||
"n_total": 807,
|
||||
"n_ok": 807,
|
||||
"window_s": 1036.9893975257874,
|
||||
"ttft_ms": {
|
||||
"n": 807,
|
||||
"mean": 4942.361280256006,
|
||||
"p50": 1195.667241991032,
|
||||
"p90": 15606.655231997138,
|
||||
"p99": 46217.127193987835
|
||||
},
|
||||
"tpot_ms": {
|
||||
"n": 806,
|
||||
"mean": 19.707597229545165,
|
||||
"p50": 9.35281297406689,
|
||||
"p90": 30.177805961172382,
|
||||
"p99": 232.18400578116416
|
||||
},
|
||||
"e2e_ms": {
|
||||
"n": 807,
|
||||
"mean": 9901.839828112516,
|
||||
"p50": 3177.2723750036675,
|
||||
"p90": 27819.4430010044,
|
||||
"p99": 73672.06387300394
|
||||
},
|
||||
"throughput": {
|
||||
"decode_tps": 222.5413302687709,
|
||||
"prefill_tps": 13134.949144609054,
|
||||
"total_tps": 13357.490474877826,
|
||||
"total_output_tokens": 230773,
|
||||
"total_new_prefill_tokens": 13620803
|
||||
},
|
||||
"apc": 0.48270240989877555,
|
||||
"per_worker": {
|
||||
"0": {
|
||||
"n": 121,
|
||||
"decode_tps": 40.13348651326501,
|
||||
"prefill_tps": 1973.9218210754154,
|
||||
"ttft_p90_ms": 23894.41591600189,
|
||||
"gpu_util_mean": 90.75247524752476,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"1": {
|
||||
"n": 128,
|
||||
"decode_tps": 44.98117349250917,
|
||||
"prefill_tps": 1626.6328315647543,
|
||||
"ttft_p90_ms": 5918.853377981577,
|
||||
"gpu_util_mean": 64.96039603960396,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"2": {
|
||||
"n": 109,
|
||||
"decode_tps": 26.800659742819484,
|
||||
"prefill_tps": 1578.2861463241723,
|
||||
"ttft_p90_ms": 13917.768498009536,
|
||||
"gpu_util_mean": 58.306930693069305,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"3": {
|
||||
"n": 99,
|
||||
"decode_tps": 19.435107097610242,
|
||||
"prefill_tps": 1683.1715002723502,
|
||||
"ttft_p90_ms": 16737.5574040052,
|
||||
"gpu_util_mean": 59.16831683168317,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"4": {
|
||||
"n": 116,
|
||||
"decode_tps": 19.955845305048445,
|
||||
"prefill_tps": 1884.7752972820501,
|
||||
"ttft_p90_ms": 11347.276910004439,
|
||||
"gpu_util_mean": 50.36138613861386,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"5": {
|
||||
"n": 61,
|
||||
"decode_tps": 12.497716978516857,
|
||||
"prefill_tps": 1726.7611455549827,
|
||||
"ttft_p90_ms": 31680.082703998778,
|
||||
"gpu_util_mean": 55.93069306930693,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"6": {
|
||||
"n": 88,
|
||||
"decode_tps": 38.23472071614312,
|
||||
"prefill_tps": 1208.0914259963265,
|
||||
"ttft_p90_ms": 9533.787049993407,
|
||||
"gpu_util_mean": 51.62871287128713,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
},
|
||||
"7": {
|
||||
"n": 85,
|
||||
"decode_tps": 20.50262042285856,
|
||||
"prefill_tps": 1453.3089765390037,
|
||||
"ttft_p90_ms": 14970.007644995349,
|
||||
"gpu_util_mean": 51.757425742574256,
|
||||
"gpu_util_max": 100.0,
|
||||
"gpu_mem_max_mb": 89575.0
|
||||
}
|
||||
},
|
||||
"decisions": {},
|
||||
"gpu_captured": true,
|
||||
"spread": {
|
||||
"n_ratio": 2.098360655737705,
|
||||
"ttft_p90_ratio": 5.352402007768976,
|
||||
"gpu_util_ratio": 1.8020249680526887,
|
||||
"gpu_util_min": 50.36138613861386,
|
||||
"gpu_util_max": 90.75247524752476
|
||||
},
|
||||
"per_class": {
|
||||
"WARM<5k": {
|
||||
"n": 92,
|
||||
"ttft_ms": {
|
||||
"n": 92,
|
||||
"mean": 511.51012201982036,
|
||||
"p50": 255.4193850082811,
|
||||
"p90": 471.22472297633067,
|
||||
"p99": 3532.1444049768616
|
||||
}
|
||||
},
|
||||
"MED5-20k": {
|
||||
"n": 278,
|
||||
"ttft_ms": {
|
||||
"n": 278,
|
||||
"mean": 1010.5527848093863,
|
||||
"p50": 818.2104199950118,
|
||||
"p90": 1878.1264800054487,
|
||||
"p99": 4416.228823014535
|
||||
}
|
||||
},
|
||||
"HEAVY20-50k": {
|
||||
"n": 248,
|
||||
"ttft_ms": {
|
||||
"n": 248,
|
||||
"mean": 3164.034748000338,
|
||||
"p50": 2636.801838991232,
|
||||
"p90": 7400.190736021614,
|
||||
"p99": 9636.447697004769
|
||||
}
|
||||
},
|
||||
"HEAVY+>50k": {
|
||||
"n": 189,
|
||||
"ttft_ms": {
|
||||
"n": 189,
|
||||
"mean": 15215.938255342222,
|
||||
"p50": 12060.85875100689,
|
||||
"p90": 36602.47571900254,
|
||||
"p99": 52271.21993701439
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"sticky": {
|
||||
"n_total": 807,
|
||||
"n_ok": 807,
|
||||
"window_s": 994.9787130355835,
|
||||
"ttft_ms": {
|
||||
"n": 807,
|
||||
"mean": 4455.946148958436,
|
||||
"p50": 713.0627470032778,
|
||||
"p90": 14838.208375993418,
|
||||
"p99": 43174.81458699331
|
||||
},
|
||||
"tpot_ms": {
|
||||
"n": 806,
|
||||
"mean": 19.138733289320065,
|
||||
"p50": 8.24416923684399,
|
||||
"p90": 23.769559945071954,
|
||||
"p99": 184.6952650922511
|
||||
},
|
||||
"e2e_ms": {
|
||||
"n": 807,
|
||||
"mean": 8663.490226920512,
|
||||
"p50": 2352.715140004875,
|
||||
"p90": 24966.471978026675,
|
||||
"p99": 70932.61348700617
|
||||
},
|
||||
"throughput": {
|
||||
"decode_tps": 231.93762537485247,
|
||||
"prefill_tps": 8105.277926394779,
|
||||
"total_tps": 8337.215551769632,
|
||||
"total_output_tokens": 230773,
|
||||
"total_new_prefill_tokens": 8064579
    },
    "apc": 0.6937194318219754,
    "per_worker": {
      "0": {
        "n": 156,
        "decode_tps": 44.672312500428745,
        "prefill_tps": 1949.5271351907093,
        "ttft_p90_ms": 20576.009418989997,
        "gpu_util_mean": 93.18041237113403,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "1": {
        "n": 114,
        "decode_tps": 44.75372127725863,
        "prefill_tps": 929.0429914624127,
        "ttft_p90_ms": 5498.717762995511,
        "gpu_util_mean": 53.08247422680412,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "2": {
        "n": 88,
        "decode_tps": 29.785561853462614,
        "prefill_tps": 904.2113044360427,
        "ttft_p90_ms": 12234.77461998118,
        "gpu_util_mean": 49.628865979381445,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "3": {
        "n": 98,
        "decode_tps": 29.982550992458386,
        "prefill_tps": 1018.2680159145942,
        "ttft_p90_ms": 16286.48554199026,
        "gpu_util_mean": 44.123711340206185,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "4": {
        "n": 110,
        "decode_tps": 37.69427376549181,
        "prefill_tps": 949.1017120546454,
        "ttft_p90_ms": 6709.773182024946,
        "gpu_util_mean": 45.7680412371134,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "5": {
        "n": 99,
        "decode_tps": 19.964246209244884,
        "prefill_tps": 980.7747514746083,
        "ttft_p90_ms": 14065.780322009232,
        "gpu_util_mean": 36.324742268041234,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "6": {
        "n": 79,
        "decode_tps": 11.21531531660107,
        "prefill_tps": 682.5723918534845,
        "ttft_p90_ms": 4579.089447972365,
        "gpu_util_mean": 22.288659793814432,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "7": {
        "n": 63,
        "decode_tps": 13.869643459906332,
        "prefill_tps": 691.7796240082818,
        "ttft_p90_ms": 18229.593775991816,
        "gpu_util_mean": 30.762886597938145,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      }
    },
    "decisions": {},
    "gpu_captured": true,
    "spread": {
      "n_ratio": 2.4761904761904763,
      "ttft_p90_ratio": 4.493471825081102,
      "gpu_util_ratio": 4.180619796484737,
      "gpu_util_min": 22.288659793814432,
      "gpu_util_max": 93.18041237113403
    },
    "per_class": {
      "WARM<5k": {
        "n": 92,
        "ttft_ms": {
          "n": 92,
          "mean": 827.0193562525294,
          "p50": 197.0047799986787,
          "p90": 507.2060489910655,
          "p99": 19187.98109301133
        }
      },
      "MED5-20k": {
        "n": 278,
        "ttft_ms": {
          "n": 278,
          "mean": 2624.659966896439,
          "p50": 736.4085000008345,
          "p90": 3899.43698499701,
          "p99": 33760.123436979484
        }
      },
      "HEAVY20-50k": {
        "n": 248,
        "ttft_ms": {
          "n": 248,
          "mean": 3807.600332329692,
          "p50": 1086.1541359918192,
          "p90": 9912.624888995197,
          "p99": 40516.03257699753
        }
      },
      "HEAVY+>50k": {
        "n": 189,
        "ttft_ms": {
          "n": 189,
          "mean": 9766.785228673292,
          "p50": 2521.5582190139685,
          "p90": 34039.37866198248,
          "p99": 47948.314540000865
        }
      }
    }
  }
}
803
v2/exp_d_policy_dispatch/results/tracets.json
Normal file
@@ -0,0 +1,803 @@
{
  "leastwork": {
    "n_total": 807,
    "n_ok": 807,
    "window_s": 1045.237051486969,
    "ttft_ms": {
      "n": 807,
      "mean": 4029.99802739754,
      "p50": 856.9921070011333,
      "p90": 11099.306205986068,
      "p99": 43400.520397000946
    },
    "tpot_ms": {
      "n": 806,
      "mean": 21.48754069144944,
      "p50": 8.545071840088175,
      "p90": 33.14273249998223,
      "p99": 221.47291811146448
    },
    "e2e_ms": {
      "n": 807,
      "mean": 9827.03743343642,
      "p50": 2474.214674992254,
      "p90": 25365.627319028135,
      "p99": 93929.44298699149
    },
    "throughput": {
      "decode_tps": 220.78532297692573,
      "prefill_tps": 8826.999566150593,
      "total_tps": 9047.784889127519,
      "total_output_tokens": 230773,
      "total_new_prefill_tokens": 9226307
    },
    "apc": 0.6495987515101673,
    "per_worker": {
      "0": {
        "n": 95,
        "decode_tps": 26.681028920976477,
        "prefill_tps": 996.0917463831219,
        "ttft_p90_ms": 5435.623251018114,
        "gpu_util_mean": 40.30392156862745,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "1": {
        "n": 94,
        "decode_tps": 53.13244485640239,
        "prefill_tps": 951.7659162433571,
        "ttft_p90_ms": 10786.369787005242,
        "gpu_util_mean": 61.27450980392157,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "2": {
        "n": 89,
        "decode_tps": 19.963892372849116,
        "prefill_tps": 1100.5666115294052,
        "ttft_p90_ms": 4944.386984978337,
        "gpu_util_mean": 36.745098039215684,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "3": {
        "n": 117,
        "decode_tps": 23.37842881214075,
        "prefill_tps": 1155.5378737117585,
        "ttft_p90_ms": 6188.670104980702,
        "gpu_util_mean": 40.76470588235294,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "4": {
        "n": 108,
        "decode_tps": 34.993018997907186,
        "prefill_tps": 947.086594941991,
        "ttft_p90_ms": 4642.632269999012,
        "gpu_util_mean": 40.19607843137255,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "5": {
        "n": 87,
        "decode_tps": 10.190989675352892,
        "prefill_tps": 1130.268964651918,
        "ttft_p90_ms": 9265.34449501196,
        "gpu_util_mean": 29.41176470588235,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "6": {
        "n": 130,
        "decode_tps": 20.925396749842136,
        "prefill_tps": 1646.6073390256745,
        "ttft_p90_ms": 38816.55501498608,
        "gpu_util_mean": 81.27450980392157,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "7": {
        "n": 87,
        "decode_tps": 31.520122591454786,
        "prefill_tps": 899.0745196633663,
        "ttft_p90_ms": 7304.075189982541,
        "gpu_util_mean": 42.26960784313726,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      }
    },
    "decisions": {},
    "gpu_captured": true,
    "spread": {
      "n_ratio": 1.4942528735632183,
      "ttft_p90_ratio": 8.3608937252733,
      "gpu_util_ratio": 2.7633333333333336,
      "gpu_util_min": 29.41176470588235,
      "gpu_util_max": 81.27450980392157
    },
    "per_class": {
      "WARM<5k": {
        "n": 92,
        "ttft_ms": {
          "n": 92,
          "mean": 2123.6411421857115,
          "p50": 202.10654201218858,
          "p90": 478.5369329911191,
          "p99": 39781.0776779952
        }
      },
      "MED5-20k": {
        "n": 278,
        "ttft_ms": {
          "n": 278,
          "mean": 1228.5320352953054,
          "p50": 757.9952410014812,
          "p90": 1679.417210019892,
          "p99": 15248.28791298205
        }
      },
      "HEAVY20-50k": {
        "n": 248,
        "ttft_ms": {
          "n": 248,
          "mean": 2886.291041083525,
          "p50": 1301.9832599966321,
          "p90": 5246.098520990927,
          "p99": 39812.045788014075
        }
      },
      "HEAVY+>50k": {
        "n": 189,
        "ttft_ms": {
          "n": 189,
          "mean": 10579.37216416889,
          "p50": 2986.336703004781,
          "p90": 34044.874527986394,
          "p99": 51031.28803099389
        }
      }
    }
  },
  "unified_ab": {
    "n_total": 807,
    "n_ok": 807,
    "window_s": 1081.3928244113922,
    "ttft_ms": {
      "n": 807,
      "mean": 4003.8645795703064,
      "p50": 745.6592369999271,
      "p90": 10783.10890001012,
      "p99": 46727.02033401583
    },
    "tpot_ms": {
      "n": 806,
      "mean": 18.129335403553664,
      "p50": 8.004697278213117,
      "p90": 20.508462421730655,
      "p99": 188.8185092436804
    },
    "e2e_ms": {
      "n": 807,
      "mean": 8531.231209817679,
      "p50": 2227.301309001632,
      "p90": 22062.78157699853,
      "p99": 75419.32771002757
    },
    "throughput": {
      "decode_tps": 213.4034874196719,
      "prefill_tps": 8110.4912128244405,
      "total_tps": 8323.894700244113,
      "total_output_tokens": 230773,
      "total_new_prefill_tokens": 8770627
    },
    "apc": 0.66690479182639,
    "per_worker": {
      "0": {
        "n": 119,
        "decode_tps": 36.19868665330458,
        "prefill_tps": 1627.2449384503814,
        "ttft_p90_ms": 28329.616097005783,
        "gpu_util_mean": 85.30805687203791,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "1": {
        "n": 94,
        "decode_tps": 30.323855734708484,
        "prefill_tps": 977.303507238684,
        "ttft_p90_ms": 5139.202910999302,
        "gpu_util_mean": 41.165876777251185,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "2": {
        "n": 79,
        "decode_tps": 13.535321919642842,
        "prefill_tps": 1030.6661694437062,
        "ttft_p90_ms": 16363.982771988958,
        "gpu_util_mean": 35.014218009478675,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "3": {
        "n": 122,
        "decode_tps": 25.46530675842895,
        "prefill_tps": 900.9917376974744,
        "ttft_p90_ms": 5133.929038012866,
        "gpu_util_mean": 35.85781990521327,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "4": {
        "n": 101,
        "decode_tps": 39.54437188287811,
        "prefill_tps": 731.5889121334044,
        "ttft_p90_ms": 8783.158237987664,
        "gpu_util_mean": 45.843601895734594,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "5": {
        "n": 109,
        "decode_tps": 32.10951581715914,
        "prefill_tps": 842.2332564447569,
        "ttft_p90_ms": 4199.806818010984,
        "gpu_util_mean": 37.32701421800948,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "6": {
        "n": 89,
        "decode_tps": 8.69804181021798,
        "prefill_tps": 1146.4917946675243,
        "ttft_p90_ms": 11112.522551004076,
        "gpu_util_mean": 30.95734597156398,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "7": {
        "n": 94,
        "decode_tps": 27.528386843331813,
        "prefill_tps": 853.9708967485094,
        "ttft_p90_ms": 10584.918729990022,
        "gpu_util_mean": 47.426540284360186,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      }
    },
    "decisions": {
      "lmetric_fallback": 394,
      "affinity": 413
    },
    "gpu_captured": true,
    "spread": {
      "n_ratio": 1.5443037974683544,
      "ttft_p90_ratio": 6.745456951856325,
      "gpu_util_ratio": 2.7556644213104713,
      "gpu_util_min": 30.95734597156398,
      "gpu_util_max": 85.30805687203791
    },
    "per_class": {
      "WARM<5k": {
        "n": 92,
        "ttft_ms": {
          "n": 92,
          "mean": 473.4928183806124,
          "p50": 193.41183800133877,
          "p90": 326.81434298865497,
          "p99": 5278.063865000149
        }
      },
      "MED5-20k": {
        "n": 278,
        "ttft_ms": {
          "n": 278,
          "mean": 1676.7254021373014,
          "p50": 733.51754702162,
          "p90": 1942.2162729897536,
          "p99": 28329.616097005783
        }
      },
      "HEAVY20-50k": {
        "n": 248,
        "ttft_ms": {
          "n": 248,
          "mean": 2520.2562936371373,
          "p50": 1149.4361000077333,
          "p90": 5139.202910999302,
          "p99": 26739.575799991144
        }
      },
      "HEAVY+>50k": {
        "n": 189,
        "ttft_ms": {
          "n": 189,
          "mean": 11092.085469873233,
          "p50": 2945.923403982306,
          "p90": 38718.26310700271,
          "p99": 51830.85186799872
        }
      }
    }
  },
  "unified_def": {
    "n_total": 807,
    "n_ok": 807,
    "window_s": 912.5732414722443,
    "ttft_ms": {
      "n": 807,
      "mean": 4275.594404255811,
      "p50": 757.1689730102662,
      "p90": 12997.265826008515,
      "p99": 47988.61391301034
    },
    "tpot_ms": {
      "n": 806,
      "mean": 18.24678916181055,
      "p50": 8.282395397119393,
      "p90": 19.536432251223843,
      "p99": 127.01842809143604
    },
    "e2e_ms": {
      "n": 807,
      "mean": 8365.99611452138,
      "p50": 2119.7480160044506,
      "p90": 22818.839199026115,
      "p99": 82257.18197401147
    },
    "throughput": {
      "decode_tps": 252.8816203592563,
      "prefill_tps": 8854.578057717206,
      "total_tps": 9107.459678076462,
      "total_output_tokens": 230773,
      "total_new_prefill_tokens": 8080451
    },
    "apc": 0.6931166371592755,
    "per_worker": {
      "0": {
        "n": 86,
        "decode_tps": 48.0925782233062,
        "prefill_tps": 878.7174152798008,
        "ttft_p90_ms": 9217.135582002811,
        "gpu_util_mean": 49.02808988764045,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "1": {
        "n": 114,
        "decode_tps": 36.67650822853761,
        "prefill_tps": 962.1616765613936,
        "ttft_p90_ms": 5333.489734999603,
        "gpu_util_mean": 43.674157303370784,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "2": {
        "n": 75,
        "decode_tps": 21.302399759867672,
        "prefill_tps": 1132.1228292133994,
        "ttft_p90_ms": 16419.932407996384,
        "gpu_util_mean": 44.235955056179776,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "3": {
        "n": 94,
        "decode_tps": 39.351362025545676,
        "prefill_tps": 918.7383126064411,
        "ttft_p90_ms": 14089.503638009774,
        "gpu_util_mean": 57.30898876404494,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "4": {
        "n": 103,
        "decode_tps": 23.649608622297535,
        "prefill_tps": 1200.1765449894706,
        "ttft_p90_ms": 12823.167912021745,
        "gpu_util_mean": 43.82022471910113,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "5": {
        "n": 104,
        "decode_tps": 25.253863419028313,
        "prefill_tps": 1067.8562067279715,
        "ttft_p90_ms": 18659.113589994377,
        "gpu_util_mean": 52.449438202247194,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "6": {
        "n": 114,
        "decode_tps": 27.64600018218629,
        "prefill_tps": 1278.7061322523903,
        "ttft_p90_ms": 7502.1790039900225,
        "gpu_util_mean": 42.62359550561798,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "7": {
        "n": 117,
        "decode_tps": 30.90929989848701,
        "prefill_tps": 1416.0989400863393,
        "ttft_p90_ms": 15775.390956026968,
        "gpu_util_mean": 79.19101123595506,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      }
    },
    "decisions": {
      "lmetric_fallback": 353,
      "affinity": 454
    },
    "gpu_captured": true,
    "spread": {
      "n_ratio": 1.56,
      "ttft_p90_ratio": 3.4984812040696216,
      "gpu_util_ratio": 1.8579148543561357,
      "gpu_util_min": 42.62359550561798,
      "gpu_util_max": 79.19101123595506
    },
    "per_class": {
      "WARM<5k": {
        "n": 92,
        "ttft_ms": {
          "n": 92,
          "mean": 187.7879224041902,
          "p50": 171.855135995429,
          "p90": 318.49737899028696,
          "p99": 406.7676870035939
        }
      },
      "MED5-20k": {
        "n": 278,
        "ttft_ms": {
          "n": 278,
          "mean": 2014.1915133893583,
          "p50": 737.6636109838728,
          "p90": 2057.0135610178113,
          "p99": 29051.10613000579
        }
      },
      "HEAVY20-50k": {
        "n": 248,
        "ttft_ms": {
          "n": 248,
          "mean": 3298.1720407103326,
          "p50": 1321.936836000532,
          "p90": 7205.449934001081,
          "p99": 36468.73455500463
        }
      },
      "HEAVY+>50k": {
        "n": 189,
        "ttft_ms": {
          "n": 189,
          "mean": 10874.266077009788,
          "p50": 2567.718550999416,
          "p90": 35562.78318798286,
          "p99": 61205.50292698317
        }
      }
    }
  },
  "lmetric": {
    "n_total": 807,
    "n_ok": 807,
    "window_s": 1077.4112372398376,
    "ttft_ms": {
      "n": 807,
      "mean": 5152.329462028203,
      "p50": 1274.1917690145783,
      "p90": 16492.156354011968,
      "p99": 46248.138958995696
    },
    "tpot_ms": {
      "n": 806,
      "mean": 24.0370380963572,
      "p50": 10.534365684851883,
      "p90": 38.95731354601127,
      "p99": 231.34709527787183
    },
    "e2e_ms": {
      "n": 807,
      "mean": 10774.921962922257,
      "p50": 3460.944951977581,
      "p90": 27791.26176200225,
      "p99": 99231.47636200883
    },
    "throughput": {
      "decode_tps": 214.19212276939402,
      "prefill_tps": 12337.071064946676,
      "total_tps": 12551.263187716071,
      "total_output_tokens": 230773,
      "total_new_prefill_tokens": 13292099
    },
    "apc": 0.49518609291339905,
    "per_worker": {
      "0": {
        "n": 140,
        "decode_tps": 40.62422823074453,
        "prefill_tps": 2047.015033600442,
        "ttft_p90_ms": 13910.764692991506,
        "gpu_util_mean": 92.98571428571428,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "1": {
        "n": 89,
        "decode_tps": 23.129515582036454,
        "prefill_tps": 1463.9739641483652,
        "ttft_p90_ms": 17954.478895000648,
        "gpu_util_mean": 60.319047619047616,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "2": {
        "n": 92,
        "decode_tps": 18.14256184117442,
        "prefill_tps": 1663.4864553589516,
        "ttft_p90_ms": 15653.674011002295,
        "gpu_util_mean": 55.94761904761905,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "3": {
        "n": 92,
        "decode_tps": 43.91637878329201,
        "prefill_tps": 1313.127198881315,
        "ttft_p90_ms": 13397.495551005704,
        "gpu_util_mean": 54.385714285714286,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "4": {
        "n": 134,
        "decode_tps": 21.903428500019192,
        "prefill_tps": 1412.6889969137978,
        "ttft_p90_ms": 21995.840935007436,
        "gpu_util_mean": 51.51428571428571,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "5": {
        "n": 88,
        "decode_tps": 16.934109622562403,
        "prefill_tps": 1474.5493132872787,
        "ttft_p90_ms": 11013.645827013534,
        "gpu_util_mean": 42.55238095238095,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "6": {
        "n": 108,
        "decode_tps": 30.606697665860118,
        "prefill_tps": 1651.413070981298,
        "ttft_p90_ms": 18613.971301994752,
        "gpu_util_mean": 52.67619047619048,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "7": {
        "n": 64,
        "decode_tps": 18.935202543704882,
        "prefill_tps": 1310.817031775228,
        "ttft_p90_ms": 15090.364522009622,
        "gpu_util_mean": 41.904761904761905,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      }
    },
    "decisions": {},
    "gpu_captured": true,
    "spread": {
      "n_ratio": 2.1875,
      "ttft_p90_ratio": 1.997144386199301,
      "gpu_util_ratio": 2.2189772727272725,
      "gpu_util_min": 41.904761904761905,
      "gpu_util_max": 92.98571428571428
    },
    "per_class": {
      "WARM<5k": {
        "n": 92,
        "ttft_ms": {
          "n": 92,
          "mean": 1789.4509492392028,
          "p50": 262.9711049958132,
          "p90": 1572.630943992408,
          "p99": 24139.729285001522
        }
      },
      "MED5-20k": {
        "n": 278,
        "ttft_ms": {
          "n": 278,
          "mean": 1697.68317032391,
          "p50": 920.4712719947565,
          "p90": 2029.556789988419,
          "p99": 22497.115491016302
        }
      },
      "HEAVY20-50k": {
        "n": 248,
        "ttft_ms": {
          "n": 248,
          "mean": 3424.9239484553045,
          "p50": 2699.253859987948,
          "p90": 6799.459913017927,
          "p99": 24401.87052200781
        }
      },
      "HEAVY+>50k": {
        "n": 189,
        "ttft_ms": {
          "n": 189,
          "mean": 14137.37210560736,
          "p50": 11013.645827013534,
          "p90": 35319.35577199329,
          "p99": 51099.66781400726
        }
      }
    }
  },
  "sticky": {
    "n_total": 807,
    "n_ok": 807,
    "window_s": 925.9680500030518,
    "ttft_ms": {
      "n": 807,
      "mean": 4914.133386887214,
      "p50": 906.6202020039782,
      "p90": 15236.451414995827,
      "p99": 46771.370519010816
    },
    "tpot_ms": {
      "n": 806,
      "mean": 22.62391467634138,
      "p50": 9.728358783056292,
      "p90": 30.957536839455965,
      "p99": 231.30005976865783
    },
    "e2e_ms": {
      "n": 807,
      "mean": 10139.474540275223,
      "p50": 2597.957960999338,
      "p90": 27973.595037998166,
      "p99": 82362.0547579776
    },
    "throughput": {
      "decode_tps": 249.22350182518656,
      "prefill_tps": 8743.631057219865,
      "total_tps": 8992.854559045052,
      "total_output_tokens": 230773,
      "total_new_prefill_tokens": 8096323
    },
    "apc": 0.6925138424965755,
    "per_worker": {
      "0": {
        "n": 136,
        "decode_tps": 55.41875877880546,
        "prefill_tps": 1434.2957081463262,
        "ttft_p90_ms": 11807.82909199479,
        "gpu_util_mean": 65.71270718232044,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "1": {
        "n": 129,
        "decode_tps": 38.47432964872015,
        "prefill_tps": 1709.658340797809,
        "ttft_p90_ms": 23309.77585897199,
        "gpu_util_mean": 86.77900552486187,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "2": {
        "n": 108,
        "decode_tps": 47.74786775834683,
        "prefill_tps": 1144.2727424520835,
        "ttft_p90_ms": 30574.395572999492,
        "gpu_util_mean": 63.50828729281768,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "3": {
        "n": 78,
        "decode_tps": 25.291369394357414,
        "prefill_tps": 1245.5040970325047,
        "ttft_p90_ms": 26070.17321899184,
        "gpu_util_mean": 51.87845303867403,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "4": {
        "n": 132,
        "decode_tps": 40.39124244068328,
        "prefill_tps": 941.3586138281199,
        "ttft_p90_ms": 5989.128820016049,
        "gpu_util_mean": 43.0828729281768,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "5": {
        "n": 84,
        "decode_tps": 22.365782491017637,
        "prefill_tps": 1030.0301398054248,
        "ttft_p90_ms": 14970.723142003408,
        "gpu_util_mean": 41.43646408839779,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "6": {
        "n": 66,
        "decode_tps": 9.593203566765315,
        "prefill_tps": 669.7250515262986,
        "ttft_p90_ms": 13777.997018012684,
        "gpu_util_mean": 20.994475138121548,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      },
      "7": {
        "n": 74,
        "decode_tps": 9.940947746490457,
        "prefill_tps": 568.7863636312983,
        "ttft_p90_ms": 7027.451686997665,
        "gpu_util_mean": 18.50828729281768,
        "gpu_util_max": 100.0,
        "gpu_mem_max_mb": 89575.0
      }
    },
    "decisions": {},
    "gpu_captured": true,
    "spread": {
      "n_ratio": 2.0606060606060606,
      "ttft_p90_ratio": 5.104982125416625,
      "gpu_util_ratio": 4.68865671641791,
      "gpu_util_min": 18.50828729281768,
      "gpu_util_max": 86.77900552486187
    },
    "per_class": {
      "WARM<5k": {
        "n": 92,
        "ttft_ms": {
          "n": 92,
          "mean": 892.6154872479738,
          "p50": 207.1402580186259,
          "p90": 375.00955499126576,
          "p99": 11232.500832004007
        }
      },
      "MED5-20k": {
        "n": 278,
        "ttft_ms": {
          "n": 278,
          "mean": 2908.9762750470118,
          "p50": 770.736623002449,
          "p90": 4912.921145994915,
          "p99": 42022.69450199674
        }
      },
      "HEAVY20-50k": {
        "n": 248,
        "ttft_ms": {
          "n": 248,
          "mean": 4250.573046338691,
          "p50": 1623.4680919733364,
          "p90": 11137.098645005608,
          "p99": 39817.45037299697
        }
      },
      "HEAVY+>50k": {
        "n": 189,
        "ttft_ms": {
          "n": 189,
          "mean": 10691.78570601113,
          "p50": 2671.919913002057,
          "p90": 36922.92091701529,
          "p99": 53025.03776800586
        }
      }
    }
  }
}
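The `spread` object in each policy entry appears to be the max/min ratio across the eight `per_worker` entries. The harness that wrote these files is not shown here, so this is an assumed definition; it is a minimal sketch that reproduces the stored `leastwork` values from the per-worker numbers above. The function name `spread_from_workers` is hypothetical.

```python
# Per-worker excerpt copied from the "leastwork" entry of results/tracets.json
# (only the three fields that feed the "spread" block are kept).
per_worker = {
    "0": {"n": 95, "ttft_p90_ms": 5435.623251018114, "gpu_util_mean": 40.30392156862745},
    "1": {"n": 94, "ttft_p90_ms": 10786.369787005242, "gpu_util_mean": 61.27450980392157},
    "2": {"n": 89, "ttft_p90_ms": 4944.386984978337, "gpu_util_mean": 36.745098039215684},
    "3": {"n": 117, "ttft_p90_ms": 6188.670104980702, "gpu_util_mean": 40.76470588235294},
    "4": {"n": 108, "ttft_p90_ms": 4642.632269999012, "gpu_util_mean": 40.19607843137255},
    "5": {"n": 87, "ttft_p90_ms": 9265.34449501196, "gpu_util_mean": 29.41176470588235},
    "6": {"n": 130, "ttft_p90_ms": 38816.55501498608, "gpu_util_mean": 81.27450980392157},
    "7": {"n": 87, "ttft_p90_ms": 7304.075189982541, "gpu_util_mean": 42.26960784313726},
}


def spread_from_workers(workers):
    """Return max/min imbalance ratios across workers.

    Assumed definition: each ratio is max(value) / min(value) over all workers.
    """
    def ratio(key):
        values = [w[key] for w in workers.values()]
        return max(values) / min(values)

    utils = [w["gpu_util_mean"] for w in workers.values()]
    return {
        "n_ratio": ratio("n"),
        "ttft_p90_ratio": ratio("ttft_p90_ms"),
        "gpu_util_ratio": ratio("gpu_util_mean"),
        "gpu_util_min": min(utils),
        "gpu_util_max": max(utils),
    }


result = spread_from_workers(per_worker)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Read this way, a ratio of 1.0 would mean perfectly even workers; the `leastwork` `ttft_p90_ratio` of about 8.36 comes from worker 6 (p90 about 38.8 s) against worker 4 (p90 about 4.6 s).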
BIN
v2/figs/exp_d_policy_dispatch.png
Normal file
Binary file not shown.
After: Size 81 KiB