v2: LMetric PD-colo vs PD-disagg on the real agentic trace

Anchor experiment for the clean-stack PD comparison using the canonical cache-aware proxy with --policy lmetric (scripts/bench.sh harness). Two traces x four arms = eight runs on dash1. Headline: with the right routing baseline (LMetric), PD-colo holds 100% completion on both traces while every static PD-disagg ratio fails (14-65% completion), and the failure mode rotates with the split -- no static partition has a working operating point on this workload. LMetric improves colo dramatically (TTFT p50 1.0s vs original §3 RR 7.0s; 7x) but does NOT rescue PD-disagg, confirming the bottleneck is structural (D-pool admission + multi-turn KV accumulation), not routing. Completion matrix: first600s full colo 100% 100% pd6 (6:2) 58.7% 65.3% (decode-bound) pd4 (4:4) 43.1% 43.9% (both bottlenecks) pd2 (2:6) 22.3% 13.9% (prefill-bound) The original §3 RR "100% PD completion" appears to be a measurement artifact of e13391e: producer-KV eviction acted as a relief valve, letting more requests squeeze under the 600s timeout at the (uncosted) price of cross-turn re-prefill. With the eviction off, PD-disagg is worse than §3 advertised, not better. Artifacts: analysis/v2/fig4l_lmetric.json -- 8-arm summary data analysis/v2/PD_DISAGG_LMETRIC.md -- writeup + reproduce recipe figs/v2/fig4_lmetric_pd_vs_colo.png -- 4-panel comparison figure microbench/fresh_setup/plot_fig4l_lmetric.py -- plot script
2026-05-31 20:15:10 +08:00
parent fafc44da79
commit 7529284cee
4 changed files with 222 additions and 0 deletions
--- a/analysis/v2/PD_DISAGG_LMETRIC.md
+++ b/analysis/v2/PD_DISAGG_LMETRIC.md
@@ -0,0 +1,108 @@
 # PD-colo vs PD-disagg on the real agentic trace — LMetric (cache-aware) clean-stack anchor
 **Figure:** [`figs/v2/fig4_lmetric_pd_vs_colo.png`](../../figs/v2/fig4_lmetric_pd_vs_colo.png)
 **Data:**   [`analysis/v2/fig4l_lmetric.json`](fig4l_lmetric.json)
 **Date:**   2026-05-31 · Hardware: dash1, 8×H20 · Model: Qwen3-Coder-30B-A3B-Instruct
 · vLLM 0.18.1 (V1, chunked-prefill on, `e13391e` eviction gated **off**)
 · Mooncake 0.3.11 · Routing: cache-aware proxy with **`--policy lmetric`**
 · Replayer per-request timeout 600 s.
 ## TL;DR
 On the production agentic trace with the *right* routing baseline (LMetric, cache-aware),
 **PD-colo (8× kv_both) keeps 100 % completion on both traces** and matches the daily-bench
 expectation (~17 min for the high-load first600s, ~50 min for the full trace, with E2E p50
 ~3 s and TTFT p50 ~1 s — **3.5–7× better than the original §3 round-robin baseline at the
 same wall-clock**). Every static **PD-disagg ratio fails** (14–65 % completion), and the
 failure mode rotates predictably with the split — **no static partition has a working
 operating point on this workload**. LMetric improves colo dramatically; it does *not*
 rescue PD-disagg, confirming the bottleneck is structural (D-pool admission capacity +
 multi-turn KV accumulation), not routing.
 ## Setup
 - Trace: `w600_r0.0015_st30.jsonl` (1214 reqs, 274 sessions, agentic multi-turn,
  contexts up to ~112 k tokens; "first600s" variant = same heavy sessions compressed
  into 600 s → 807 reqs at 3.2× higher arrival rate).
 - 8 instances on 8 GPUs.
 - `--mode baseline` for colo (plain vLLM); `--mode pdsep --pd-ratio P:D` for the three PD
  splits, all with Mooncake KV transfer.
 - Cache-aware proxy with LMetric scoring (`P_tokens × num_requests`) + session affinity
  for multi-turn (the colleague's canonical baseline).
 ## Results
 ### first600s (1.35 req/s, high-load stress)
 | arm | success | E2E mean / p50 / p90 / p99 | TTFT p90 | TPOT p99 | TPS | wall |
 |---|---|---|---|---|---|---|
 | **colo (8C)** | **807/807 = 100 %** | 11.1 / 3.27 / 28.6 / 95.9 s | 14.5 s | 388 ms | 226 | 17.0 min |
 | pd6 (6:2) | 474/807 = **58.7 %** | 83.2 / 6.75 / 382 / 542 s | 380 s | 19 ms | 40 | 55 min |
 | pd4 (4:4) | 348/807 = **43.1 %** | 203 / 215 / 477 / 575 s | 475 s | 25 ms | 15 | 114 min |
 | pd2 (2:6) | 180/807 = **22.3 %** | 380 / 536 / 579 / 602 s | 577 s | 18 ms | 34 | 321 min* |
 ### Full trace (0.42 req/s, original §3 anchor load)
 | arm | success | E2E mean / p50 / p90 / p99 | TTFT p90 | TPOT p99 | TPS | wall |
 |---|---|---|---|---|---|---|
 | **colo (8C)** | **1214/1214 = 100 %** | 10.9 / 3.13 / 29.6 / 93.7 s | 16.9 s | 254 ms | 125 | 49.9 min |
 | pd6 (6:2) | 793/1214 = **65.3 %** | 61.9 / 3.70 / 307 / 477 s | 300 s | 18 ms | 46 | 94 min |
 | pd4 (4:4) | 533/1214 = **43.9 %** | 131 / 8.22 / 468 / 531 s | 467 s | 21 ms | 13 | 231 min |
 | pd2 (2:6) | 169/1214 = **13.9 %** | 195 / 6.82 / 552 / 593 s | 549 s | 13 ms | 1 | 563 min |
 \* The pd2 wall-clock is dominated by per-request timeouts (`request_timeout=600 s`)
 draining concurrently behind the multi-turn session causality.
 ## Five clean findings
 1. **LMetric+colo is the right baseline.** Full-trace colo wall **49.9 min ≈ the original
   §3 RR's 49.9 min**, but E2E p50 **3.13 s vs §3's 10.8 s (3.5×)** and TTFT p50
   **1.02 s vs §3's 7.0 s (7×)**. Same throughput envelope, far better latency — by virtue
   of cache-aware routing concentrating each session's turns onto one instance for
   prefix-cache reuse. The original §3 RR was an *unfairly weak* colo baseline.
 2. **Every static PD-disagg ratio fails on the agentic workload.** Completion drops to
   14–65 %, on *both* traces. The drop is not a high-load artifact (it holds at the
   original §3 arrival rate of 0.42 req/s); it is structural.
 3. **Failure mode rotates predictably with the P:D split:**
   - **pd2 (2 producers)** → prefill-bound → 78–86 % TTFT timeouts.
   - **pd6 (2 decode)** → decode-admission-bound → 35–41 % TTFT timeouts.
   - **pd4 (4P+4D)** → both bottlenecks hit → 57 % TTFT timeouts.
   - **No static ratio works.** Colo's elastic 8-GPU pool absorbs whichever phase is
     hot at the moment.
 4. **Decode isolation works, but doesn't matter under failure.** TPOT p99 on every PD
   arm is **13–25 ms** — an order of magnitude better than colo's 254–388 ms — but the
   win applies only to the 14–65 % of requests that get admitted. The other 35–86 %
   time out before ever seeing a first token, so the TPOT win is invisible to the end user.
 5. **The §3 RR "100 % PD completion" was a measurement artifact.** Original §3
   (contaminated stack, RR routing) reported 100 % completion for pd6/pd4. LMetric on
   the clean stack shows 44–65 %. Most plausible mechanism: `e13391e`'s eviction of
   producer KV on every transfer acted as a **relief valve**, reducing producer-pool
   pressure and letting more requests squeeze under the 600 s timeout — at the (uncosted)
   price of cross-turn re-prefill. With the eviction off, producers retain prefix
   correctly → cache works on PD too → but the cache itself contends for producer
   pool capacity, and the decode-pool admission ceiling tips earlier. **PD-disagg is
   worse on agentic than §3 advertised, not better.**
 ## Reproduce
 ```bash
 # On dash1, from the main repo /home/admin/cpfs/wjh/agentic-kv:
 for TR in w600_r0.0015_st30.jsonl w600_r0.0015_st30_first600s.jsonl; do
  TRACE=traces/$TR bash scripts/bench.sh --tag fig4l_lmetric_colo_${TR%.*} \
    --mode baseline --policy lmetric
  for r in 6:2 4:4 2:6; do
    TRACE=traces/$TR bash scripts/bench.sh --tag fig4l_lmetric_${r/:/p}_${TR%.*} \
      --mode pdsep --pd-ratio $r --policy lmetric
  done
 done
 python microbench/fresh_setup/plot_fig4l_lmetric.py
 ```
 Source `bench.sh` cleans GPUs before each arm and writes `metrics.jsonl` +
 `metrics.summary.json` per tag. Aggregation script: see the inline JSON dump used
 to build `analysis/v2/fig4l_lmetric.json`.
--- a/analysis/v2/fig4l_lmetric.json
+++ b/analysis/v2/fig4l_lmetric.json
@@ -0,0 +1 @@
 [{"tag": "fig4l_lmetric_colo_first600s", "arm": "colo", "trace": "first600s", "n": 807, "req": 807, "e2e": {"count": 807.0, "mean": 11.066699584425269, "p50": 3.27055042097345, "p90": 28.745733462180937, "p99": 97.40008939541167}, "ttft": {"count": 807.0, "mean": 5.119651803458883, "p50": 1.2114678020589054, "p90": 14.777630288852365, "p99": 50.68302261995841}, "tpot": {"count": 807.0, "mean": 0.03004899278845205, "p50": 0.009643197803618922, "p90": 0.042092699501536976, "p99": 0.3919741264067197}, "wall": 1020.5351374909515, "tps": 226.12940164644368}, {"tag": "fig4l_lmetric_colo_full", "arm": "colo", "trace": "full", "n": 1214, "req": 1214, "e2e": {"count": 1214.0, "mean": 10.928977524270508, "p50": 3.1279119075043127, "p90": 30.011970606888667, "p99": 94.77313101590481}, "ttft": {"count": 1214.0, "mean": 5.533819193267678, "p50": 1.017395684029907, "p90": 17.36427243486981, "p99": 51.49416554694993}, "tpot": {"count": 1214.0, "mean": 0.02049970290344434, "p50": 0.009544484575988867, "p90": 0.032480608771520716, "p99": 0.26057810739537074}, "wall": 2993.276069591986, "tps": 125.38402448497122}, {"tag": "fig4l_lmetric_pd2_first600s", "arm": "2P+6D", "trace": "first600s", "n": 180, "req": 807, "e2e": {"count": 180.0, "mean": 380.2505690135715, "p50": 535.6594606440049, "p90": 579.5011055286858, "p99": 601.5567972306756}, "ttft": {"count": 180.0, "mean": 378.7133691522933, "p50": 534.4269686369807, "p90": 577.3534130641376, "p99": 596.404559875431}, "tpot": {"count": 180.0, "mean": 0.007975266077679418, "p50": 0.007166497974743372, "p90": 0.012511071875514153, "p99": 0.017508981961061446}, "wall": 19275.367093455978, "tps": 1.8895100582735462}, {"tag": "fig4l_lmetric_pd2_full", "arm": "2P+6D", "trace": "full", "n": 169, "req": 1214, "e2e": {"count": 169.0, "mean": 194.88523891245458, "p50": 6.817620265996084, "p90": 552.1569225640735, "p99": 595.3934216396092}, "ttft": {"count": 169.0, "mean": 193.4153314989016, "p50": 5.60239192598965, "p90": 549.3611521873856, "p99": 582.4436428000824}, "tpot": {"count": 169.0, "mean": 0.007747395842651413, "p50": 0.007691574401794991, "p90": 0.011201243427351017, "p99": 0.013311375577245894}, "wall": 33770.57413210906, "tps": 0.9869539045920406}, {"tag": "fig4l_lmetric_pd4_first600s", "arm": "4P+4D", "trace": "first600s", "n": 348, "req": 807, "e2e": {"count": 348.0, "mean": 202.63302869595395, "p50": 214.03008900902933, "p90": 477.40967412578175, "p99": 576.6393926549597}, "ttft": {"count": 348.0, "mean": 199.96385804087797, "p50": 213.50966987549327, "p90": 475.7766476540827, "p99": 559.6153268160638}, "tpot": {"count": 348.0, "mean": 0.008873619369764751, "p50": 0.007645836479973812, "p90": 0.013845969236959285, "p99": 0.02567216653158788}, "wall": 6850.181333696004, "tps": 15.00296050477674}, {"tag": "fig4l_lmetric_pd4_full", "arm": "4P+4D", "trace": "full", "n": 533, "req": 1214, "e2e": {"count": 533.0, "mean": 130.94711188977982, "p50": 8.219856544979848, "p90": 473.44134307731883, "p99": 533.2597587251009}, "ttft": {"count": 533.0, "mean": 127.83193208824007, "p50": 4.8246813879814, "p90": 467.54664219671395, "p99": 528.8304683346115}, "tpot": {"count": 533.0, "mean": 0.008886429490232585, "p50": 0.007981476340708988, "p90": 0.013570741891233497, "p99": 0.023050950961825044}, "wall": 13884.384965199977, "tps": 12.621372890425038}, {"tag": "fig4l_lmetric_pd6_first600s", "arm": "6P+2D", "trace": "first600s", "n": 474, "req": 807, "e2e": {"count": 474.0, "mean": 83.15809065495806, "p50": 6.7270191764691845, "p90": 391.6558471220078, "p99": 544.7372293809171}, "ttft": {"count": 474.0, "mean": 80.70155321074382, "p50": 4.1273433425230905, "p90": 390.00296151017517, "p99": 539.0574236416071}, "tpot": {"count": 474.0, "mean": 0.008519881756330928, "p50": 0.00803907146806204, "p90": 0.012583933303093976, "p99": 0.018606097790947705}, "wall": 3325.2749515309697, "tps": 39.705588838364164}, {"tag": "fig4l_lmetric_pd6_full", "arm": "6P+2D", "trace": "full", "n": 793, "req": 1214, "e2e": {"count": 793.0, "mean": 61.907526705667, "p50": 3.69814173609484, "p90": 308.2633092067672, "p99": 477.48038318102715}, "ttft": {"count": 793.0, "mean": 59.25069201986225, "p50": 1.402295546955429, "p90": 302.5604081378088, "p99": 475.3738951798529}, "tpot": {"count": 793.0, "mean": 0.009137289999448822, "p50": 0.008635683270933276, "p90": 0.013065757584108427, "p99": 0.01816783740464599}, "wall": 5662.029295974993, "tps": 39.24494000021532}]
--- a/figs/v2/fig4_lmetric_pd_vs_colo.png
+++ b/figs/v2/fig4_lmetric_pd_vs_colo.png
--- a/microbench/fresh_setup/plot_fig4l_lmetric.py
+++ b/microbench/fresh_setup/plot_fig4l_lmetric.py
@@ -0,0 +1,113 @@
 """Render the LMetric PD-colo vs PD-disagg figure on the real agentic trace.
 Input  : analysis/v2/fig4l_lmetric.json     (8 arms = 4 ratios x 2 traces)
 Output : figs/v2/fig4_lmetric_pd_vs_colo.png
 Four panels x four ratios x two traces:
  (a) completion rate %
  (b) E2E latency (mean / p50 / p90)
  (c) throughput (output tokens / second)
  (d) bench wall-clock seconds
 The thesis the figure visualizes: with LMetric routing,
  - colo (elastic 8-GPU pool) holds 100% completion on both traces
  - every PD-disagg ratio fails (completion 14-65%), and the failure mode
    rotates with the split (pd2 = prefill-bound, pd6 = decode-bound)
  - routing policy does not rescue PD-disagg; the bottleneck is structural.
 """
 from __future__ import annotations
 import json
 from pathlib import Path
 import matplotlib
 matplotlib.use("Agg")
 import matplotlib.pyplot as plt
 import numpy as np
 ROOT = Path(__file__).resolve().parents[2]
 DATA = ROOT / "analysis" / "v2" / "fig4l_lmetric.json"
 OUT = ROOT / "figs" / "v2" / "fig4_lmetric_pd_vs_colo.png"
 OUT.parent.mkdir(parents=True, exist_ok=True)
 ARMS = ["colo", "6P+2D", "4P+4D", "2P+6D"]   # decode-rich -> prefill-rich
 TRACES = ["first600s", "full"]
 TRACE_LABEL = {"first600s": "first600s (1.35 req/s, high load)",
               "full": "full w600 (0.42 req/s, original §3)"}
 COLOR = {"first600s": "#1f77b4", "full": "#ff7f0e"}
 def pick(rows, trace, arm):
    for r in rows:
        if r["trace"] == trace and r["arm"] == arm:
            return r
    return None
 def main():
    rows = json.load(open(DATA))
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    width = 0.38
    x = np.arange(len(ARMS))
    # (a) completion %
    ax = axes[0, 0]
    for i, tr in enumerate(TRACES):
        vals = [pick(rows, tr, a)["n"] / pick(rows, tr, a)["req"] * 100 for a in ARMS]
        bars = ax.bar(x + (i - 0.5) * width, vals, width, color=COLOR[tr], label=TRACE_LABEL[tr])
        for bx, bv in zip(x + (i - 0.5) * width, vals):
            ax.annotate(f"{bv:.0f}%", (bx, bv + 1.5), ha="center", fontsize=8)
    ax.axhline(100, color="grey", ls=":", lw=1)
    ax.set_xticks(x); ax.set_xticklabels(ARMS)
    ax.set_ylabel("completion (%)"); ax.set_ylim(0, 115)
    ax.set_title("(a) request completion — colo holds 100%, all PD ratios fail")
    ax.legend(fontsize=8); ax.grid(alpha=.3, axis="y")
    # (b) E2E percentiles
    ax = axes[0, 1]
    for i, tr in enumerate(TRACES):
        p50 = [pick(rows, tr, a)["e2e"]["p50"] for a in ARMS]
        p90 = [pick(rows, tr, a)["e2e"]["p90"] for a in ARMS]
        off = (i - 0.5) * width
        ax.bar(x + off, p90, width, color=COLOR[tr], alpha=0.55, label=f"{tr} p90")
        ax.bar(x + off, p50, width, color=COLOR[tr], alpha=1.0, label=f"{tr} p50")
    ax.axhline(600, color="red", ls=":", lw=1, label="600 s request timeout")
    ax.set_xticks(x); ax.set_xticklabels(ARMS)
    ax.set_ylabel("E2E latency (s, log)"); ax.set_yscale("log")
    ax.set_title("(b) E2E p50 (solid) + p90 (faded) — PD pegs at the timeout")
    ax.legend(fontsize=7, ncol=2); ax.grid(alpha=.3, which="both", axis="y")
    # (c) TPS
    ax = axes[1, 0]
    for i, tr in enumerate(TRACES):
        vals = [pick(rows, tr, a)["tps"] for a in ARMS]
        ax.bar(x + (i - 0.5) * width, vals, width, color=COLOR[tr], label=TRACE_LABEL[tr])
        for bx, bv in zip(x + (i - 0.5) * width, vals):
            ax.annotate(f"{bv:.0f}", (bx, bv + 4), ha="center", fontsize=8)
    ax.set_xticks(x); ax.set_xticklabels(ARMS)
    ax.set_ylabel("throughput (output tokens/s)")
    ax.set_title("(c) throughput — PD throughput crashes 5–100×")
    ax.legend(fontsize=8); ax.grid(alpha=.3, axis="y")
    # (d) wall (min)
    ax = axes[1, 1]
    for i, tr in enumerate(TRACES):
        vals = [pick(rows, tr, a)["wall"] / 60 for a in ARMS]
        ax.bar(x + (i - 0.5) * width, vals, width, color=COLOR[tr], label=TRACE_LABEL[tr])
        for bx, bv in zip(x + (i - 0.5) * width, vals):
            ax.annotate(f"{bv:.0f}m", (bx, bv * 1.05), ha="center", fontsize=8)
    ax.set_xticks(x); ax.set_xticklabels(ARMS)
    ax.set_ylabel("bench wall-clock (min, log)"); ax.set_yscale("log")
    ax.set_title("(d) wall-clock — PD drain dilates the run")
    ax.legend(fontsize=8); ax.grid(alpha=.3, which="both", axis="y")
    fig.suptitle("Fig 4 (LMetric) — PD-colo vs PD-disagg on the real agentic trace "
                 "(`w600_r0.0015_st30`), clean stack, cache-aware LMetric routing",
                 fontsize=12, y=1.0)
    fig.tight_layout()
    fig.savefig(OUT, dpi=130, bbox_inches="tight")
    print(f"wrote {OUT}")
 if __name__ == "__main__":
    main()
		`@@ -0,0 +1 @@`
							[{"tag": "fig4l_lmetric_colo_first600s", "arm": "colo", "trace": "first600s", "n": 807, "req": 807, "e2e": {"count": 807.0, "mean": 11.066699584425269, "p50": 3.27055042097345, "p90": 28.745733462180937, "p99": 97.40008939541167}, "ttft": {"count": 807.0, "mean": 5.119651803458883, "p50": 1.2114678020589054, "p90": 14.777630288852365, "p99": 50.68302261995841}, "tpot": {"count": 807.0, "mean": 0.03004899278845205, "p50": 0.009643197803618922, "p90": 0.042092699501536976, "p99": 0.3919741264067197}, "wall": 1020.5351374909515, "tps": 226.12940164644368}, {"tag": "fig4l_lmetric_colo_full", "arm": "colo", "trace": "full", "n": 1214, "req": 1214, "e2e": {"count": 1214.0, "mean": 10.928977524270508, "p50": 3.1279119075043127, "p90": 30.011970606888667, "p99": 94.77313101590481}, "ttft": {"count": 1214.0, "mean": 5.533819193267678, "p50": 1.017395684029907, "p90": 17.36427243486981, "p99": 51.49416554694993}, "tpot": {"count": 1214.0, "mean": 0.02049970290344434, "p50": 0.009544484575988867, "p90": 0.032480608771520716, "p99": 0.26057810739537074}, "wall": 2993.276069591986, "tps": 125.38402448497122}, {"tag": "fig4l_lmetric_pd2_first600s", "arm": "2P+6D", "trace": "first600s", "n": 180, "req": 807, "e2e": {"count": 180.0, "mean": 380.2505690135715, "p50": 535.6594606440049, "p90": 579.5011055286858, "p99": 601.5567972306756}, "ttft": {"count": 180.0, "mean": 378.7133691522933, "p50": 534.4269686369807, "p90": 577.3534130641376, "p99": 596.404559875431}, "tpot": {"count": 180.0, "mean": 0.007975266077679418, "p50": 0.007166497974743372, "p90": 0.012511071875514153, "p99": 0.017508981961061446}, "wall": 19275.367093455978, "tps": 1.8895100582735462}, {"tag": "fig4l_lmetric_pd2_full", "arm": "2P+6D", "trace": "full", "n": 169, "req": 1214, "e2e": {"count": 169.0, "mean": 194.88523891245458, "p50": 6.817620265996084, "p90": 552.1569225640735, "p99": 595.3934216396092}, "ttft": {"count": 169.0, "mean": 193.4153314989016, "p50": 5.60239192598965, "p90": 549.3611521873856, "p99": 582.4436428000824}, "tpot": {"count": 169.0, "mean": 0.007747395842651413, "p50": 0.007691574401794991, "p90": 0.011201243427351017, "p99": 0.013311375577245894}, "wall": 33770.57413210906, "tps": 0.9869539045920406}, {"tag": "fig4l_lmetric_pd4_first600s", "arm": "4P+4D", "trace": "first600s", "n": 348, "req": 807, "e2e": {"count": 348.0, "mean": 202.63302869595395, "p50": 214.03008900902933, "p90": 477.40967412578175, "p99": 576.6393926549597}, "ttft": {"count": 348.0, "mean": 199.96385804087797, "p50": 213.50966987549327, "p90": 475.7766476540827, "p99": 559.6153268160638}, "tpot": {"count": 348.0, "mean": 0.008873619369764751, "p50": 0.007645836479973812, "p90": 0.013845969236959285, "p99": 0.02567216653158788}, "wall": 6850.181333696004, "tps": 15.00296050477674}, {"tag": "fig4l_lmetric_pd4_full", "arm": "4P+4D", "trace": "full", "n": 533, "req": 1214, "e2e": {"count": 533.0, "mean": 130.94711188977982, "p50": 8.219856544979848, "p90": 473.44134307731883, "p99": 533.2597587251009}, "ttft": {"count": 533.0, "mean": 127.83193208824007, "p50": 4.8246813879814, "p90": 467.54664219671395, "p99": 528.8304683346115}, "tpot": {"count": 533.0, "mean": 0.008886429490232585, "p50": 0.007981476340708988, "p90": 0.013570741891233497, "p99": 0.023050950961825044}, "wall": 13884.384965199977, "tps": 12.621372890425038}, {"tag": "fig4l_lmetric_pd6_first600s", "arm": "6P+2D", "trace": "first600s", "n": 474, "req": 807, "e2e": {"count": 474.0, "mean": 83.15809065495806, "p50": 6.7270191764691845, "p90": 391.6558471220078, "p99": 544.7372293809171}, "ttft": {"count": 474.0, "mean": 80.70155321074382, "p50": 4.1273433425230905, "p90": 390.00296151017517, "p99": 539.0574236416071}, "tpot": {"count": 474.0, "mean": 0.008519881756330928, "p50": 0.00803907146806204, "p90": 0.012583933303093976, "p99": 0.018606097790947705}, "wall": 3325.2749515309697, "tps": 39.705588838364164}, {"tag": "fig4l_lmetric_pd6_full", "arm": "6P+2D", "trace": "full", "n": 793, "req": 1214, "e2e": {"count": 793.0, "mean": 61.907526705667, "p50": 3.69814173609484, "p90": 308.2633092067672, "p99": 477.48038318102715}, "ttft": {"count": 793.0, "mean": 59.25069201986225, "p50": 1.402295546955429, "p90": 302.5604081378088, "p99": 475.3738951798529}, "tpot": {"count": 793.0, "mean": 0.009137289999448822, "p50": 0.008635683270933276, "p90": 0.013065757584108427, "p99": 0.01816783740464599}, "wall": 5662.029295974993, "tps": 39.24494000021532}]