Window 1 results: recompute with fixed metrics + reframe limitations

After the B3 audit bug fixes (joined_analysis hotspot median +
b3_analyze percentile interp), regenerate b3_policy_comparison.json
and the per-policy hotspot_index.json from the same raw run on
dash0 and re-render the three affected figures (apc-vs-hotspot,
latency-bars, per-worker TTFT).
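
The percentile fix can be illustrated with a small sketch. The exact pre-fix rule in `b3_analyze` is not shown in this commit, so `percentile_floor` below is an assumption: it truncates the fractional rank. `percentile_interp` is standard linear interpolation, the same rule as NumPy's default `method="linear"`. The interpolated value always lies between the floor order statistic and the next one. Switching rules can therefore only raise a percentile or leave it unchanged, which is consistent with every TTFT/TPOT/E2E value in this diff moving up or staying flat.

```python
import math
from typing import Sequence


def _rank(n: int, q: float) -> float:
    """Fractional 0-based rank of the q-th percentile in a sorted sample of size n."""
    return q / 100.0 * (n - 1)


def percentile_floor(values: Sequence[float], q: float) -> float:
    """Assumed pre-fix rule: drop the fractional part of the rank."""
    xs = sorted(values)
    return xs[math.floor(_rank(len(xs), q))]


def percentile_interp(values: Sequence[float], q: float) -> float:
    """Linear interpolation between neighbouring order statistics
    (same rule as numpy.percentile(..., method="linear"))."""
    xs = sorted(values)
    r = _rank(len(xs), q)
    lo = math.floor(r)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (r - lo) * (xs[hi] - xs[lo])


if __name__ == "__main__":
    # Heavy-tailed toy sample: a large gap between neighbouring order
    # statistics is what makes the two rules diverge in the tail.
    sample = [1.0, 2.0, 3.0, 4.0, 100.0]
    for q in (50, 90, 99):
        print(q, percentile_floor(sample, q), percentile_interp(sample, q))
```

The toy sample shows why p99 can move much more than p50: in a heavy tail, neighbouring order statistics are far apart.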

Key number changes in window_1_results.md:
- hotspot_index magnitudes corrected (all five policies; lmetric
  smallest delta at +0.7%, sticky largest at +16.1%)
- "capped reduces hotspot 13%" -> "~10% (2.253 -> 2.020)"
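- TTFT/E2E/TPOT percentiles all shift upward from floor->interp:
  p50/p90 by <=1.5% (unified TTFT p90 7.24 -> 7.35 s), p99 by up to
  +13.5% (capped tpot_p99_s 0.101 -> 0.115 s in b3_policy_comparison.json)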
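
The hotspot percentages above can be re-derived from the `hotspot_index_ttft_p90` values in this commit's `b3_policy_comparison.json` diff. The short check below uses numbers copied by hand from that diff:

```python
# (old, new) hotspot_index_ttft_p90 per policy, copied from the
# b3_policy_comparison.json hunks in this commit.
HOTSPOT = {
    "lmetric": (2.237981740718548, 2.252837147833725),
    "load_only": (1.1400531308102801, 1.2940319990630569),
    "sticky": (2.3493858974059214, 2.727756623171119),
    "unified": (3.3497107140827365, 3.667136528736114),
    "capped": (1.9366915542605314, 2.0204268015410918),
}


def pct_change(old: float, new: float) -> float:
    """Signed relative change, in percent."""
    return (new - old) / old * 100.0


# Effect of the median fix on each policy's index.
fix_delta = {p: pct_change(old, new) for p, (old, new) in HOTSPOT.items()}

# Cap-8 ablation: how much lower capped's index is than lmetric's,
# before and after the fix (positive number = reduction).
cap_reduction_old = -pct_change(HOTSPOT["lmetric"][0], HOTSPOT["capped"][0])
cap_reduction_new = -pct_change(HOTSPOT["lmetric"][1], HOTSPOT["capped"][1])

if __name__ == "__main__":
    for p, d in sorted(fix_delta.items(), key=lambda kv: kv[1]):
        print(f"{p:10s} {d:+.1f}%")
    print(f"capped vs lmetric: {cap_reduction_old:.1f}% -> {cap_reduction_new:.1f}%")
```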

Restructured "Caveats" into "Limitations (read this before quoting
B3 numbers)":
1. Agentic dispatch coupling is by design — promoted from caveat
   to top-level methodology framing, tied to
   agentic_dispatch_coupling.md
2. B3 interference_index is binary (not size-graded) — added
3. Hot-sweep cache contamination (<1%) — kept
4. Unified interference unrecoverable — kept with explicit warning
   not to read unified's failure attribution as causal
5. w600 is a sample, not full trace — kept
6. Reuse decomposition is per-token in expectation — added

current_results/characterization_claim_matrix.md updates:
- The "heavy-tail not sole cause" claim now cites the corrected
  ~10% drop with the median bug noted
- New supported claim: "B3 saturated-replay latency gaps include an
  agentic dispatch-coupling feedback term, which is intentional and
  matches production"; cited against agentic_dispatch_coupling.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 01:08:55 +08:00
parent 0e82612100
commit 0881942cf3
11 changed files with 131 additions and 72 deletions

@@ -15,6 +15,7 @@ sweep, B2 PD-colo interference microbench).
| Same-worker prefill-decode interference is causal, not correlation. | `supported` | B2 microbench: different-worker control idx 0.92-1.02 across 32× prefill-size variation; same-worker TTFT idx scales 2.15× (2k) → 218× (65k). window_1_results/b2_sweep_summary.json. | — | Synthetic decode load (256-token prompts at 4 req/s) bounds the realism; production behavior is layered on top of B3. |
| The cost of same-worker prefill interference migrates from TPOT to TTFT as prefill size grows past the chunked-prefill horizon. | `supported` | B2 same-worker TPOT p90 idx peaks at 32k (7.89×) and *drops* at 65k (2.26×), while TTFT idx grows monotonically (94.6× → 218×) and TPOT p99 grows monotonically (59 → 169.5 ms). See window_1_results.md "TPOT idx peaks at 32k, not 65k". | — | SLO thresholds for TTFT and TPOT cannot be the same under PD-colo; this should be reflected in B4 SRR sweep design. |
| Hard session affinity (`sticky`) inflates same-worker prefill-decode interference. | `supported` | sticky interference_index 13.65 vs lmetric 6.53; sticky's slow-request breakdown 57% same-worker overlap vs lmetric 23%. | — | Confirms the B2 causal claim observed at the system level. |
| Heavy-tail sessions are a contributor to hot-spot but not the sole cause. | `supported` | Cap-8 trace (37% of requests dropped) reduces hotspot_index only 13% (2.24 → 1.94). | Run capped under unified to see whether unified's hotspot also persists. | Reviewer might counter that cap=8 is too soft; a stricter cap could be tried. |
| Heavy-tail sessions are a contributor to hot-spot but not the sole cause. | `supported` | Cap-8 trace (37% of requests dropped) reduces hotspot_index only ~10% (2.253 → 2.020 after fixing the `joined_analysis.hotspot_index` median bug). | Run capped under unified to see whether unified's hotspot also persists. | Reviewer might counter that cap=8 is too soft; a stricter cap could be tried. |
| B3 saturated-replay latency gaps include an agentic dispatch-coupling feedback term, which is intentional and matches production. | `supported, framed as feature` | `replayer/replay.py:282-287` fires turn N+1 immediately when turn N is behind schedule (no human think-time). Under saturation, slow policies have longer mean session lifetime, more concurrent in-flight, higher worker pressure — so B3 latency gaps reflect "policy + feedback amplification", which is what a production operator switching policies on agentic workload experiences. See `analysis/characterization/agentic_dispatch_coupling.md`. | Run B4 open-loop Poisson at fixed λ to get the orthogonal "controlled-load" measurement; both are needed, not "B4 fixes B3". | Some reviewers will read "non-Poisson arrivals" as benchmark crime; the rebuttal is the agentic-vs-chat workload distinction. |
| SRR per policy under SLO is not yet measured. | `not_yet_supported` | B3 was driven by trace timestamps with strict session sequentiality; saturation is reached but not parameterized. | Run B4 with the A4 open-loop Poisson loadgen, per-class SLO, 5 policies × λ binary search. | Without B4 the paper cannot claim "policy X sustains higher load than Y". |
| Failure attribution near SRR boundary is not yet measured. | `not_yet_supported` | B5 protocol exists; no runs. | After B4, rerun each policy at 0.9× / 1.0× / 1.1× of its SRR_max with the same instrumentation, label slow requests. | The current `joined_analysis.label_slow_requests` is the labeler; needs SRR boundaries to point at. |

@@ -15,9 +15,10 @@ Per-policy artifacts under `window_1_results/`. Figures under `window_1_results/
| LMetric leaves 23 pp of APC on the table | **supported** | lmetric achieved 56.9% vs intra-session ceiling 79.6% (theoretical) |
| Hard session affinity recovers the locality lost by LMetric | **supported** | sticky APC 77.2% = 97% of theoretical ceiling |
| Hard affinity inflates same-worker prefill-decode interference | **supported** | sticky interference_index 13.65 vs lmetric 6.53 |
| Hybrid affinity (Unified) breaks the locality-vs-latency tradeoff | **supported** | unified hits 79.4% APC and TTFT p90 7.24 s (lmetric 15.6 s) simultaneously |
| Hybrid affinity (Unified) breaks the locality-vs-latency tradeoff | **supported** | unified hits 79.4% APC and TTFT p90 7.35 s (lmetric 15.67 s) simultaneously |
| Same-worker prefill-decode interference is causal, not correlation | **supported** | different-worker control idx ≈ 1.0; same-worker idx scales monotonically with prefill size |
| Heavy-tail sessions are *a* contributor to hot-spot, not the sole cause | **supported** | cap=8 truncated trace cuts 37% of work; hotspot drops only 13% (2.24 → 1.94) |
| Heavy-tail sessions are *a* contributor to hot-spot, not the sole cause | **supported** | cap=8 truncated trace cuts 37% of work; hotspot drops only ~10% (2.253 → 2.020) |
| The agentic dispatch coupling amplifies policy gaps under saturation | **supported, framed as feature** | Slow policy → longer session lifetime → more concurrent in-flight → higher system load. B3 measures the combined policy + feedback effect, which is what an agentic operator experiences. See `agentic_dispatch_coupling.md`. |
## B1' Workload characterization
@@ -66,14 +67,26 @@ Gap "any intra" is 0.7 pp → no meaningful cross-session sharing in this tr
| policy | TTFT p50/p90/p99 s | TPOT p50/p90/p99 ms | E2E p50/p90/p99 s | **APC** | interference | **hotspot** | n_slow |
|---|---|---|---|---:|---:|---:|---:|
| lmetric | 0.94 / 15.59 / 52.95 | 8.9 / 21.2 / 175.9 | 2.75 / 24.75 / 79.62 | 56.9% | 6.53 | 2.24 | 295 |
| load_only | 1.25 / 20.15 / 52.65 | 9.2 / 26.7 / 320.7 | 3.58 / 33.43 / 93.92 | 54.1% | 9.16 | **1.14** | 379 |
| sticky | 0.54 / 18.02 / 71.37 | 8.9 / 36.1 / 345.2 | 2.08 / 34.61 / 133.58 | 77.2% | **13.65** | 2.35 | 234 |
| **unified** | **0.50 / 7.24 / 42.02** | 8.1 / 17.1 / 118.1 | **1.75 / 17.89 / 68.18** | **79.4%** | n/a* | 3.35 | **189** |
| capped | 1.20 / 12.76 / 46.05 | 7.2 / 16.0 / 101.5 | 2.59 / 21.24 / 73.39 | 31.6% | 6.33 | 1.94 | 185 |
| lmetric | 0.94 / 15.67 / 53.57 | 8.9 / 21.2 / 176.9 | 2.75 / 24.82 / 79.83 | 56.9% | 6.53 | 2.253 | 295 |
| load_only | 1.26 / 20.20 / 52.84 | 9.2 / 26.9 / 320.7 | 3.59 / 33.46 / 93.93 | 54.1% | 9.16 | **1.294** | 379 |
| sticky | 0.54 / 18.02 / 74.09 | 8.9 / 36.4 / 357.2 | 2.08 / 34.63 / 134.36 | 77.2% | **13.65** | 2.728 | 234 |
| **unified** | **0.50 / 7.35 / 42.34** | 8.1 / 17.1 / 118.3 | **1.75 / 18.03 / 68.43** | **79.4%** | n/a* | **3.667** | **189** |
| capped | 1.20 / 12.83 / 46.62 | 7.2 / 16.0 / 101.7 | 2.59 / 21.25 / 73.79 | 31.6% | 6.33 | 2.020 | 185 |
\*unified `engine_state` was overwritten by my analyzer's slice step before the `b3_analyze.sh` fix landed; vLLM and the patch worked correctly. The B2 microbench provides a cleaner interference proof.
> **Methodology note (read before interpreting latency comparisons)**: B3 uses
> session-sequential trace dispatch — turn N+1 fires the instant turn N
> completes when the trace timestamp has already passed. This is the right
> model of agentic workloads (tool-call driven, no user think-time), but it
> means under saturation each policy's effective in-flight session count is
> a function of its own per-turn latency (slower policy → longer mean
> session lifetime → more concurrent in-flight). The reported gaps are
> therefore "policy + agentic-feedback-amplification", which is what a
> production agentic operator would experience when switching policies.
> See `agentic_dispatch_coupling.md` for the full argument. B4 will report
> the orthogonal "fixed-λ open-loop" measurement.
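
The dispatch rule in the note above can be sketched in a few lines. This is a hypothetical model of the behaviour attributed to `replayer/replay.py:282-287`, not that code. It assumes a fixed per-turn latency function standing in for whatever the policy and load actually produce. Once turns fall behind their trace timestamps, each extra second of per-turn latency pushes every later turn out. Session lifetime therefore grows with the policy's latency, and by Little's law so does the average number of concurrent sessions at a given session arrival rate.

```python
from typing import Callable, List, Tuple


def replay_session(
    trace_ts: List[float], latency: Callable[[int], float]
) -> Tuple[List[float], float]:
    """Dispatch times for one session's turns under session-sequential replay.

    Turn n+1 is sent at its trace timestamp if turn n has already finished;
    otherwise it fires the instant turn n completes (no think time).
    Returns (dispatch_times, completion_time_of_last_turn).
    """
    dispatch: List[float] = []
    prev_done = float("-inf")
    for n, ts in enumerate(trace_ts):
        t = max(ts, prev_done)  # wait for the trace, unless already behind it
        dispatch.append(t)
        prev_done = t + latency(n)
    return dispatch, prev_done


if __name__ == "__main__":
    trace = [0.0, 5.0, 10.0]
    for name, per_turn in (("fast", 1.0), ("slow", 8.0)):
        sent, end = replay_session(trace, lambda n, s=per_turn: s)
        # The slow policy stretches the session past its trace schedule.
        print(name, sent, "lifetime", end - sent[0])
```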
**Mechanism indices**
- `interference_index` = TPOT_p90(decode overlapping same-worker prefill) / TPOT_p90(clean)
- `hotspot_index` = max(worker TTFT p90) / median(worker TTFT p90)
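
The two index definitions above can be sketched as follows. The record layouts (`Decode`, `Prefill` and their field names) are hypothetical, not the schema used by `joined_analysis.py`. The overlap rule follows the binary definition given under Limitations: any other request's prefill on the chosen worker during `[t_first_token, t_finish]`, regardless of size. The median uses `statistics.median`, which averages the two middle values when the worker count is even. This commit does not say which median variant the pre-fix code used.

```python
import math
import statistics
from typing import Dict, List, NamedTuple


class Decode(NamedTuple):  # hypothetical record layout
    request_id: str
    worker: str
    t_first_token: float
    t_finish: float
    tpot_s: float


class Prefill(NamedTuple):  # hypothetical record layout
    request_id: str
    worker: str
    t_start: float
    t_end: float


def p90(values: List[float]) -> float:
    """Linearly interpolated 90th percentile (values must be non-empty)."""
    xs = sorted(values)
    r = 0.9 * (len(xs) - 1)
    lo = math.floor(r)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (r - lo) * (xs[hi] - xs[lo])


def hotspot_index(per_worker_ttft_p90: Dict[str, float]) -> float:
    """max / median over workers; the median averages the two middle
    values when the number of workers is even (8 in B3)."""
    vals = list(per_worker_ttft_p90.values())
    return max(vals) / statistics.median(vals)


def overlaps(d: Decode, prefills: List[Prefill]) -> bool:
    """Binary label: any *other* request's prefill on the same worker
    intersecting [t_first_token, t_finish], whatever its size."""
    return any(
        p.worker == d.worker
        and p.request_id != d.request_id
        and p.t_start <= d.t_finish
        and p.t_end >= d.t_first_token
        for p in prefills
    )


def interference_index(decodes: List[Decode], prefills: List[Prefill]) -> float:
    """TPOT p90 of overlapping decodes divided by TPOT p90 of clean ones."""
    hit = [d.tpot_s for d in decodes if overlaps(d, prefills)]
    clean = [d.tpot_s for d in decodes if not overlaps(d, prefills)]
    if not hit or not clean:
        raise ValueError("need at least one overlapping and one clean decode")
    return p90(hit) / p90(clean)
```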
@@ -85,10 +98,10 @@ Figures: `fig_b3_latency_bars.png`, `fig_b3_apc_vs_upper.png`,
### Per-policy reading
- **lmetric** is the cache-aware baseline. APC 56.9% achieves only 71% of the intra-session ceiling — the missing 23 pp is the locality opportunity unified picks up.
- **load_only** strips cache awareness. Hot-spot drops to 1.14 (best), but APC only drops 3 pp because the picker's `min(num_requests)` tie-break to instance 0 creates accidental stickiness at low concurrency.
- **load_only** strips cache awareness. Hot-spot drops to 1.294 (best), but APC only drops 3 pp because the picker's `min(num_requests)` tie-break to instance 0 creates accidental stickiness at low concurrency.
- **sticky** locks each session to one worker. APC climbs to 77.2% (97% of ceiling) but interference doubles to 13.65 and TPOT p99 hits 345 ms.
- **unified** is the hybrid — affinity gate `(cache_ratio>0.5 AND num_req ≤ 2×avg)` keeps locality where it pays and drops it where it would hurt. The result is APC 79.4% **and** TTFT p90 cut in half from lmetric. The one bad worker (engine_4 at 37.7s p90) drives `hotspot_index=3.35`, but the other seven workers are all under 18 s.
- **capped** runs lmetric on a turn-capped trace (max 8 turns/session). Removes 37% of requests but APC also crashes to 31.6% and hotspot only improves by 13%. This is the session-mass ablation: heavy sessions are *a* contributor to hot-spot but not the sole cause.
- **unified** is the hybrid — affinity gate `(cache_ratio>0.5 AND num_req ≤ 2×avg)` keeps locality where it pays and drops it where it would hurt. The result is APC 79.4% **and** TTFT p90 cut in half from lmetric. The one bad worker (engine_4 at 37.7s p90) drives `hotspot_index=3.667`, but the other seven workers are all under 18 s.
- **capped** runs lmetric on a turn-capped trace (max 8 turns/session). Removes 37% of requests but APC also crashes to 31.6% and hotspot only improves by ~10% (2.253 → 2.020). This is the session-mass ablation: heavy sessions are *a* contributor to hot-spot but not the sole cause.
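
The tie-break effect and the unified gate can be sketched as below. This is a hypothetical model, not the proxy's picker, and the function names are invented. Two choices are assumptions: using the best-cache worker as the affinity candidate, and falling back to the least-loaded worker when the gate fails. Only the `min(num_requests)` tie-break and the `(cache_ratio>0.5 AND num_req ≤ 2×avg)` condition come from the text above. Python's `min` and `max` return the first of several equal items, which is why idle-cluster ties all land on instance 0.

```python
from typing import Sequence


def pick_least_loaded(num_requests: Sequence[int]) -> int:
    """load_only-style pick. min() returns the FIRST index among equal
    minima, so ties always resolve to the lowest-numbered instance."""
    return min(range(len(num_requests)), key=lambda i: num_requests[i])


def pick_unified(
    cache_ratio: Sequence[float], num_requests: Sequence[int], min_ratio: float = 0.5
) -> int:
    """Hypothetical affinity gate: stay on the best-cache worker only if its
    cache_ratio > min_ratio AND num_req <= 2 x avg; otherwise fall back to
    the least-loaded worker (assumed fallback)."""
    avg = sum(num_requests) / len(num_requests)
    best = max(range(len(cache_ratio)), key=lambda i: cache_ratio[i])
    if cache_ratio[best] > min_ratio and num_requests[best] <= 2 * avg:
        return best
    return pick_least_loaded(num_requests)


if __name__ == "__main__":
    # Low concurrency: every instance idle, so every pick lands on instance 0.
    print([pick_least_loaded([0] * 8) for _ in range(3)])
    # Gate passes: good cache hit on a moderately loaded worker.
    print(pick_unified([0.9, 0.1, 0.0], [2, 1, 1]))
    # Gate fails on load: affinity dropped, least-loaded worker chosen.
    print(pick_unified([0.9, 0.1, 0.0], [6, 0, 0]))
```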
### Slow-request cause breakdown (from `joined_analysis.label_slow_requests`)
@@ -168,11 +181,56 @@ Optional / paper-polish runs (not blocking the story):
4. B2 with the proxy in path — measure whether the production cache_aware routing actually pushes prefill and decode onto different workers in practice.
5. KV-occupancy timeline per worker — needs polling `vllm:gpu_cache_usage` during B3 reruns; useful for "KV pressure drives cache miss" subsection.
## Caveats and known data hygiene issues
## Limitations (read this before quoting B3 numbers)
- **APC contamination across B3 hot-sweep**: `lmetric` ran from cold; `load_only` and `sticky` ran on the same 8 vLLMs without restart. Empirical contamination is < 1% (verified by first-turn cached_tokens distribution), but `unified` and `capped` were rerun cold-start specifically to remove any residual concern.
- **Unified's `interference_index` is missing** because the original `b3_analyze.sh` unconditionally truncate-wrote sliced engine_state files; isolated runs that wrote engine_state into their own per-policy directory were overwritten. Fixed in commit `df32499`; capped was the first run to benefit and survived with intact 86 MB engine_state.
- **w600 is not the full GLM-5.1 trace** (1214 req vs 2.11 M). All B3/B2 percentiles are on the sample. The full-trace KV-footprint stats are on the full trace.
1. **Agentic dispatch coupling is by design**. B3 is the
"production-replay under captured agentic load" experiment, not the
"controlled-load envelope" experiment. Latency p90 reflects both
per-request policy effect AND the agentic feedback amplification
(slow policy → longer mean session lifetime → more concurrent
in-flight). Both contributions are real and visible to a production
operator; **the paper must report both, not subtract one**. See
`agentic_dispatch_coupling.md`. The orthogonal "fixed-λ Poisson"
measurement is B4.
2. **B3 `interference_index` is a binary indicator**. A decode is
labeled "overlap" iff *any* other request's prefill exists on the
chosen worker during `[t_first_token, t_finish]`, regardless of
prefill size. B3's aggregate numbers (lmetric 6.53, sticky 13.65)
therefore cannot be mapped onto B2's per-prefill-size TPOT p90 cells
(2k = 1.16×, 65k = 2.26×). Each B3 number is a p90 ratio over the
pooled mix of prefill sizes and load conditions that policy produced,
not an average of B2 cells (sticky's 13.65 exceeds B2's 7.89× peak at
32k). The numbers are valid for *within-B3 cross-policy* comparison
but not for direct cross-batch numerical comparison with B2.
3. **Hot-sweep cache contamination (low)**: `lmetric` ran from cold;
`load_only` and `sticky` ran on the same 8 vLLMs without restart.
First-turn cached_tokens verification puts empirical contamination
at < 1% APC, well below the cross-policy gaps. `unified` and
`capped` were rerun cold-start specifically to remove any residual
concern.
4. **Unified's `interference_index` is missing**. The original
`b3_analyze.sh` unconditionally truncate-wrote sliced engine_state
files, so isolated runs that wrote engine_state into their own
per-policy directory were overwritten. Fixed in commit `df32499`;
capped was the first run to benefit and survived with its
engine_state intact. **Implication**: in unified's slow-request
mechanism breakdown (counts 0 / 116 / 18 / 55 for same-worker
overlap / hot worker queue / cache miss / unknown), the "same-worker
overlap" label is *unrecoverable*, so those requests are forced into
the catch-all buckets. Do not read unified's failure attribution
as causal.
5. **w600 is not the full GLM-5.1 trace** (1214 req vs 2.11 M). All
B3/B2 percentiles are on the sample. The full-trace KV-footprint
stats are on the full trace.
6. **Reuse decomposition (intra/cross/shared/unclassified) is
per-cached-token only in expectation**: `joined_analysis.py`
distributes a request's `cached_tokens` count uniformly across its
`hash_ids` and classifies block-by-block. For the w600 trace with
<1% cross-session sharing the qualitative split is robust; for
workloads with mixed-class hashes within a single request the
classifier should be revisited.
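
The sketch below makes the "in expectation" caveat concrete by contrasting the uniform spread described in item 6 with a prefix-aware alternative. The prefix variant is an assumption, not what `joined_analysis.py` does. It relies on prefix-cache hits covering a contiguous run of leading blocks and on a fixed `block_size`, which is a hypothetical parameter here. The two methods agree when all of a request's blocks share one class. They diverge when classes are mixed within a request, which is the case item 6 flags.

```python
from typing import Dict, List

CLASSES = ("intra", "cross", "shared", "unclassified")


def _empty() -> Dict[str, float]:
    return {c: 0.0 for c in CLASSES}


def decompose_uniform(cached_tokens: int, block_classes: List[str]) -> Dict[str, float]:
    """Spread cached_tokens evenly over all of the request's hash_ids
    (one class label per block) and tally per class."""
    out = _empty()
    if not block_classes:
        out["unclassified"] += cached_tokens
        return out
    share = cached_tokens / len(block_classes)
    for c in block_classes:
        out[c if c in out else "unclassified"] += share
    return out


def decompose_prefix(
    cached_tokens: int, block_classes: List[str], block_size: int
) -> Dict[str, float]:
    """Hypothetical prefix-aware variant: assign cached tokens to the
    leading blocks in order, block_size tokens per block."""
    out = _empty()
    remaining = cached_tokens
    for c in block_classes:
        if remaining <= 0:
            break
        take = min(block_size, remaining)
        out[c if c in out else "unclassified"] += take
        remaining -= take
    out["unclassified"] += remaining  # hits beyond the listed blocks
    return out


if __name__ == "__main__":
    blocks = ["intra", "intra", "cross", "cross"]
    print(decompose_uniform(32, blocks))     # splits hits across both classes
    print(decompose_prefix(32, blocks, 16))  # all hits land on the intra prefix
```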
## Reproduction commands

@@ -4,18 +4,18 @@
"policy": "capped",
"n_ok": 770,
"n_total": 770,
"ttft_p50_s": 1.195636051998008,
"ttft_p90_s": 12.762421467981767,
"ttft_p99_s": 46.05476947501302,
"tpot_p50_s": 0.007229394937166944,
"tpot_p90_s": 0.015995440982929352,
"tpot_p99_s": 0.10145225453431651,
"e2e_p50_s": 2.5921602529706433,
"e2e_p90_s": 21.238469071977306,
"e2e_p99_s": 73.38509433099534,
"ttft_p50_s": 1.1989156164927408,
"ttft_p90_s": 12.827629912580612,
"ttft_p99_s": 46.61752380923125,
"tpot_p50_s": 0.007231239004497606,
"tpot_p90_s": 0.015998617687440243,
"tpot_p99_s": 0.11515370831539476,
"e2e_p50_s": 2.598489043477457,
"e2e_p90_s": 21.245602010778384,
"e2e_p99_s": 74.60736650204846,
"apc_ratio": 0.3158312503528108,
"interference_index": 6.331064378362814,
"hotspot_index_ttft_p90": 1.9366915542605314,
"hotspot_index_ttft_p90": 2.0204268015410918,
"reuse_intra_frac": 0.9192657105586233,
"reuse_cross_frac": 0.0602232594931501,
"n_slow": 185,
@@ -30,18 +30,18 @@
"policy": "lmetric",
"n_ok": 1214,
"n_total": 1214,
"ttft_p50_s": 0.9369571270071901,
"ttft_p90_s": 15.592678204004187,
"ttft_p99_s": 52.95170431700535,
"tpot_p50_s": 0.008851506907892485,
"tpot_p90_s": 0.02120516549011311,
"tpot_p99_s": 0.17592118933357093,
"e2e_p50_s": 2.7527842019917443,
"e2e_p90_s": 24.75416105298791,
"e2e_p99_s": 79.61890332301846,
"ttft_p50_s": 0.9387824369769078,
"ttft_p90_s": 15.671339168207492,
"ttft_p99_s": 53.56683189840049,
"tpot_p50_s": 0.008854518407308914,
"tpot_p90_s": 0.02122720699121469,
"tpot_p99_s": 0.18280341184277568,
"e2e_p50_s": 2.754255389008904,
"e2e_p90_s": 24.8209177934099,
"e2e_p99_s": 80.59924928059091,
"apc_ratio": 0.5694312382571595,
"interference_index": 6.530231061794441,
"hotspot_index_ttft_p90": 2.237981740718548,
"hotspot_index_ttft_p90": 2.252837147833725,
"reuse_intra_frac": 0.9321238805590836,
"reuse_cross_frac": 0.05679481258506571,
"n_slow": 295,
@@ -56,18 +56,18 @@
"policy": "load_only",
"n_ok": 1214,
"n_total": 1214,
"ttft_p50_s": 1.2542553890380077,
"ttft_p90_s": 20.14692750602262,
"ttft_p99_s": 52.64810254302574,
"tpot_p50_s": 0.00923045912795929,
"tpot_p90_s": 0.02672785480314115,
"tpot_p99_s": 0.3207044094773148,
"e2e_p50_s": 3.584156609023921,
"e2e_p90_s": 33.42658680601744,
"e2e_p99_s": 93.91839688795153,
"ttft_p50_s": 1.2609447415161412,
"ttft_p90_s": 20.197147866390882,
"ttft_p99_s": 52.84285237012196,
"tpot_p50_s": 0.009231464695980247,
"tpot_p90_s": 0.026851662550158716,
"tpot_p99_s": 0.3211630676943426,
"e2e_p50_s": 3.58568156149704,
"e2e_p90_s": 33.459180271782685,
"e2e_p99_s": 93.95083751494239,
"apc_ratio": 0.5412093853102866,
"interference_index": 9.16424627504275,
"hotspot_index_ttft_p90": 1.1400531308102801,
"hotspot_index_ttft_p90": 1.2940319990630569,
"reuse_intra_frac": 0.9353191550754928,
"reuse_cross_frac": 0.053372184678592026,
"n_slow": 379,
@@ -82,18 +82,18 @@
"policy": "sticky",
"n_ok": 1214,
"n_total": 1214,
"ttft_p50_s": 0.540947844972834,
"ttft_p90_s": 18.016640832996927,
"ttft_p99_s": 71.37327494798228,
"tpot_p50_s": 0.00894752275507555,
"tpot_p90_s": 0.0360956137329512,
"tpot_p99_s": 0.34523129428917954,
"e2e_p50_s": 2.0788628259906545,
"e2e_p90_s": 34.605129147996195,
"e2e_p99_s": 133.5824547969969,
"ttft_p50_s": 0.5415176274836995,
"ttft_p90_s": 18.021296651283045,
"ttft_p99_s": 74.09429564891524,
"tpot_p50_s": 0.008952101894096181,
"tpot_p90_s": 0.03641285916619554,
"tpot_p99_s": 0.35152006935195085,
"e2e_p50_s": 2.081947358994512,
"e2e_p90_s": 34.62592205510591,
"e2e_p99_s": 139.68334607904353,
"apc_ratio": 0.7720092868396378,
"interference_index": 13.651718321568111,
"hotspot_index_ttft_p90": 2.3493858974059214,
"hotspot_index_ttft_p90": 2.727756623171119,
"reuse_intra_frac": 0.9327723488279339,
"reuse_cross_frac": 0.05495149683864246,
"n_slow": 234,
@@ -109,17 +109,17 @@
"n_ok": 1213,
"n_total": 1214,
"ttft_p50_s": 0.4997710260213353,
"ttft_p90_s": 7.239999514014926,
"ttft_p99_s": 42.022206099005416,
"ttft_p90_s": 7.345769894809922,
"ttft_p99_s": 42.34170345296613,
"tpot_p50_s": 0.008079791456705824,
"tpot_p90_s": 0.017107906969874808,
"tpot_p99_s": 0.11808861252148231,
"tpot_p90_s": 0.017110194704198407,
"tpot_p99_s": 0.12655874612209597,
"e2e_p50_s": 1.7495028690318577,
"e2e_p90_s": 17.893827292020433,
"e2e_p99_s": 68.18008507299237,
"e2e_p90_s": 18.033410895219994,
"e2e_p99_s": 68.80023987947489,
"apc_ratio": 0.794261466256467,
"interference_index": null,
"hotspot_index_ttft_p90": 3.3497107140827365,
"hotspot_index_ttft_p90": 3.667136528736114,
"reuse_intra_frac": 0.9311187350942534,
"reuse_cross_frac": 0.056702150437367635,
"n_slow": 189,

Binary file not shown (39 KiB → 40 KiB).

Binary file not shown (58 KiB → 58 KiB).

Binary file not shown (51 KiB → 52 KiB).

@@ -1,5 +1,5 @@
{
"hotspot_index_ttft_p90": 1.9366915542605314,
"hotspot_index_ttft_p90": 2.0204268015410918,
"per_worker_latency_p90_s": {
"http://127.0.0.1:8000": 23.81083881931848,
"http://127.0.0.1:8001": 18.139674991380897,
@@ -21,4 +21,4 @@
"http://127.0.0.1:8007": 9.661995008389932
},
"status": "supported"
}
}

@@ -1,5 +1,5 @@
{
"hotspot_index_ttft_p90": 2.237981740718548,
"hotspot_index_ttft_p90": 2.252837147833725,
"per_worker_latency_p90_s": {
"http://127.0.0.1:8000": 34.71445541951107,
"http://127.0.0.1:8001": 21.922988962882666,
@@ -21,4 +21,4 @@
"http://127.0.0.1:8007": 11.777357225219024
},
"status": "supported"
}
}

@@ -1,5 +1,5 @@
{
"hotspot_index_ttft_p90": 1.1400531308102801,
"hotspot_index_ttft_p90": 1.2940319990630569,
"per_worker_latency_p90_s": {
"http://127.0.0.1:8000": 33.51168999259829,
"http://127.0.0.1:8001": 29.20308109278556,
@@ -21,4 +21,4 @@
"http://127.0.0.1:8007": 13.95184187250561
},
"status": "supported"
}
}

@@ -1,5 +1,5 @@
{
"hotspot_index_ttft_p90": 2.3493858974059214,
"hotspot_index_ttft_p90": 2.727756623171119,
"per_worker_latency_p90_s": {
"http://127.0.0.1:8000": 30.185792533413043,
"http://127.0.0.1:8001": 47.49661003401852,
@@ -21,4 +21,4 @@
"http://127.0.0.1:8007": 2.4984901855932535
},
"status": "supported"
}
}

@@ -1,5 +1,5 @@
{
"hotspot_index_ttft_p90": 3.3497107140827365,
"hotspot_index_ttft_p90": 3.667136528736114,
"per_worker_latency_p90_s": {
"http://127.0.0.1:8000": 41.42001512600109,
"http://127.0.0.1:8001": 12.4878579101933,
@@ -21,4 +21,4 @@
"http://127.0.0.1:8007": 7.772977900883419
},
"status": "supported"
}
}