Window 1 results: recompute with fixed metrics + reframe limitations

After the B3 audit bug fixes (joined_analysis hotspot median + b3_analyze percentile interp), regenerate b3_policy_comparison.json and the per-policy hotspot_index.json from the same raw run on dash0 and re-render the three affected figures (apc-vs-hotspot, latency-bars, per-worker TTFT). Key number changes in window_1_results.md: - hotspot_index magnitudes corrected (all five policies; lmetric smallest delta at +0.7%, sticky largest at +16.1%) - "capped reduces hotspot 13%" -> "~10% (2.253 -> 2.020)" - TTFT/E2E/TPOT percentiles shift by <1% from floor->interp (unified TTFT p90 7.24 -> 7.35 s) Restructured "Caveats" into "Limitations (read this before quoting B3 numbers)": 1. Agentic dispatch coupling is by design — promoted from caveat to top-level methodology framing, tied to agentic_dispatch_coupling.md 2. B3 interference_index is binary (not size-graded) — added 3. Hot-sweep cache contamination (<1%) — kept 4. Unified interference unrecoverable — kept with explicit warning not to read unified's failure attribution as causal 5. w600 is a sample, not full trace — kept 6. Reuse decomposition is per-token in expectation — added current_results/characterization_claim_matrix.md updates: - The "heavy-tail not sole cause" claim now cites the corrected ~10% drop with the median bug noted - New supported claim: "B3 saturated-replay latency gaps include an agentic dispatch-coupling feedback term, which is intentional and matches production"; cited against agentic_dispatch_coupling.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 01:08:55 +08:00
parent 0e82612100
commit 0881942cf3
11 changed files with 131 additions and 72 deletions
--- a/analysis/characterization/current_results/characterization_claim_matrix.md
+++ b/analysis/characterization/current_results/characterization_claim_matrix.md
@@ -15,6 +15,7 @@ sweep, B2 PD-colo interference microbench).
 | Same-worker prefill-decode interference is causal, not correlation. | `supported` | B2 microbench: different-worker control idx 0.92-1.02 across 32× prefill-size variation; same-worker TTFT idx scales 2.15× (2k) → 218× (65k). window_1_results/b2_sweep_summary.json. | — | Synthetic decode load (256-token prompts at 4 req/s) bounds the realism; production behavior is layered on top of B3. |
 | The cost of same-worker prefill interference migrates from TPOT to TTFT as prefill size grows past the chunked-prefill horizon. | `supported` | B2 same-worker TPOT p90 idx peaks at 32k (7.89×) and *drops* at 65k (2.26×), while TTFT idx grows monotonically (94.6× → 218×) and TPOT p99 grows monotonically (59 → 169.5 ms). See window_1_results.md "TPOT idx peaks at 32k, not 65k". | — | SLO thresholds for TTFT and TPOT cannot be the same under PD-colo; this should be reflected in B4 SRR sweep design. |
 | Hard session affinity (`sticky`) inflates same-worker prefill-decode interference. | `supported` | sticky interference_index 13.65 vs lmetric 6.53; sticky's slow-request breakdown 57% same-worker overlap vs lmetric 23%. | — | Confirms the B2 causal claim observed at the system level. |
-| Heavy-tail sessions are a contributor to hot-spot but not the sole cause. | `supported` | Cap-8 trace (37% requests dropped) reduces hotspot_index only 13% (2.24 → 1.94). | Run capped under unified to see whether unified's hotspot also persists. | Reviewer might counter that cap=8 is too soft; a stricter cap could be tried. |
+| Heavy-tail sessions are a contributor to hot-spot but not the sole cause. | `supported` | Cap-8 trace (37% requests dropped) reduces hotspot_index only ~10% (2.253 → 2.020 after fixing the `joined_analysis.hotspot_index` median bug). | Run capped under unified to see whether unified's hotspot also persists. | Reviewer might counter that cap=8 is too soft; a stricter cap could be tried. |
+| B3 saturated-replay latency gaps include an agentic dispatch-coupling feedback term, which is intentional and matches production. | `supported, framed as feature` | `replayer/replay.py:282-287` fires turn N+1 immediately when turn N is behind schedule (no human think-time). Under saturation, slow policies have longer mean session lifetime, more concurrent in-flight, higher worker pressure — so B3 latency gaps reflect "policy + feedback amplification", which is what a production operator switching policies on agentic workload experiences. See `analysis/characterization/agentic_dispatch_coupling.md`. | Run B4 open-loop Poisson at fixed λ to get the orthogonal "controlled-load" measurement; both are needed, not "B4 fixes B3". | Some reviewers will read "non-Poisson arrivals" as benchmark crime; the rebuttal is the agentic-vs-chat workload distinction. |
 | SRR per policy under SLO is not yet measured. | `not_yet_supported` | B3 was driven by trace timestamps with strict session sequentiality; saturation is reached but not parameterized. | Run B4 with the A4 open-loop Poisson loadgen, per-class SLO, 5 policies × λ binary search. | Without B4 the paper cannot claim "policy X sustains higher load than Y". |
 | Failure attribution near SRR boundary is not yet measured. | `not_yet_supported` | B5 protocol exists; no runs. | After B4, rerun each policy at 0.9× / 1.0× / 1.1× of its SRR_max with the same instrumentation, label slow requests. | The current `joined_analysis.label_slow_requests` is the labeler; needs SRR boundaries to point at. |
--- a/analysis/characterization/window_1_results.md
+++ b/analysis/characterization/window_1_results.md
@@ -15,9 +15,10 @@ Per-policy artifacts under `window_1_results/`. Figures under `window_1_results/
 | LMetric leaves 23 pp of APC on the table | **supported** | lmetric achieved 56.9% vs intra-session ceiling 79.6% (theoretical) |
 | Hard session affinity recovers the locality lost by LMetric | **supported** | sticky APC 77.2% = 97% of theoretical ceiling |
 | Hard affinity inflates same-worker prefill-decode interference | **supported** | sticky interference_index 13.65 vs lmetric 6.53 |
-| Hybrid affinity (Unified) breaks the locality-vs-latency tradeoff | **supported** | unified hits 79.4% APC and TTFT p90 7.24 s (lmetric 15.6 s) simultaneously |
+| Hybrid affinity (Unified) breaks the locality-vs-latency tradeoff | **supported** | unified hits 79.4% APC and TTFT p90 7.35 s (lmetric 15.67 s) simultaneously |
 | Same-worker prefill-decode interference is causal, not correlation | **supported** | different-worker control idx≈1.0; same-worker idx scales monotonically with prefill size |
-| Heavy-tail sessions are *a* contributor to hot-spot, not the sole cause | **supported** | cap=8 truncated trace cuts 37% of work; hotspot drops only 13% (2.24→1.94) |
+| Heavy-tail sessions are *a* contributor to hot-spot, not the sole cause | **supported** | cap=8 truncated trace cuts 37% of work; hotspot drops only ~10% (2.253→2.020) |
+| The agentic dispatch coupling amplifies policy gaps under saturation | **supported, framed as feature** | Slow policy → longer session lifetime → more concurrent in-flight → harder system. B3 measures the combined policy + feedback effect, which is what an agentic operator experiences. See `agentic_dispatch_coupling.md`. |

 ## B1' Workload characterization

@@ -66,14 +67,26 @@ Gap "any − intra" is 0.7 pp → no meaningful cross-session sharing in this tr

 | policy | TTFT p50/p90/p99 | TPOT p50/p90/p99 ms | E2E p50/p90/p99 | **APC** | interference | **hotspot** | n_slow |
 |---|---|---|---|---:|---:|---:|---:|
-| lmetric | 0.94 / 15.59 / 52.95 | 8.9 / 21.2 / 175.9 | 2.75 / 24.75 / 79.62 | 56.9% | 6.53 | 2.24 | 295 |
-| load_only | 1.25 / 20.15 / 52.65 | 9.2 / 26.7 / 320.7 | 3.58 / 33.43 / 93.92 | 54.1% | 9.16 | **1.14** | 379 |
-| sticky | 0.54 / 18.02 / 71.37 | 8.9 / 36.1 / 345.2 | 2.08 / 34.61 / 133.58 | 77.2% | **13.65** | 2.35 | 234 |
-| **unified** | **0.50 / 7.24 / 42.02** | 8.1 / 17.1 / 118.1 | **1.75 / 17.89 / 68.18** | **79.4%** | n/a* | 3.35 | **189** |
-| capped | 1.20 / 12.76 / 46.05 | 7.2 / 16.0 / 101.5 | 2.59 / 21.24 / 73.39 | 31.6% | 6.33 | 1.94 | 185 |
+| lmetric | 0.94 / 15.67 / 53.57 | 8.9 / 21.2 / 176.9 | 2.75 / 24.82 / 79.83 | 56.9% | 6.53 | 2.253 | 295 |
+| load_only | 1.26 / 20.20 / 52.84 | 9.2 / 26.9 / 320.7 | 3.59 / 33.46 / 93.93 | 54.1% | 9.16 | **1.294** | 379 |
+| sticky | 0.54 / 18.02 / 74.09 | 8.9 / 36.4 / 357.2 | 2.08 / 34.63 / 134.36 | 77.2% | **13.65** | 2.728 | 234 |
+| **unified** | **0.50 / 7.35 / 42.34** | 8.1 / 17.1 / 118.3 | **1.75 / 18.03 / 68.43** | **79.4%** | n/a* | **3.667** | **189** |
+| capped | 1.20 / 12.83 / 46.62 | 7.2 / 16.0 / 101.7 | 2.59 / 21.25 / 73.79 | 31.6% | 6.33 | 2.020 | 185 |

 \*unified `engine_state` was overwritten by my analyzer's slice step before the `b3_analyze.sh` fix landed; vLLM and the patch worked correctly. The B2 microbench provides a cleaner interference proof.

+> **Methodology note (read before interpreting latency comparisons)**: B3 uses
+> session-sequential trace dispatch — turn N+1 fires the instant turn N
+> completes when the trace timestamp has already passed. This is the right
+> model of agentic workloads (tool-call driven, no user think-time), but it
+> means under saturation each policy's effective in-flight session count is
+> a function of its own per-turn latency (slower policy → longer mean
+> session lifetime → more concurrent in-flight). The reported gaps are
+> therefore "policy + agentic-feedback-amplification", which is what a
+> production agentic operator would experience when switching policies.
+> See `agentic_dispatch_coupling.md` for the full argument. B4 will report
+> the orthogonal "fixed-λ open-loop" measurement.
+
 **Mechanism indices**
 - `interference_index` = TPOT_p90(decode overlapping same-worker prefill) / TPOT_p90(clean)
 - `hotspot_index` = max(worker TTFT p90) / median(worker TTFT p90)
@@ -85,10 +98,10 @@ Figures: `fig_b3_latency_bars.png`, `fig_b3_apc_vs_upper.png`,
 ### Per-policy reading

 - **lmetric** is the cache-aware baseline. APC 56.9% achieves only 71% of the intra-session ceiling — the missing 23 pp is the locality opportunity unified picks up.
- **load_only** strips cache awareness. Hot-spot drops to 1.14 (best), but APC only drops 3 pp because the picker's `min(num_requests)` tie-break to instance 0 creates accidental stickiness at low concurrency.
+- **load_only** strips cache awareness. Hot-spot drops to 1.294 (best), but APC only drops 3 pp because the picker's `min(num_requests)` tie-break to instance 0 creates accidental stickiness at low concurrency.
 - **sticky** locks each session to one worker. APC climbs to 77.2% (97% of ceiling) but interference doubles to 13.65 and TPOT p99 hits 345 ms.
- **unified** is the hybrid — affinity gate `(cache_ratio>0.5 AND num_req ≤ 2×avg)` keeps locality where it pays and drops it where it would hurt. The result is APC 79.4% **and** TTFT p90 cut in half from lmetric. The one bad worker (engine_4 at 37.7s p90) drives `hotspot_index=3.35`, but the other seven workers are all under 18 s.
- **capped** runs lmetric on a turn-capped trace (max 8 turns/session). Removes 37% of requests but APC also crashes to 31.6% and hotspot only improves by 13%. This is the session-mass ablation: heavy sessions are *a* contributor to hot-spot but not the sole cause.
+- **unified** is the hybrid — affinity gate `(cache_ratio>0.5 AND num_req ≤ 2×avg)` keeps locality where it pays and drops it where it would hurt. The result is APC 79.4% **and** TTFT p90 cut in half from lmetric. The one bad worker (engine_4 at 37.7s p90) drives `hotspot_index=3.667`, but the other seven workers are all under 18 s.
+- **capped** runs lmetric on a turn-capped trace (max 8 turns/session). Removes 37% of requests but APC also crashes to 31.6% and hotspot only improves by ~10% (2.253 → 2.020). This is the session-mass ablation: heavy sessions are *a* contributor to hot-spot but not the sole cause.

 ### Slow-request cause breakdown (from `joined_analysis.label_slow_requests`)

@@ -168,11 +181,56 @@ Optional / paper-polish runs (not blocking the story):
 4. B2 with the proxy in path — measure whether the production cache_aware routing actually pushes prefill and decode onto different workers in practice.
 5. KV-occupancy timeline per worker — needs polling `vllm:gpu_cache_usage` during B3 reruns; useful for "KV pressure drives cache miss" subsection.

-## Caveats and known data hygiene issues
+## Limitations (read this before quoting B3 numbers)

- **APC contamination across B3 hot-sweep**: `lmetric` ran from cold; `load_only` and `sticky` ran on the same 8 vLLMs without restart. Empirical contamination is < 1% (verified by first-turn cached_tokens distribution), but `unified` and `capped` were rerun cold-start specifically to remove any residual concern.
- **Unified's `interference_index` is missing** because the original `b3_analyze.sh` unconditionally truncate-wrote sliced engine_state files; isolated runs that wrote engine_state into their own per-policy directory were overwritten. Fixed in commit `df32499`; capped was the first run to benefit and survived with intact 86 MB engine_state.
- **w600 is not the full GLM-5.1 trace** (1214 req vs 2.11 M). All B3/B2 percentiles are on the sample. The full-trace KV-footprint stats are on the full trace.
+1. **Agentic dispatch coupling is by design**. B3 is the
+   "production-replay under captured agentic load" experiment, not the
+   "controlled-load envelope" experiment. Latency p90 reflects both
+   per-request policy effect AND the agentic feedback amplification
+   (slow policy → longer mean session lifetime → more concurrent
+   in-flight). Both contributions are real and visible to a production
+   operator; **the paper must report both, not subtract one**. See
+   `agentic_dispatch_coupling.md`. The orthogonal "fixed-λ Poisson"
+   measurement is B4.
+
+2. **B3 `interference_index` is a binary indicator**. A decode is
+   labeled "overlap" iff *any* other request's prefill exists on the
+   chosen worker during `[t_first_token, t_finish]`, regardless of
+   prefill size. B2's per-prefill-size cells (2k = 1.16×, 65k = 2.26×)
+   cannot be directly read off B3's aggregate numbers (lmetric 6.53,
+   sticky 13.65). The B3 numbers are size-weighted averages of the
+   per-cell signal, valid for *within-B3 cross-policy* comparison but
+   not for direct cross-batch numerical comparison with B2.
+
+3. **Hot-sweep cache contamination (low)**: `lmetric` ran from cold;
+   `load_only` and `sticky` ran on the same 8 vLLMs without restart.
+   First-turn cached_tokens verification puts empirical contamination
+   at < 1% APC, well below the cross-policy gaps. `unified` and
+   `capped` were rerun cold-start specifically to remove any residual
+   concern.
+
+4. **Unified's `interference_index` is missing**. The original
+   `b3_analyze.sh` unconditionally truncate-wrote sliced engine_state
+   files; isolated runs that wrote engine_state into their own
+   per-policy directory were overwritten. Fixed in commit `df32499`;
+   capped was the first run to benefit and survived. **Implication**:
+   unified's slow-request mechanism breakdown (rows 0 / 116 / 18 / 55
+   for same-worker overlap / hot worker queue / cache miss / unknown)
+   has the "same-worker overlap" label *unrecoverable* and forced into
+   the catch-all buckets — do not read unified's failure attribution
+   as causal.
+
+5. **w600 is not the full GLM-5.1 trace** (1214 req vs 2.11 M). All
+   B3/B2 percentiles are on the sample. The full-trace KV-footprint
+   stats are on the full trace.
+
+6. **Reuse decomposition (intra/cross/shared/unclassified) is
+   per-cached-token only in expectation** — `joined_analysis.py`
+   distributes a request's `cached_tokens` count uniformly across its
+   `hash_ids` and classifies block-by-block. For the w600 trace with
+   <1% cross-session sharing the qualitative split is robust; for
+   workloads with mixed-class hashes within a single request the
+   classifier should be revisited.

 ## Reproduction commands

--- a/analysis/characterization/window_1_results/b3_policy_comparison.json
+++ b/analysis/characterization/window_1_results/b3_policy_comparison.json
@@ -4,18 +4,18 @@
      "policy": "capped",
      "n_ok": 770,
      "n_total": 770,
-      "ttft_p50_s": 1.195636051998008,
-      "ttft_p90_s": 12.762421467981767,
-      "ttft_p99_s": 46.05476947501302,
-      "tpot_p50_s": 0.007229394937166944,
-      "tpot_p90_s": 0.015995440982929352,
-      "tpot_p99_s": 0.10145225453431651,
-      "e2e_p50_s": 2.5921602529706433,
-      "e2e_p90_s": 21.238469071977306,
-      "e2e_p99_s": 73.38509433099534,
+      "ttft_p50_s": 1.1989156164927408,
+      "ttft_p90_s": 12.827629912580612,
+      "ttft_p99_s": 46.61752380923125,
+      "tpot_p50_s": 0.007231239004497606,
+      "tpot_p90_s": 0.015998617687440243,
+      "tpot_p99_s": 0.11515370831539476,
+      "e2e_p50_s": 2.598489043477457,
+      "e2e_p90_s": 21.245602010778384,
+      "e2e_p99_s": 74.60736650204846,
      "apc_ratio": 0.3158312503528108,
      "interference_index": 6.331064378362814,
-      "hotspot_index_ttft_p90": 1.9366915542605314,
+      "hotspot_index_ttft_p90": 2.0204268015410918,
      "reuse_intra_frac": 0.9192657105586233,
      "reuse_cross_frac": 0.0602232594931501,
      "n_slow": 185,
@@ -30,18 +30,18 @@
      "policy": "lmetric",
      "n_ok": 1214,
      "n_total": 1214,
-      "ttft_p50_s": 0.9369571270071901,
-      "ttft_p90_s": 15.592678204004187,
-      "ttft_p99_s": 52.95170431700535,
-      "tpot_p50_s": 0.008851506907892485,
-      "tpot_p90_s": 0.02120516549011311,
-      "tpot_p99_s": 0.17592118933357093,
-      "e2e_p50_s": 2.7527842019917443,
-      "e2e_p90_s": 24.75416105298791,
-      "e2e_p99_s": 79.61890332301846,
+      "ttft_p50_s": 0.9387824369769078,
+      "ttft_p90_s": 15.671339168207492,
+      "ttft_p99_s": 53.56683189840049,
+      "tpot_p50_s": 0.008854518407308914,
+      "tpot_p90_s": 0.02122720699121469,
+      "tpot_p99_s": 0.18280341184277568,
+      "e2e_p50_s": 2.754255389008904,
+      "e2e_p90_s": 24.8209177934099,
+      "e2e_p99_s": 80.59924928059091,
      "apc_ratio": 0.5694312382571595,
      "interference_index": 6.530231061794441,
-      "hotspot_index_ttft_p90": 2.237981740718548,
+      "hotspot_index_ttft_p90": 2.252837147833725,
      "reuse_intra_frac": 0.9321238805590836,
      "reuse_cross_frac": 0.05679481258506571,
      "n_slow": 295,
@@ -56,18 +56,18 @@
      "policy": "load_only",
      "n_ok": 1214,
      "n_total": 1214,
-      "ttft_p50_s": 1.2542553890380077,
-      "ttft_p90_s": 20.14692750602262,
-      "ttft_p99_s": 52.64810254302574,
-      "tpot_p50_s": 0.00923045912795929,
-      "tpot_p90_s": 0.02672785480314115,
-      "tpot_p99_s": 0.3207044094773148,
-      "e2e_p50_s": 3.584156609023921,
-      "e2e_p90_s": 33.42658680601744,
-      "e2e_p99_s": 93.91839688795153,
+      "ttft_p50_s": 1.2609447415161412,
+      "ttft_p90_s": 20.197147866390882,
+      "ttft_p99_s": 52.84285237012196,
+      "tpot_p50_s": 0.009231464695980247,
+      "tpot_p90_s": 0.026851662550158716,
+      "tpot_p99_s": 0.3211630676943426,
+      "e2e_p50_s": 3.58568156149704,
+      "e2e_p90_s": 33.459180271782685,
+      "e2e_p99_s": 93.95083751494239,
      "apc_ratio": 0.5412093853102866,
      "interference_index": 9.16424627504275,
-      "hotspot_index_ttft_p90": 1.1400531308102801,
+      "hotspot_index_ttft_p90": 1.2940319990630569,
      "reuse_intra_frac": 0.9353191550754928,
      "reuse_cross_frac": 0.053372184678592026,
      "n_slow": 379,
@@ -82,18 +82,18 @@
      "policy": "sticky",
      "n_ok": 1214,
      "n_total": 1214,
-      "ttft_p50_s": 0.540947844972834,
-      "ttft_p90_s": 18.016640832996927,
-      "ttft_p99_s": 71.37327494798228,
-      "tpot_p50_s": 0.00894752275507555,
-      "tpot_p90_s": 0.0360956137329512,
-      "tpot_p99_s": 0.34523129428917954,
-      "e2e_p50_s": 2.0788628259906545,
-      "e2e_p90_s": 34.605129147996195,
-      "e2e_p99_s": 133.5824547969969,
+      "ttft_p50_s": 0.5415176274836995,
+      "ttft_p90_s": 18.021296651283045,
+      "ttft_p99_s": 74.09429564891524,
+      "tpot_p50_s": 0.008952101894096181,
+      "tpot_p90_s": 0.03641285916619554,
+      "tpot_p99_s": 0.35152006935195085,
+      "e2e_p50_s": 2.081947358994512,
+      "e2e_p90_s": 34.62592205510591,
+      "e2e_p99_s": 139.68334607904353,
      "apc_ratio": 0.7720092868396378,
      "interference_index": 13.651718321568111,
-      "hotspot_index_ttft_p90": 2.3493858974059214,
+      "hotspot_index_ttft_p90": 2.727756623171119,
      "reuse_intra_frac": 0.9327723488279339,
      "reuse_cross_frac": 0.05495149683864246,
      "n_slow": 234,
@@ -109,17 +109,17 @@
      "n_ok": 1213,
      "n_total": 1214,
      "ttft_p50_s": 0.4997710260213353,
-      "ttft_p90_s": 7.239999514014926,
-      "ttft_p99_s": 42.022206099005416,
+      "ttft_p90_s": 7.345769894809922,
+      "ttft_p99_s": 42.34170345296613,
      "tpot_p50_s": 0.008079791456705824,
-      "tpot_p90_s": 0.017107906969874808,
-      "tpot_p99_s": 0.11808861252148231,
+      "tpot_p90_s": 0.017110194704198407,
+      "tpot_p99_s": 0.12655874612209597,
      "e2e_p50_s": 1.7495028690318577,
-      "e2e_p90_s": 17.893827292020433,
-      "e2e_p99_s": 68.18008507299237,
+      "e2e_p90_s": 18.033410895219994,
+      "e2e_p99_s": 68.80023987947489,
      "apc_ratio": 0.794261466256467,
      "interference_index": null,
-      "hotspot_index_ttft_p90": 3.3497107140827365,
+      "hotspot_index_ttft_p90": 3.667136528736114,
      "reuse_intra_frac": 0.9311187350942534,
      "reuse_cross_frac": 0.056702150437367635,
      "n_slow": 189,
--- a/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_hotspot.png
+++ b/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_hotspot.png
--- a/analysis/characterization/window_1_results/figures/fig_b3_latency_bars.png
+++ b/analysis/characterization/window_1_results/figures/fig_b3_latency_bars.png
--- a/analysis/characterization/window_1_results/figures/fig_b3_per_worker_ttft_p90.png
+++ b/analysis/characterization/window_1_results/figures/fig_b3_per_worker_ttft_p90.png
--- a/analysis/characterization/window_1_results/per_worker_capped.json
+++ b/analysis/characterization/window_1_results/per_worker_capped.json
@@ -1,5 +1,5 @@
 {
-  "hotspot_index_ttft_p90": 1.9366915542605314,
+  "hotspot_index_ttft_p90": 2.0204268015410918,
  "per_worker_latency_p90_s": {
    "http://127.0.0.1:8000": 23.81083881931848,
    "http://127.0.0.1:8001": 18.139674991380897,
@@ -21,4 +21,4 @@
    "http://127.0.0.1:8007": 9.661995008389932
  },
  "status": "supported"
-}
+}
--- a/analysis/characterization/window_1_results/per_worker_lmetric.json
+++ b/analysis/characterization/window_1_results/per_worker_lmetric.json
@@ -1,5 +1,5 @@
 {
-  "hotspot_index_ttft_p90": 2.237981740718548,
+  "hotspot_index_ttft_p90": 2.252837147833725,
  "per_worker_latency_p90_s": {
    "http://127.0.0.1:8000": 34.71445541951107,
    "http://127.0.0.1:8001": 21.922988962882666,
@@ -21,4 +21,4 @@
    "http://127.0.0.1:8007": 11.777357225219024
  },
  "status": "supported"
-}
+}
--- a/analysis/characterization/window_1_results/per_worker_load_only.json
+++ b/analysis/characterization/window_1_results/per_worker_load_only.json
@@ -1,5 +1,5 @@
 {
-  "hotspot_index_ttft_p90": 1.1400531308102801,
+  "hotspot_index_ttft_p90": 1.2940319990630569,
  "per_worker_latency_p90_s": {
    "http://127.0.0.1:8000": 33.51168999259829,
    "http://127.0.0.1:8001": 29.20308109278556,
@@ -21,4 +21,4 @@
    "http://127.0.0.1:8007": 13.95184187250561
  },
  "status": "supported"
-}
+}
--- a/analysis/characterization/window_1_results/per_worker_sticky.json
+++ b/analysis/characterization/window_1_results/per_worker_sticky.json
@@ -1,5 +1,5 @@
 {
-  "hotspot_index_ttft_p90": 2.3493858974059214,
+  "hotspot_index_ttft_p90": 2.727756623171119,
  "per_worker_latency_p90_s": {
    "http://127.0.0.1:8000": 30.185792533413043,
    "http://127.0.0.1:8001": 47.49661003401852,
@@ -21,4 +21,4 @@
    "http://127.0.0.1:8007": 2.4984901855932535
  },
  "status": "supported"
-}
+}
--- a/analysis/characterization/window_1_results/per_worker_unified.json
+++ b/analysis/characterization/window_1_results/per_worker_unified.json
@@ -1,5 +1,5 @@
 {
-  "hotspot_index_ttft_p90": 3.3497107140827365,
+  "hotspot_index_ttft_p90": 3.667136528736114,
  "per_worker_latency_p90_s": {
    "http://127.0.0.1:8000": 41.42001512600109,
    "http://127.0.0.1:8001": 12.4878579101933,
@@ -21,4 +21,4 @@
    "http://127.0.0.1:8007": 7.772977900883419
  },
  "status": "supported"
-}
+}