diff --git a/analysis/characterization/window_1_results.md b/analysis/characterization/window_1_results.md new file mode 100644 index 0000000..6db3365 --- /dev/null +++ b/analysis/characterization/window_1_results.md @@ -0,0 +1,171 @@ +# Window 1 Results: B1' + B2 + B3 + +Status: Window 1 complete (CPU + 2 dash0 GPU windows on 2026-05-25) +Sweep: `outputs/b3_sweep_20260525_095043` (B3) + `outputs/b2_microbench/` (B2) +Trace: `traces/w600_r0.0015_st30.jsonl` (1214 requests / 274 sessions / 53.3 M input tokens) +Model: Qwen3-Coder-30B-A3B-Instruct (TP1 × 8 instances on H20) + +Per-policy artifacts under `window_1_results/`. Figures under `window_1_results/figures/`. + +## Headline + +| Claim | Status | Evidence | +|---|---|---| +| Agentic workload reuse is overwhelmingly intra-session | **supported** | 93.2% of cached_tokens are intra-session (real); theoretical any-session APC ceiling 80.3% vs intra-session ceiling 79.6% → < 1pp gap | +| LMetric leaves 23 pp of APC on the table | **supported** | lmetric achieved 56.9% vs intra-session ceiling 79.6% (theoretical) | +| Hard session affinity recovers the locality lost by LMetric | **supported** | sticky APC 77.2% = 97% of theoretical ceiling | +| Hard affinity inflates same-worker prefill-decode interference | **supported** | sticky interference_index 13.65 vs lmetric 6.53 | +| Hybrid affinity (Unified) breaks the locality-vs-latency tradeoff | **supported** | unified hits 79.4% APC and TTFT p90 7.24 s (lmetric 15.6 s) simultaneously | +| Same-worker prefill-decode interference is causal, not correlation | **supported** | different-worker control idx≈1.0; same-worker idx scales monotonically with prefill size | +| Heavy-tail sessions are *a* contributor to hot-spot, not the sole cause | **supported** | cap=8 truncated trace cuts 37% of work; hotspot drops only 13% (2.24→1.94) | + +## B1' Workload characterization + +### Per-request KV footprint (Qwen3-Coder-30B-A3B) + +`kv_bytes_per_token = 2 × num_layers × 
num_kv_heads × head_dim × dtype_bytes = 2 × 48 × 4 × 128 × 2 = 98304 B` + +Full GLM-5.1 trace (2.11 M requests, 1.31 M sessions): + +| | p50 | p90 | p95 | p99 | max | +|---|---:|---:|---:|---:|---:| +| KV per request | 1.83 GiB | 8.04 GiB | 9.59 GiB | **11.49 GiB** | 18.5 GiB | + +H20 has ~95 GiB usable per GPU. **A single p99 request occupies 12% of a single H20's HBM** purely for KV. This bounds how many long requests can be batched on one GPU. + +Figure: `figures/fig_kv_footprint_cdf.png`. + +### Real reuse decomposition (from lmetric run on w600 trace) + +| class | tokens | fraction | +|---|---:|---:| +| intra-session | 28.3 M | **93.2%** | +| cross-session | 1.72 M | 5.7% | +| shared / system-prefix | 0.34 M | 1.1% | +| unclassified | 0 | 0.0% | + +→ session-affinity routing covers 93.2% of the observed reuse, and the intra-session ceiling below reaches about 99% of the any-session ceiling (79.6% vs 80.3%). Shared-prefix reuse is only 1.1%, so there is no meaningful "system prompt" in this trace. + +Figure: `figures/fig_reuse_decomposition.png`. + +### Theoretical APC ceilings on w600 + +Computed by building a block-level trie of `hash_ids` per session (intra-session) or globally (any-session), then walking each request's `hash_ids` to count its longest prefix-match against previously-seen prefixes. + +| variant | upper bound | hit requests | +|---|---:|---:| +| any-session (perfect global cache) | **80.3%** | 961 / 1214 | +| intra-session only | **79.6%** | 914 / 1214 | +| shared-prefix only (pos 0, ≥8 sessions) | 0.10% | 107 / 1214 | + +Gap "any − intra" is 0.7 pp → no meaningful cross-session sharing in this trace. + +## B3 5-policy routing sweep + +8 vLLM instances at TP1 replay the w600 trace with `--enable-prompt-tokens-details`, so `cached_tokens` is reported per request. 
+ +| policy | TTFT p50/p90/p99 | TPOT p50/p90/p99 ms | E2E p50/p90/p99 | **APC** | interference | **hotspot** | n_slow | +|---|---|---|---|---:|---:|---:|---:| +| lmetric | 0.94 / 15.59 / 52.95 | 8.9 / 21.2 / 175.9 | 2.75 / 24.75 / 79.62 | 56.9% | 6.53 | 2.24 | 295 | +| load_only | 1.25 / 20.15 / 52.65 | 9.2 / 26.7 / 320.7 | 3.58 / 33.43 / 93.92 | 54.1% | 9.16 | **1.14** | 379 | +| sticky | 0.54 / 18.02 / 71.37 | 8.9 / 36.1 / 345.2 | 2.08 / 34.61 / 133.58 | 77.2% | **13.65** | 2.35 | 234 | +| **unified** | **0.50 / 7.24 / 42.02** | 8.1 / 17.1 / 118.1 | **1.75 / 17.89 / 68.18** | **79.4%** | n/a* | 3.35 | **189** | +| capped | 1.20 / 12.76 / 46.05 | 7.2 / 16.0 / 101.5 | 2.59 / 21.24 / 73.39 | 31.6% | 6.33 | 1.94 | 185 | + +\*unified `engine_state` was overwritten by my analyzer's slice step before the `b3_analyze.sh` fix landed; vLLM and the patch worked correctly. The B2 microbench provides a cleaner interference proof. + +**Mechanism indices** +- `interference_index` = TPOT_p90(decode overlapping same-worker prefill) / TPOT_p90(clean) +- `hotspot_index` = max(worker TTFT p90) / median(worker TTFT p90) + +Figures: `fig_b3_latency_bars.png`, `fig_b3_apc_vs_upper.png`, +`fig_b3_apc_vs_hotspot.png`, `fig_b3_per_worker_ttft_p90.png`, +`fig_b3_failure_breakdown.png`. + +### Per-policy reading + +- **lmetric** is the cache-aware baseline. APC 56.9% achieves only 71% of the intra-session ceiling — the missing 23 pp is the locality opportunity unified picks up. +- **load_only** strips cache awareness. Hot-spot drops to 1.14 (best), but APC only drops 3 pp because the picker's `min(num_requests)` tie-break to instance 0 creates accidental stickiness at low concurrency. +- **sticky** locks each session to one worker. APC climbs to 77.2% (97% of ceiling) but interference doubles to 13.65 and TPOT p99 hits 345 ms. +- **unified** is the hybrid — affinity gate `(cache_ratio>0.5 AND num_req ≤ 2×avg)` keeps locality where it pays and drops it where it would hurt. 
The result is APC 79.4% **and** TTFT p90 cut in half from lmetric. The one bad worker (engine_4 at 37.7 s p90) drives `hotspot_index=3.35`, but the other seven workers are all under 19 s (the next highest is 18.3 s). +- **capped** runs lmetric on a turn-capped trace (max 8 turns/session). This removes 37% of requests, but APC also falls to 31.6% and hotspot improves by only 13%. This is the session-mass ablation: heavy sessions are *a* contributor to hot-spot but not the sole cause. + +### Slow-request cause breakdown (from `joined_analysis.label_slow_requests`) + +| policy | n_slow | same-worker overlap | hot worker queue | cache miss large append | unknown | +|---|---:|---:|---:|---:|---:| +| lmetric | 295 | 69 (23%) | 68 (23%) | 94 (32%) | 64 (22%) | +| load_only | 379 | 108 (29%) | 33 (9%) | 151 (40%) | 87 (23%) | +| sticky | 234 | **134 (57%)** | 51 (22%) | **20 (9%)** | 29 (12%) | +| unified | 189 | 0 (no engine_state) | 116 (61%) | 18 (10%) | 55 (29%) | +| capped | 185 | 45 (24%) | 66 (36%) | 60 (32%) | 14 (8%) | + +PD-colo failures are mixed-mechanism: lmetric has no single dominant cause. +Sticky concentrates failures into same-worker overlap (locality is on, cache misses are gone, but interference takes over). + +## B2 PD-colo interference microbench + +Setup: 2 vLLM instances on GPU 0 (decode endpoint) and GPU 1 (prefill endpoint). A continuous 4 req/s short-prompt decode load runs against GPU 0 for 60 s per cell. Four large-prompt, one-token "prefill injections" fire every 12 s, targeted at either the same instance (`same`) or the paired one (`different`). Decode requests are labeled overlap iff their `[t_first_token, t_finish]` intersects any injection window. We compare TPOT p90 (overlap vs clean) per cell. 
+ +| variant | prefill | n_overlap | n_clean | **TPOT idx** | **TTFT idx** | +|---|---:|---:|---:|---:|---:| +| different | 2k–65k | 12–126 | 114–228 | **0.92–1.02** | **0.96–1.00** | +| same | 2k | 12 | 228 | 1.16 | 2.15 | +| same | 8k | 19 | 221 | 1.90 | **12.1×** | +| same | 16k | 37 | 203 | 3.37 | **30.8×** | +| same | 32k | 67 | 173 | **7.89** | **94.6×** | +| same | 65k | 130 | 110 | 2.26* | **218×** | + +\*65k TPOT idx is suppressed because n_overlap > n_clean — by the time the 65k prefill is finishing, the 4-second gap to the next injection has already started decoding overlap. The "clean" decodes left are the ones that randomly hit the brief gaps between injections. + +Figures: `fig_b2_tpot_vs_prefill.png`, `fig_b2_ttft_vs_prefill.png`. + +**Why this matters** +- The `different-worker` control sits at idx ≈ 1.0 across 32× variation in prefill size. This is the cleanest possible disproof of "any prefill anywhere hurts decode": prefill on a *different* worker is invisible to the decode worker. +- The `same-worker` curve is monotone in prefill size for TTFT (218× at 65k) and monotone-up-to-32k for TPOT (7.89×). The two ablations together establish causation: prefill-decode interference is a same-worker phenomenon and scales sharply with prefill mass. +- This is the mechanism behind the B3 sticky interference jump (13.65) and unified's single hot worker (engine_4 at 37.7 s TTFT p90). + +## What Window 1 does *not* answer + +These need Window 2 (B4 SRR sweep + B5 failure attribution near SRR boundary): + +1. **Sustainable arrival rate (SRR) per policy under SLO**. B3 was driven by trace timestamps with strict session sequentiality; when 8 instances cannot keep up, requests pile up and the *effective* dispatch window stretches (lmetric: trace claims 600 s, actual replay 49 min). We measured *saturated* behavior but not the saturation point. B4 needs the A4 open-loop Poisson loadgen with per-class SLO thresholds. +2. **Failure breakdown at the SRR boundary**. 
B5 will rerun each policy at 0.9× / 1.0× / 1.1× of its SRR_max and label each SLO-violating request — gives the paper its causal failure-attribution table. + +Optional / paper-polish runs (not blocking the story): + +3. unified isolated rerun to capture `interference_index` (B2 already provides cleaner causal proof; skip unless reviewer asks). +4. B2 with the proxy in path — measure whether the production cache_aware routing actually pushes prefill and decode onto different workers in practice. +5. KV-occupancy timeline per worker — needs polling `vllm:gpu_cache_usage` during B3 reruns; useful for "KV pressure drives cache miss" subsection. + +## Caveats and known data hygiene issues + +- **APC contamination across B3 hot-sweep**: `lmetric` ran from cold; `load_only` and `sticky` ran on the same 8 vLLMs without restart. Empirical contamination is < 1% (verified by first-turn cached_tokens distribution), but `unified` and `capped` were rerun cold-start specifically to remove any residual concern. +- **Unified's `interference_index` is missing** because the original `b3_analyze.sh` unconditionally truncate-wrote sliced engine_state files; isolated runs that wrote engine_state into their own per-policy directory were overwritten. Fixed in commit `df32499`; capped was the first run to benefit and survived with intact 86 MB engine_state. +- **w600 is not the full GLM-5.1 trace** (1214 req vs 2.11 M). All B3/B2 percentiles are on the sample. The full-trace KV-footprint stats are on the full trace. 
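Since the KV-footprint numbers are the only full-trace statistics in this report, a quick way to sanity-check them is to redo the arithmetic from the B1' formula. The sketch below is illustrative only: the constants are copied from the formula and the H20 figure quoted in this document, while the function names (`kv_bytes_per_token`, `kv_gib`, `hbm_fraction`) are hypothetical helpers, not part of the analysis scripts.

```python
# Minimal sketch of the per-request KV footprint arithmetic.
# Constants mirror the B1' formula in this report; helper names are hypothetical.
NUM_LAYERS = 48
NUM_KV_HEADS = 4
HEAD_DIM = 128
DTYPE_BYTES = 2          # 16-bit KV cache entries
H20_USABLE_GIB = 95.0    # approximate usable HBM per GPU, as quoted above
GIB = 1024 ** 3


def kv_bytes_per_token() -> int:
    """Bytes of KV cache per token; the factor 2 covers K and V."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES


def kv_gib(input_tokens: int) -> float:
    """KV footprint of one request in GiB, given its input token count."""
    return input_tokens * kv_bytes_per_token() / GIB


def hbm_fraction(input_tokens: int) -> float:
    """Share of one H20's usable HBM taken by a single request's KV."""
    return kv_gib(input_tokens) / H20_USABLE_GIB


# The p99 byte count in kv_footprint_summary.json divides evenly by 98304,
# which gives the p99 input length in tokens.
p99_tokens = 12_339_806_208 // kv_bytes_per_token()
print(kv_bytes_per_token(), p99_tokens,
      round(kv_gib(p99_tokens), 2), round(hbm_fraction(p99_tokens), 3))
```

Under these assumptions the p99 request works out to 125,527 input tokens, which is consistent with the 11.49 GiB and 12% figures in the KV-footprint table.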
+ +## Reproduction commands + +```bash +# B3 5-policy sweep +bash scripts/b3_sweep.sh # lmetric, load_only, sticky (hot-cache) +bash scripts/b3_isolated_policy.sh unified # isolated cold-start +bash scripts/b3_isolated_policy.sh lmetric # capped variant + +bash scripts/b3_analyze.sh outputs/b3_sweep_ +python3 scripts/render_b3_report.py --sweep-dir outputs/b3_sweep_ + +# B2 interference microbench +# (launch 2 vLLM on ports 8100/8101 with --enable-prompt-tokens-details first) +python3 scripts/b2_interference.py \ + --decode-endpoint http://127.0.0.1:8100 \ + --prefill-endpoint http://127.0.0.1:8101 \ + --model \ + --out-dir outputs/b2_microbench/sweep +python3 analysis/characterization/b2_sweep_analysis.py --sweep-dir outputs/b2_microbench/sweep + +# Figures +python3 analysis/characterization/render_window1_figures.py \ + --results-dir analysis/characterization/window_1_results \ + --out-dir analysis/characterization/window_1_results/figures +``` diff --git a/analysis/characterization/window_1_results/apc_upper_w600.json b/analysis/characterization/window_1_results/apc_upper_w600.json new file mode 100644 index 0000000..f5074e2 --- /dev/null +++ b/analysis/characterization/window_1_results/apc_upper_w600.json @@ -0,0 +1,18 @@ +{ + "trace": "/home/admin/cpfs/wjh/agentic-kv/traces/w600_r0.0015_st30.jsonl", + "n_requests": 1214, + "n_sessions": 274, + "block_size": 512, + "shared_prefix_min_sessions": 8, + "total_input_tokens": 53335690, + "apc_upper_any_session": 0.8030439654947747, + "apc_upper_intra_session": 0.7956783534627564, + "apc_upper_shared_prefix_only": 0.0010271546126055554, + "cached_tokens_any_session": 42830904, + "cached_tokens_intra_session": 42438054, + "cached_tokens_shared_prefix_only": 54784, + "n_requests_any_hit": 961, + "n_requests_intra_hit": 914, + "n_requests_shared_hit": 107, + "n_shared_pos0_blocks": 1 +} \ No newline at end of file diff --git a/analysis/characterization/window_1_results/b2_sweep_summary.json 
b/analysis/characterization/window_1_results/b2_sweep_summary.json new file mode 100644 index 0000000..5a433fa --- /dev/null +++ b/analysis/characterization/window_1_results/b2_sweep_summary.json @@ -0,0 +1,194 @@ +{ + "rows": [ + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 0.9868436853823819, + "n_decode_clean": 207, + "n_decode_overlap": 33, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8101", + "prefill_size": 16384, + "tpot_p50_clean_s": 0.0061757058808297825, + "tpot_p50_overlap_s": 0.006127697048765241, + "tpot_p90_clean_s": 0.006862485770023231, + "tpot_p90_overlap_s": 0.006772200748173878, + "tpot_p99_clean_s": 0.007128368820806946, + "tpot_p99_overlap_s": 0.0070623818792478, + "ttft_p90_clean_s": 0.043039703369140626, + "ttft_p90_overlap_s": 0.04307723045349121, + "variant": "different" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 1.0176125863449343, + "n_decode_clean": 228, + "n_decode_overlap": 12, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8101", + "prefill_size": 2048, + "tpot_p50_clean_s": 0.0062349300191860005, + "tpot_p50_overlap_s": 0.006218204594621754, + "tpot_p90_clean_s": 0.006892242576136734, + "tpot_p90_overlap_s": 0.007013632793619174, + "tpot_p99_clean_s": 0.007111345902837888, + "tpot_p99_overlap_s": 0.007131954732567373, + "ttft_p90_clean_s": 0.04290406703948975, + "ttft_p90_overlap_s": 0.040976309776306154, + "variant": "different" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 0.9221676118155049, + "n_decode_clean": 176, + "n_decode_overlap": 64, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8101", + "prefill_size": 32768, + "tpot_p50_clean_s": 0.00620933012528853, + "tpot_p50_overlap_s": 0.005991364970351711, + "tpot_p90_clean_s": 0.0069098352181791054, + "tpot_p90_overlap_s": 0.006372026241186894, 
+ "tpot_p99_clean_s": 0.007242970394365715, + "tpot_p99_overlap_s": 0.006935877366499467, + "ttft_p90_clean_s": 0.04308474063873291, + "ttft_p90_overlap_s": 0.04266033172607422, + "variant": "different" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 1.0162810692345416, + "n_decode_clean": 114, + "n_decode_overlap": 126, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8101", + "prefill_size": 65536, + "tpot_p50_clean_s": 0.006080349286397299, + "tpot_p50_overlap_s": 0.006312949488861392, + "tpot_p90_clean_s": 0.0068880830148253785, + "tpot_p90_overlap_s": 0.007000228371283021, + "tpot_p99_clean_s": 0.007222196574162956, + "tpot_p99_overlap_s": 0.00723441562267265, + "ttft_p90_clean_s": 0.04367616176605225, + "ttft_p90_overlap_s": 0.04332089424133301, + "variant": "different" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 0.92169565663476, + "n_decode_clean": 220, + "n_decode_overlap": 20, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8101", + "prefill_size": 8192, + "tpot_p50_clean_s": 0.006260122915711066, + "tpot_p50_overlap_s": 0.006120474651606396, + "tpot_p90_clean_s": 0.006968991684191154, + "tpot_p90_overlap_s": 0.006423289366442748, + "tpot_p99_clean_s": 0.007601349209294174, + "tpot_p99_overlap_s": 0.006715166592838788, + "ttft_p90_clean_s": 0.04314079284667969, + "ttft_p90_overlap_s": 0.042817187309265134, + "variant": "different" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 3.3716068170318985, + "n_decode_clean": 203, + "n_decode_overlap": 37, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8100", + "prefill_size": 16384, + "tpot_p50_clean_s": 0.006435276281954062, + "tpot_p50_overlap_s": 0.009116151116111061, + "tpot_p90_clean_s": 0.0071605749804564195, + "tpot_p90_overlap_s": 0.024142643417974917, + "tpot_p99_clean_s": 
0.008356584539317119, + "tpot_p99_overlap_s": 0.024809808827409838, + "ttft_p90_clean_s": 0.04402604103088379, + "ttft_p90_overlap_s": 1.3574100017547606, + "variant": "same" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 1.1589170446597312, + "n_decode_clean": 228, + "n_decode_overlap": 12, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8100", + "prefill_size": 2048, + "tpot_p50_clean_s": 0.006142637946388938, + "tpot_p50_overlap_s": 0.007610858088791972, + "tpot_p90_clean_s": 0.006933137142296993, + "tpot_p90_overlap_s": 0.008034930807171445, + "tpot_p99_clean_s": 0.007201877651792584, + "tpot_p99_overlap_s": 0.0084272463153107, + "ttft_p90_clean_s": 0.043091440200805665, + "ttft_p90_overlap_s": 0.09247522354125978, + "variant": "same" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 7.891276559921504, + "n_decode_clean": 173, + "n_decode_overlap": 67, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8100", + "prefill_size": 32768, + "tpot_p50_clean_s": 0.006226602226796776, + "tpot_p50_overlap_s": 0.012180752224392362, + "tpot_p90_clean_s": 0.00694006813897027, + "tpot_p90_overlap_s": 0.054765997029314145, + "tpot_p99_clean_s": 0.010443444107518053, + "tpot_p99_overlap_s": 0.058983875428787386, + "ttft_p90_clean_s": 0.04411859512329101, + "ttft_p90_overlap_s": 4.174754428863525, + "variant": "same" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 2.259323176730457, + "n_decode_clean": 110, + "n_decode_overlap": 130, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8100", + "prefill_size": 65536, + "tpot_p50_clean_s": 0.0064652375500611585, + "tpot_p50_overlap_s": 0.020095128001588764, + "tpot_p90_clean_s": 0.009607415488272014, + "tpot_p90_overlap_s": 0.021706256481132124, + "tpot_p99_clean_s": 0.016912007837584522, + 
"tpot_p99_overlap_s": 0.16948255478733715, + "ttft_p90_clean_s": 0.06447408199310305, + "ttft_p90_overlap_s": 14.060086917877197, + "variant": "same" + }, + { + "decode_endpoint": "http://127.0.0.1:8100", + "interference_index": 1.8961314610807898, + "n_decode_clean": 221, + "n_decode_overlap": 19, + "n_decode_total": 240, + "n_prefill_injections": 4, + "prefill_endpoint": "http://127.0.0.1:8100", + "prefill_size": 8192, + "tpot_p50_clean_s": 0.00617263052198622, + "tpot_p50_overlap_s": 0.008303543533941712, + "tpot_p90_clean_s": 0.007060385713673601, + "tpot_p90_overlap_s": 0.013387419479061859, + "tpot_p99_clean_s": 0.0076809098022152696, + "tpot_p99_overlap_s": 0.013849472662415166, + "ttft_p90_clean_s": 0.04307150840759277, + "ttft_p90_overlap_s": 0.52073073387146, + "variant": "same" + } + ] +} \ No newline at end of file diff --git a/analysis/characterization/window_1_results/b3_policy_comparison.json b/analysis/characterization/window_1_results/b3_policy_comparison.json new file mode 100644 index 0000000..4646363 --- /dev/null +++ b/analysis/characterization/window_1_results/b3_policy_comparison.json @@ -0,0 +1,133 @@ +{ + "rows": [ + { + "policy": "capped", + "n_ok": 770, + "n_total": 770, + "ttft_p50_s": 1.195636051998008, + "ttft_p90_s": 12.762421467981767, + "ttft_p99_s": 46.05476947501302, + "tpot_p50_s": 0.007229394937166944, + "tpot_p90_s": 0.015995440982929352, + "tpot_p99_s": 0.10145225453431651, + "e2e_p50_s": 2.5921602529706433, + "e2e_p90_s": 21.238469071977306, + "e2e_p99_s": 73.38509433099534, + "apc_ratio": 0.3158312503528108, + "interference_index": 6.331064378362814, + "hotspot_index_ttft_p90": 1.9366915542605314, + "reuse_intra_frac": 0.9192657105586233, + "reuse_cross_frac": 0.0602232594931501, + "n_slow": 185, + "failure_counts": { + "cache_miss_large_append": 60, + "hot_worker_queue": 66, + "same_worker_prefill_overlap": 45, + "unknown": 14 + } + }, + { + "policy": "lmetric", + "n_ok": 1214, + "n_total": 1214, + "ttft_p50_s": 
0.9369571270071901, + "ttft_p90_s": 15.592678204004187, + "ttft_p99_s": 52.95170431700535, + "tpot_p50_s": 0.008851506907892485, + "tpot_p90_s": 0.02120516549011311, + "tpot_p99_s": 0.17592118933357093, + "e2e_p50_s": 2.7527842019917443, + "e2e_p90_s": 24.75416105298791, + "e2e_p99_s": 79.61890332301846, + "apc_ratio": 0.5694312382571595, + "interference_index": 6.530231061794441, + "hotspot_index_ttft_p90": 2.237981740718548, + "reuse_intra_frac": 0.9321238805590836, + "reuse_cross_frac": 0.05679481258506571, + "n_slow": 295, + "failure_counts": { + "cache_miss_large_append": 94, + "hot_worker_queue": 68, + "same_worker_prefill_overlap": 69, + "unknown": 64 + } + }, + { + "policy": "load_only", + "n_ok": 1214, + "n_total": 1214, + "ttft_p50_s": 1.2542553890380077, + "ttft_p90_s": 20.14692750602262, + "ttft_p99_s": 52.64810254302574, + "tpot_p50_s": 0.00923045912795929, + "tpot_p90_s": 0.02672785480314115, + "tpot_p99_s": 0.3207044094773148, + "e2e_p50_s": 3.584156609023921, + "e2e_p90_s": 33.42658680601744, + "e2e_p99_s": 93.91839688795153, + "apc_ratio": 0.5412093853102866, + "interference_index": 9.16424627504275, + "hotspot_index_ttft_p90": 1.1400531308102801, + "reuse_intra_frac": 0.9353191550754928, + "reuse_cross_frac": 0.053372184678592026, + "n_slow": 379, + "failure_counts": { + "cache_miss_large_append": 151, + "hot_worker_queue": 33, + "same_worker_prefill_overlap": 108, + "unknown": 87 + } + }, + { + "policy": "sticky", + "n_ok": 1214, + "n_total": 1214, + "ttft_p50_s": 0.540947844972834, + "ttft_p90_s": 18.016640832996927, + "ttft_p99_s": 71.37327494798228, + "tpot_p50_s": 0.00894752275507555, + "tpot_p90_s": 0.0360956137329512, + "tpot_p99_s": 0.34523129428917954, + "e2e_p50_s": 2.0788628259906545, + "e2e_p90_s": 34.605129147996195, + "e2e_p99_s": 133.5824547969969, + "apc_ratio": 0.7720092868396378, + "interference_index": 13.651718321568111, + "hotspot_index_ttft_p90": 2.3493858974059214, + "reuse_intra_frac": 0.9327723488279339, + 
"reuse_cross_frac": 0.05495149683864246, + "n_slow": 234, + "failure_counts": { + "cache_miss_large_append": 20, + "hot_worker_queue": 51, + "same_worker_prefill_overlap": 134, + "unknown": 29 + } + }, + { + "policy": "unified", + "n_ok": 1213, + "n_total": 1214, + "ttft_p50_s": 0.4997710260213353, + "ttft_p90_s": 7.239999514014926, + "ttft_p99_s": 42.022206099005416, + "tpot_p50_s": 0.008079791456705824, + "tpot_p90_s": 0.017107906969874808, + "tpot_p99_s": 0.11808861252148231, + "e2e_p50_s": 1.7495028690318577, + "e2e_p90_s": 17.893827292020433, + "e2e_p99_s": 68.18008507299237, + "apc_ratio": 0.794261466256467, + "interference_index": null, + "hotspot_index_ttft_p90": 3.3497107140827365, + "reuse_intra_frac": 0.9311187350942534, + "reuse_cross_frac": 0.056702150437367635, + "n_slow": 189, + "failure_counts": { + "cache_miss_large_append": 18, + "hot_worker_queue": 116, + "unknown": 55 + } + } + ] +} \ No newline at end of file diff --git a/analysis/characterization/window_1_results/b3_report.md b/analysis/characterization/window_1_results/b3_report.md new file mode 100644 index 0000000..276ce7b --- /dev/null +++ b/analysis/characterization/window_1_results/b3_report.md @@ -0,0 +1,114 @@ +# B3 Routing Sweep Report + +Sweep dir: `b3_sweep_20260525_095043` +Trace: w600_r0.0015_st30.jsonl (~1.2k reqs, 8 × TP1) +Policies present: lmetric, load_only, sticky, unified, capped +Policies pending: — + +## Headline latencies + APC + +| policy | ok/total | TTFT p50/p90/p99 (s) | TPOT p50/p90/p99 (ms) | E2E p50/p90/p99 (s) | APC | +|---|---:|---|---|---|---:| +| **lmetric** | 1214/1214 | 0.94/15.59/52.95 | 8.9/21.2/175.9 | 2.75/24.75/79.62 | 56.9% | +| **load_only** | 1214/1214 | 1.25/20.15/52.65 | 9.2/26.7/320.7 | 3.58/33.43/93.92 | 54.1% | +| **sticky** | 1214/1214 | 0.54/18.02/71.37 | 8.9/36.1/345.2 | 2.08/34.61/133.58 | 77.2% | +| **unified** | 1213/1214 | 0.50/7.24/42.02 | 8.1/17.1/118.1 | 1.75/17.89/68.18 | 79.4% | +| **capped** | 770/770 | 1.20/12.76/46.05 | 
7.2/16.0/101.5 | 2.59/21.24/73.39 | 31.6% | + +## Mechanism indices + +| policy | interference_index | hotspot_index (TTFT p90) | intra-session reuse | cross-session reuse | n_slow | +|---|---:|---:|---:|---:|---:| +| **lmetric** | 6.53 | 2.24 | 93.2% | 5.7% | 295 | +| **load_only** | 9.16 | 1.14 | 93.5% | 5.3% | 379 | +| **sticky** | 13.65 | 2.35 | 93.3% | 5.5% | 234 | +| **unified** | — | 3.35 | 93.1% | 5.7% | 189 | +| **capped** | 6.33 | 1.94 | 91.9% | 6.0% | 185 | + +- **interference_index** = TPOT_p90(decode overlapping same-worker prefill) / TPOT_p90(clean) +- **hotspot_index** = max(worker TTFT_p90) / median(worker TTFT_p90) + +## Slow-request cause breakdown + +| policy | n_slow | same-worker overlap | hot worker queue | cache miss large append | high KV | unknown | +|---|---:|---:|---:|---:|---:|---:| +| **lmetric** | 295 | 69 | 68 | 94 | 0 | 64 | +| **load_only** | 379 | 108 | 33 | 151 | 0 | 87 | +| **sticky** | 234 | 134 | 51 | 20 | 0 | 29 | +| **unified** | 189 | 0 | 116 | 18 | 0 | 55 | +| **capped** | 185 | 45 | 66 | 60 | 0 | 14 | + +## Policy notes + +- **lmetric** — cache-aware P_tokens × BS (main baseline) +- **load_only** — control: min(num_requests), no cache, no affinity +- **sticky** — control: hard session affinity (never break) +- **unified** — hybrid affinity + LMetric fallback +- **capped** — lmetric on per-session turn-capped trace + +## Per-policy per-worker TTFT p90 (s) + +### lmetric + +| worker | TTFT p90 (s) | +|---|---:| +| http://127.0.0.1:8000 | 28.18 | +| http://127.0.0.1:8001 | 13.15 | +| http://127.0.0.1:8002 | 13.82 | +| http://127.0.0.1:8003 | 14.00 | +| http://127.0.0.1:8004 | 31.34 | +| http://127.0.0.1:8005 | 7.87 | +| http://127.0.0.1:8006 | 14.15 | +| http://127.0.0.1:8007 | 11.78 | + +### load_only + +| worker | TTFT p90 (s) | +|---|---:| +| http://127.0.0.1:8000 | 22.06 | +| http://127.0.0.1:8001 | 16.43 | +| http://127.0.0.1:8002 | 16.81 | +| http://127.0.0.1:8003 | 23.58 | +| http://127.0.0.1:8004 | 25.14 | +| 
http://127.0.0.1:8005 | 16.08 | +| http://127.0.0.1:8006 | 23.96 | +| http://127.0.0.1:8007 | 13.95 | + +### sticky + +| worker | TTFT p90 (s) | +|---|---:| +| http://127.0.0.1:8000 | 12.28 | +| http://127.0.0.1:8001 | 23.57 | +| http://127.0.0.1:8002 | 5.20 | +| http://127.0.0.1:8003 | 55.38 | +| http://127.0.0.1:8004 | 17.03 | +| http://127.0.0.1:8005 | 25.49 | +| http://127.0.0.1:8006 | 36.31 | +| http://127.0.0.1:8007 | 2.50 | + +### unified + +| worker | TTFT p90 (s) | +|---|---:| +| http://127.0.0.1:8000 | 11.26 | +| http://127.0.0.1:8001 | 3.61 | +| http://127.0.0.1:8002 | 16.18 | +| http://127.0.0.1:8003 | 9.31 | +| http://127.0.0.1:8004 | 37.73 | +| http://127.0.0.1:8005 | 18.33 | +| http://127.0.0.1:8006 | 3.63 | +| http://127.0.0.1:8007 | 7.77 | + +### capped + +| worker | TTFT p90 (s) | +|---|---:| +| http://127.0.0.1:8000 | 19.77 | +| http://127.0.0.1:8001 | 15.79 | +| http://127.0.0.1:8002 | 20.40 | +| http://127.0.0.1:8003 | 10.54 | +| http://127.0.0.1:8004 | 9.52 | +| http://127.0.0.1:8005 | 9.46 | +| http://127.0.0.1:8006 | 7.38 | +| http://127.0.0.1:8007 | 9.66 | diff --git a/analysis/characterization/window_1_results/figures/fig_b2_tpot_vs_prefill.png b/analysis/characterization/window_1_results/figures/fig_b2_tpot_vs_prefill.png new file mode 100644 index 0000000..a4bcff9 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_b2_tpot_vs_prefill.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_b2_ttft_vs_prefill.png b/analysis/characterization/window_1_results/figures/fig_b2_ttft_vs_prefill.png new file mode 100644 index 0000000..15f3497 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_b2_ttft_vs_prefill.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_hotspot.png b/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_hotspot.png new file mode 100644 index 0000000..166a94e Binary files /dev/null and 
b/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_hotspot.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_upper.png b/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_upper.png new file mode 100644 index 0000000..759d965 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_b3_apc_vs_upper.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_b3_failure_breakdown.png b/analysis/characterization/window_1_results/figures/fig_b3_failure_breakdown.png new file mode 100644 index 0000000..de90d42 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_b3_failure_breakdown.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_b3_latency_bars.png b/analysis/characterization/window_1_results/figures/fig_b3_latency_bars.png new file mode 100644 index 0000000..df5afe4 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_b3_latency_bars.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_b3_per_worker_ttft_p90.png b/analysis/characterization/window_1_results/figures/fig_b3_per_worker_ttft_p90.png new file mode 100644 index 0000000..743c484 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_b3_per_worker_ttft_p90.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_kv_footprint_cdf.png b/analysis/characterization/window_1_results/figures/fig_kv_footprint_cdf.png new file mode 100644 index 0000000..63ac975 Binary files /dev/null and b/analysis/characterization/window_1_results/figures/fig_kv_footprint_cdf.png differ diff --git a/analysis/characterization/window_1_results/figures/fig_reuse_decomposition.png b/analysis/characterization/window_1_results/figures/fig_reuse_decomposition.png new file mode 100644 index 0000000..9a628f6 Binary files /dev/null and 
b/analysis/characterization/window_1_results/figures/fig_reuse_decomposition.png differ
diff --git a/analysis/characterization/window_1_results/kv_footprint_summary.json b/analysis/characterization/window_1_results/kv_footprint_summary.json
new file mode 100644
index 0000000..6f00a36
--- /dev/null
+++ b/analysis/characterization/window_1_results/kv_footprint_summary.json
@@ -0,0 +1,26 @@
+{
+  "formula": "kv_bytes_per_request = input_tokens * kv_bytes_per_token",
+  "kv_bytes_per_request": {
+    "count": 2114220,
+    "max": 19893878784.0,
+    "mean": 3306689367.3278427,
+    "min": 0.0,
+    "p50": 1969029120.0,
+    "p90": 8636507750.40001,
+    "p95": 10296164352.0,
+    "p99": 12339806208.0
+  },
+  "kv_bytes_per_token": 98304.0,
+  "kv_mib_per_request": {
+    "count": 2114220,
+    "max": 18972.28125,
+    "mean": 3153.5047219541957,
+    "min": 0.0,
+    "p50": 1877.8125,
+    "p90": 8236.415625000009,
+    "p95": 9819.1875,
+    "p99": 11768.15625
+  },
+  "status": "available",
+  "total_kv_gib": 6510940.188720703
+}
diff --git a/analysis/characterization/window_1_results/lmetric_hotspot.json b/analysis/characterization/window_1_results/lmetric_hotspot.json
new file mode 100644
index 0000000..03ac5fb
--- /dev/null
+++ b/analysis/characterization/window_1_results/lmetric_hotspot.json
@@ -0,0 +1,24 @@
+{
+  "hotspot_index_ttft_p90": 2.237981740718548,
+  "per_worker_latency_p90_s": {
+    "http://127.0.0.1:8000": 34.71445541951107,
+    "http://127.0.0.1:8001": 21.922988962882666,
+    "http://127.0.0.1:8002": 23.936190764518685,
+    "http://127.0.0.1:8003": 26.22220957049285,
+    "http://127.0.0.1:8004": 40.318757307820505,
+    "http://127.0.0.1:8005": 12.26559703698149,
+    "http://127.0.0.1:8006": 27.904838753980588,
+    "http://127.0.0.1:8007": 18.430557113309625
+  },
+  "per_worker_ttft_p90_s": {
+    "http://127.0.0.1:8000": 28.18261351052206,
+    "http://127.0.0.1:8001": 13.147308969072796,
+    "http://127.0.0.1:8002": 13.818959677941162,
+    "http://127.0.0.1:8003": 14.003642184572524,
+    "http://127.0.0.1:8004": 31.339895512629305,
+    "http://127.0.0.1:8005": 7.870992770011071,
+    "http://127.0.0.1:8006": 14.149156623415186,
+    "http://127.0.0.1:8007": 11.777357225219024
+  },
+  "status": "supported"
+}
diff --git a/analysis/characterization/window_1_results/lmetric_reuse.json b/analysis/characterization/window_1_results/lmetric_reuse.json
new file mode 100644
index 0000000..44f208d
--- /dev/null
+++ b/analysis/characterization/window_1_results/lmetric_reuse.json
@@ -0,0 +1,15 @@
+{
+  "cross_session_tokens": 1723017,
+  "fractions": {
+    "cross": 0.05679481258506571,
+    "intra": 0.9321238805590836,
+    "shared": 0.011081306855850749,
+    "unclassified": 0.0
+  },
+  "intra_session_tokens": 28278380,
+  "shared_prefix_min_sessions": 8,
+  "shared_prefix_tokens": 336180,
+  "status": "supported",
+  "total_cached_tokens": 30371008,
+  "unclassified_tokens": 0
+}
diff --git a/analysis/characterization/window_1_results/per_worker_capped.json b/analysis/characterization/window_1_results/per_worker_capped.json
new file mode 100644
index 0000000..a025d3c
--- /dev/null
+++ b/analysis/characterization/window_1_results/per_worker_capped.json
@@ -0,0 +1,24 @@
+{
+  "hotspot_index_ttft_p90": 1.9366915542605314,
+  "per_worker_latency_p90_s": {
+    "http://127.0.0.1:8000": 23.81083881931848,
+    "http://127.0.0.1:8001": 18.139674991380897,
+    "http://127.0.0.1:8002": 29.116712999995805,
+    "http://127.0.0.1:8003": 19.245074290811324,
+    "http://127.0.0.1:8004": 17.230851700413044,
+    "http://127.0.0.1:8005": 15.86663371440958,
+    "http://127.0.0.1:8006": 16.707309890014592,
+    "http://127.0.0.1:8007": 23.93718611740042
+  },
+  "per_worker_ttft_p90_s": {
+    "http://127.0.0.1:8000": 19.772570010094213,
+    "http://127.0.0.1:8001": 15.786850639013576,
+    "http://127.0.0.1:8002": 20.403525242628533,
+    "http://127.0.0.1:8003": 10.535247699997853,
+    "http://127.0.0.1:8004": 9.52290979558602,
+    "http://127.0.0.1:8005": 9.455131393985376,
+    "http://127.0.0.1:8006": 7.379608143202497,
+    "http://127.0.0.1:8007": 9.661995008389932
+  },
+  "status": "supported"
+}
diff --git a/analysis/characterization/window_1_results/per_worker_lmetric.json b/analysis/characterization/window_1_results/per_worker_lmetric.json
new file mode 100644
index 0000000..03ac5fb
--- /dev/null
+++ b/analysis/characterization/window_1_results/per_worker_lmetric.json
@@ -0,0 +1,24 @@
+{
+  "hotspot_index_ttft_p90": 2.237981740718548,
+  "per_worker_latency_p90_s": {
+    "http://127.0.0.1:8000": 34.71445541951107,
+    "http://127.0.0.1:8001": 21.922988962882666,
+    "http://127.0.0.1:8002": 23.936190764518685,
+    "http://127.0.0.1:8003": 26.22220957049285,
+    "http://127.0.0.1:8004": 40.318757307820505,
+    "http://127.0.0.1:8005": 12.26559703698149,
+    "http://127.0.0.1:8006": 27.904838753980588,
+    "http://127.0.0.1:8007": 18.430557113309625
+  },
+  "per_worker_ttft_p90_s": {
+    "http://127.0.0.1:8000": 28.18261351052206,
+    "http://127.0.0.1:8001": 13.147308969072796,
+    "http://127.0.0.1:8002": 13.818959677941162,
+    "http://127.0.0.1:8003": 14.003642184572524,
+    "http://127.0.0.1:8004": 31.339895512629305,
+    "http://127.0.0.1:8005": 7.870992770011071,
+    "http://127.0.0.1:8006": 14.149156623415186,
+    "http://127.0.0.1:8007": 11.777357225219024
+  },
+  "status": "supported"
+}
diff --git a/analysis/characterization/window_1_results/per_worker_load_only.json b/analysis/characterization/window_1_results/per_worker_load_only.json
new file mode 100644
index 0000000..32ef216
--- /dev/null
+++ b/analysis/characterization/window_1_results/per_worker_load_only.json
@@ -0,0 +1,24 @@
+{
+  "hotspot_index_ttft_p90": 1.1400531308102801,
+  "per_worker_latency_p90_s": {
+    "http://127.0.0.1:8000": 33.51168999259829,
+    "http://127.0.0.1:8001": 29.20308109278556,
+    "http://127.0.0.1:8002": 27.126518827211115,
+    "http://127.0.0.1:8003": 38.597240307606995,
+    "http://127.0.0.1:8004": 36.607777832809376,
+    "http://127.0.0.1:8005": 28.097025175404276,
+    "http://127.0.0.1:8006": 49.29610514297965,
+    "http://127.0.0.1:8007": 20.958507975534303
+  },
+  "per_worker_ttft_p90_s": {
+    "http://127.0.0.1:8000": 22.055091864388675,
+    "http://127.0.0.1:8001": 16.425856862741057,
+    "http://127.0.0.1:8002": 16.806352904380766,
+    "http://127.0.0.1:8003": 23.581166115606912,
+    "http://127.0.0.1:8004": 25.14397653030465,
+    "http://127.0.0.1:8005": 16.080231266201018,
+    "http://127.0.0.1:8006": 23.960470345703648,
+    "http://127.0.0.1:8007": 13.95184187250561
+  },
+  "status": "supported"
+}
diff --git a/analysis/characterization/window_1_results/per_worker_sticky.json b/analysis/characterization/window_1_results/per_worker_sticky.json
new file mode 100644
index 0000000..ae978de
--- /dev/null
+++ b/analysis/characterization/window_1_results/per_worker_sticky.json
@@ -0,0 +1,24 @@
+{
+  "hotspot_index_ttft_p90": 2.3493858974059214,
+  "per_worker_latency_p90_s": {
+    "http://127.0.0.1:8000": 30.185792533413043,
+    "http://127.0.0.1:8001": 47.49661003401852,
+    "http://127.0.0.1:8002": 22.069474861002554,
+    "http://127.0.0.1:8003": 83.73774532350944,
+    "http://127.0.0.1:8004": 22.03310715127737,
+    "http://127.0.0.1:8005": 33.024566102202556,
+    "http://127.0.0.1:8006": 61.65600914339302,
+    "http://127.0.0.1:8007": 6.077459598158019
+  },
+  "per_worker_ttft_p90_s": {
+    "http://127.0.0.1:8000": 12.284569517592924,
+    "http://127.0.0.1:8001": 23.570226482005094,
+    "http://127.0.0.1:8002": 5.202772857400123,
+    "http://127.0.0.1:8003": 55.37555769548635,
+    "http://127.0.0.1:8004": 17.031311958114394,
+    "http://127.0.0.1:8005": 25.48531596700202,
+    "http://127.0.0.1:8006": 36.31029207323453,
+    "http://127.0.0.1:8007": 2.4984901855932535
+  },
+  "status": "supported"
+}
diff --git a/analysis/characterization/window_1_results/per_worker_unified.json b/analysis/characterization/window_1_results/per_worker_unified.json
new file mode 100644
index 0000000..5311e6d
--- /dev/null
+++ b/analysis/characterization/window_1_results/per_worker_unified.json
@@ -0,0 +1,24 @@
+{
+  "hotspot_index_ttft_p90": 3.3497107140827365,
+  "per_worker_latency_p90_s": {
+    "http://127.0.0.1:8000": 41.42001512600109,
+    "http://127.0.0.1:8001": 12.4878579101933,
+    "http://127.0.0.1:8002": 22.462878945574648,
+    "http://127.0.0.1:8003": 15.501050900109117,
+    "http://127.0.0.1:8004": 39.956250199786155,
+    "http://127.0.0.1:8005": 36.69850301651168,
+    "http://127.0.0.1:8006": 10.116177947795954,
+    "http://127.0.0.1:8007": 20.35038618039107
+  },
+  "per_worker_ttft_p90_s": {
+    "http://127.0.0.1:8000": 11.264844838529825,
+    "http://127.0.0.1:8001": 3.6063860427122614,
+    "http://127.0.0.1:8002": 16.175747957825664,
+    "http://127.0.0.1:8003": 9.314684258581842,
+    "http://127.0.0.1:8004": 37.73397144810297,
+    "http://127.0.0.1:8005": 18.328030522551852,
+    "http://127.0.0.1:8006": 3.6328767628350773,
+    "http://127.0.0.1:8007": 7.772977900883419
+  },
+  "status": "supported"
+}
diff --git a/analysis/characterization/window_1_results/summary.json b/analysis/characterization/window_1_results/summary.json
new file mode 100644
index 0000000..760867e
--- /dev/null
+++ b/analysis/characterization/window_1_results/summary.json
@@ -0,0 +1,136 @@
+{
+  "analyzed_records": 2114220,
+  "batch0": {
+    "attempted_requests": 2114220,
+    "completed_requests": null,
+    "error_requests": null,
+    "max_inflight_per_session": null,
+    "session_concurrency_status": "unavailable",
+    "session_sequential": null
+  },
+  "batch1": {
+    "append_status": "unavailable",
+    "input_stats": {
+      "count": 2114220,
+      "max": 202371.0,
+      "mean": 33637.38370084476,
+      "min": 0.0,
+      "p50": 20030.0,
+      "p90": 87855.1000000001,
+      "p95": 104738.0,
+      "p99": 125527.0
+    },
+    "kv_footprint_status": "available",
+    "output_stats": {
+      "count": 2114220,
+      "max": 132665.0,
+      "mean": 444.97059624826176,
+      "min": 0.0,
+      "p50": 80.0,
+      "p90": 811.0,
+      "p95": 2213.0,
+      "p99": 6614.810000000056
+    },
+    "reuse_status": "unavailable"
+  },
+  "classification": {
+    "label": "invalid_for_online_claim",
+    "reason": "actual dispatch/finish timestamps are unavailable, so online sequentiality cannot be proven",
+    "source": "auto",
+    "stress_indicators": []
+  },
+  "manifest": {
+    "canonical_trace_data_sources": {
+      "dash0_formatted_trace_dir": "~/ali-trace/trace-glm5.1-formatted/",
+      "dash0_raw_trace_dir": "~/ali-trace/trace-glm5.1/",
+      "usage_note": "Full trace analysis can be run CPU-only on dash0, or the needed JSONL files can be copied/rsynced from dash0 to this machine before running this analyzer."
+    },
+    "end_time": "2026-05-25T09:03:36.499002+00:00",
+    "figure_status": {
+      "reason": "matplotlib unavailable: ModuleNotFoundError(\"No module named 'matplotlib'\")",
+      "status": "skipped"
+    },
+    "git_commit": "",
+    "gpu_count": 0,
+    "gpu_type": "",
+    "host": "ds-6348bee4-1-765874c9c4-7zrvf",
+    "input_requirements": {
+      "actual_sequentiality_proof": [
+        "per-request dispatch timestamp",
+        "per-request finish or error/timeout timestamp",
+        "request_id join to trace/metrics when timing source is separate"
+      ],
+      "metrics_jsonl": [
+        "request_id",
+        "session_id",
+        "trace_timestamp_s",
+        "input_length",
+        "output_length",
+        "latency_s",
+        "ttft_s",
+        "tpot_s",
+        "error",
+        "optional cached_tokens"
+      ],
+      "reuse_decomposition": [
+        "cached_tokens or cache_hit",
+        "hash_ids",
+        "session_id"
+      ],
+      "trace_jsonl": [
+        "chat_id",
+        "parent_chat_id",
+        "timestamp",
+        "input_length",
+        "output_length",
+        "turn",
+        "hash_ids",
+        "optional session_id"
+      ]
+    },
+    "input_status": {
+      "analyzed_records": 2114220,
+      "breakdown_records": 0,
+      "merge_warnings": [],
+      "metrics_records": 0,
+      "trace_records": 2114220,
+      "trace_warnings": [],
+      "unmatched_breakdown": 0,
+      "unmatched_metrics": 0
+    },
+    "launch_command": "analysis/characterization/analyze.py --trace /home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl --kv-bytes-per-token 98304 --task-name full_trace_with_kv --output-root outputs/characterization --overwrite",
+    "output_dir": "outputs/characterization/2026-05-25/full_trace_with_kv",
+    "policy": "",
+    "request_limit": null,
+    "session_sampling_method": "",
+    "session_sequential": null,
+    "start_time": "2026-05-25T08:59:11.618919+00:00",
+    "time_scale": null,
+    "trace_file_info": {
+      "exists": true,
+      "mtime_s": 1778772033.2788928,
+      "path": "/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl",
+      "sha256": "",
+      "sha256_status": "skipped_use_--hash-inputs",
+      "size_bytes": 1561266372
+    },
+    "trace_path": "/home/admin/cpfs/wjh/ali-trace/trace-glm5.1-formatted/051315-051317.jsonl",
+    "trace_sha256": ""
+  },
+  "outputs": [
+    "append_delta_stats.json",
+    "invalid_runs.md",
+    "kv_footprint_summary.json",
+    "manifest.json",
+    "raw/merged_requests.jsonl",
+    "raw/unmatched_breakdown.jsonl",
+    "raw/unmatched_metrics.jsonl",
+    "reuse_decomposition.json",
+    "session_arrival_stats.json",
+    "session_concurrency.json",
+    "session_skew.json",
+    "trace_profile.json",
+    "turn_interval_stats.json",
+    "workload_summary.json"
+  ]
+}